[Dirvish] examining disk space usage per image...need script or rsync parameter

Paul Slootman paul at debian.org
Fri Dec 30 11:00:14 UTC 2005

On Thu 29 Dec 2005, Richard Geoffrion wrote:

> Now... I originally composed the message  below..and I'm still curious 
> about why DU works that way..but in finishing the last message I had the 

du reports the total usage of the given directory. To get that right it
keeps track of hard-linked files within a single run, so as to count
them only once.

If you have 2 dirs which contain mostly files that are hardlinked
between the 2 dirs, then running du on each individual dir and adding
the 2 resulting numbers together will give you a number almost 2 times
too large. Running du on the parent dir that contains both dirs
will give the real usage.  It's quite logical actually.
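
A quick way to see this for yourself is sketched below. It is only an
illustration: the directory names image1 and image2 and the 1 MiB test
file are made up, and it assumes a du that accepts -sk and a filesystem
that supports hard links.

```shell
#!/bin/sh
# Demonstration: du counts a hard-linked file once per invocation.
# image1/image2 are hypothetical names, not real dirvish images.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/image1" "$tmp/image2"

# One 1 MiB file, hard-linked into the second directory.
dd if=/dev/urandom of="$tmp/image1/data" bs=1024 count=1024 2>/dev/null
ln "$tmp/image1/data" "$tmp/image2/data"

# Separate du runs: each one counts the shared file in full.
one=$(du -sk "$tmp/image1" | cut -f1)
two=$(du -sk "$tmp/image2" | cut -f1)

# A single du run on the parent counts the shared file only once.
both=$(du -sk "$tmp" | cut -f1)

echo "image1 alone:   $one kB"
echo "image2 alone:   $two kB"
echo "sum of the two: $((one + two)) kB"
echo "parent dir:     $both kB"

rm -rf "$tmp"
```

The sum of the two separate runs should come out at roughly double the
figure for the parent dir, since the shared file is counted in both.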

> inspiration to look at the dirvish logs.  I see now that $image/log.gz 
> contains the files that were RSYNC'ed for each image.  Does anyone have 
> a script that reports file sizes of the files rsync'ed in a given image 
> or is there a way to have dirvish add that to the log?

I've whipped up a quick script (with some help from find2perl) to report
the disk usage of files in and below the current directory that have
only one link, i.e. those files that are unique to the current directory
(or image, as you will).  Extending the script to allow for command-line
specification of the directory to inspect is left as an exercise for the
reader :-)

Paul Slootman

dirvish-image-usage.pl :

#! /usr/bin/perl -w

use strict;
use File::Find ();

sub wanted;

my $totalblocks = 0;

# Traverse the current directory and everything below it
File::Find::find({wanted => \&wanted}, '.');

$totalblocks *= 4;      # blocks are 4k units, du reports in kB
print "Total usage: $totalblocks kB\n";


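sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    # Count only regular files with a single link, i.e. files that are
    # not shared with any other image.
    if ((($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) && (-f _) && ($nlink == 1)) {
        $totalblocks += int(((-s _) + 4095) / 4096); # round up
    }
}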
