[Dirvish] Re: Re: Utility to show incremental usage per image within a vault

Matthew Pressly mpressly at claborn.net
Mon May 15 17:41:08 UTC 2006

On Thu, May 11, 2006 at 09:24:48AM +0200, Elmar Natter / IN MEDIA KG wrote:
> > Matthew Pressly wrote:
> > 
> >> Thank you all for the input. I also found this posting:
> >> 
> >> http://lists.samba.org/archive/rsync/2004-June/009882.html
> >> 
> >> Based on that script and some additional coding, I have
> >> something that is pretty close to doing what I need.
> > 
> > 
> > So....are you going to share??
> > 
> > :)
> So is he going to share this tool, I'm interested also.
> _______________________________________________
> Dirvish mailing list
> Dirvish at dirvish.org
> http://www.dirvish.org/mailman/listinfo/dirvish

The script is below.  I think something quite a bit simpler 
could be written using du in the way that Shawn posted:

> du -m -s -c */* --exclude=dirvish

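All that would be needed is to construct the command line so
that the image snapshots are listed in reverse chronological
order (newest first). du counts a hard-linked file only once,
charging it to the first directory in which it finds it, so
each older image then shows only the space it does not share
with the newer ones.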
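A minimal sketch of building that command line in Perl, assuming it is run from inside a vault directory, that image names sort chronologically (e.g. date-stamped), and that the image-temp name 'latest' is the newest image. The helper name du_args_recent_first is hypothetical. Since the 'dirvish' config directory is filtered out here, --exclude is not needed.

```perl
use strict;
use warnings;

# Hypothetical helper: return a du argument list with the vault's images
# ordered newest first. ASSUMES image names sort by date and that 'latest'
# (the image-temp name) is the most recent image.
sub du_args_recent_first {
    my @images = grep { $_ ne 'dirvish' } @_;   # skip the config dir
    my @sorted = sort {
        ($a eq 'latest') ? -1 :
        ($b eq 'latest') ?  1 :
        $b cmp $a                               # reverse lexical = newest first
    } @images;
    return ('du', '-m', '-s', '-c', @sorted);
}

# Usage (from inside the vault directory):
#   opendir(my $dh, '.') or die "opendir: $!";
#   my @images = grep { -d $_ && !/^\.\.?$/ } readdir($dh);
#   system(du_args_recent_first(@images)) == 0 or die "du failed: $?";
```

Passing the list form to system() avoids any shell quoting issues with image names.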

For some reason, du (at least the version I am using) correctly
accounts for files shared between snapshots when -c is specified,
but not when -c is omitted.

I also realized, after writing the script, that inode numbers
are included in the $vault/$image/index file, so a script could
be written to process those rather than stat every file in 
the backup tree.
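A rough sketch of the index-based approach, assuming each line of an uncompressed index file looks like `find -ls` output: the inode number first, then the allocated size in 1 KB blocks, whitespace-separated. A compressed index (index.gz, index.bz2) would have to be opened through a decompressor first. The function name index_usage_kb is hypothetical; calling it on each image's index, newest first, with one shared %seen hash mirrors what the script below does with lstat.

```perl
use strict;
use warnings;

# Hypothetical sketch: sum usage from one dirvish index filehandle.
# ASSUMES `find -ls`-style lines: field 1 = inode, field 2 = 1 KB blocks.
# $seen is a hashref of inodes already counted in newer images.
sub index_usage_kb {
    my ($fh, $seen) = @_;
    my ($new_kb, $all_kb) = (0, 0);
    while (my $line = <$fh>) {
        my ($inode, $kb) = split ' ', $line;   # ' ' also strips leading blanks
        next unless defined $kb && $inode =~ /^\d+$/ && $kb =~ /^\d+$/;
        $all_kb += $kb;                            # every occurrence
        $new_kb += $kb unless $seen->{$inode}++;   # first occurrence only
    }
    return ($new_kb, $all_kb);
}
```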

The output from the script is meant to be easily parsed
because I run it from another script that iterates over
all the vaults and pretty-prints the results.
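The driver script itself isn't shown, but as a sketch of the parsing side: each output line is four comma-separated "<number>M" fields followed by the image label, so it splits cleanly. The function name and hash keys below are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for one output line such as
#   "12.5M,10.0M,40.0M,35.2M,2006-05-15"
# Columns: disk usage and apparent size counting each inode once, then
# disk usage and apparent size counting every hard link.
sub parse_usage_line {
    my ($line) = @_;
    chomp $line;
    my ($disk, $size, $disk_all, $size_all, $label) = split /,/, $line, 5;
    return unless defined $label;              # not a usage line
    my %rec = (label => $label);
    for ([disk_mb => $disk], [size_mb => $size],
         [disk_all_mb => $disk_all], [size_all_mb => $size_all]) {
        my ($key, $val) = @$_;
        return unless $val =~ /^([\d.]+)M$/;   # strip the trailing "M"
        $rec{$key} = $1;
    }
    return \%rec;
}
```

Limiting split to 5 fields keeps a label intact even if it contains a comma.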

Note that this script doesn't tell you how much disk space
you could reclaim by deleting an image that is not the
oldest. One way to calculate that would be to run a du like
the one above with the image you are considering deleting
listed last on the command line; the figure du reports for
that image is then the space it uses that is not shared with
any other image.
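A sketch of that calculation, assuming GNU du's "<size><TAB><path>" output and a run from inside the vault directory. Both helper names are hypothetical. -c is kept because, as noted above, some du versions only account for shared files correctly with it.

```perl
use strict;
use warnings;

# Hypothetical helper: du argument list with the candidate image last.
sub du_args_target_last {
    my ($target, @images) = @_;
    my @others = grep { $_ ne $target && $_ ne 'dirvish' } @images;
    return ('du', '-m', '-s', '-c', @others, $target);
}

# Hypothetical helper: pick the target's size (MB) out of du's output.
# ASSUMES lines of the form "<size>\t<path>".
sub unshared_mb {
    my ($target, @du_lines) = @_;
    for my $line (@du_lines) {
        chomp(my $l = $line);
        my ($mb, $path) = split /\t/, $l, 2;
        return $mb if defined $path && $path eq $target;
    }
    return;   # target not found in the output
}

# Usage (from inside the vault directory):
#   open(my $du, '-|', du_args_target_last($target, @images))
#       or die "du: $!";
#   my $mb = unshared_mb($target, <$du>);
```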

Here's the script.

#!/usr/bin/perl -w

use strict;
use File::Find;

my %seen; # Keep track of previously seen inodes
my $vault = shift @ARGV; # Expect a single vault name as input
die "Usage: $0 VAULT\n" unless defined $vault;

my ($total_blocks, $total_size) = (0, 0);
my ($total_blocks_all, $total_size_all) = (0, 0);
my ($subtotal_blocks, $subtotal_size); # Subtotals counting hard linked files only once
my ($subtotal_blocks_all, $subtotal_size_all); # Counting all occur. of hard linked files

my $BANK = "/data/dirvish/backup-vaults"; # Change to suit location of your vaults
my $vault_path = "$BANK/$vault";

opendir(my $dh, $vault_path) or die "Cannot opendir $vault_path: $!";
my @images = grep { !/^\.\.?$/ } readdir($dh);
closedir($dh);

for my $image(sort recent_first @images) {
  next if $image eq 'dirvish'; # Skip the config dir

  my $image_path = "$vault_path/$image";
  ($subtotal_blocks, $subtotal_size) = (0, 0);
  ($subtotal_blocks_all, $subtotal_size_all) = (0, 0);

  find(\&wanted, $image_path);
  print_stats($subtotal_blocks, $subtotal_size, $subtotal_blocks_all, $subtotal_size_all, $image);

  $total_blocks     += $subtotal_blocks;
  $total_size       += $subtotal_size;
  $total_blocks_all += $subtotal_blocks_all;
  $total_size_all   += $subtotal_size_all;
}

print_stats($total_blocks, $total_size, $total_blocks_all, $total_size_all, "TOTAL");

sub print_stats {
  my ($blocks, $size, $blocks_all, $size_all, $label) = @_;
  # Assume blocks are 512 bytes (blocks/2 = KB), and report size in MB.
  # Print disk usage, based on number of blocks / apparent file size,
  # first counting hard-linked files once, then counting every link.
  printf "%.1fM,%.1fM,%.1fM,%.1fM,%s\n",
    int($blocks/2 + 0.5) / 1024, $size / 1024 / 1024,
    int($blocks_all/2 + 0.5) / 1024, $size_all / 1024 / 1024,
    $label;
}

# Called by File::Find::find for each file and directory in the image
sub wanted {
  my ($inode, $size, $blocks) = ( lstat($_) )[1, 7, 12];
  return unless defined $inode; # lstat failed (e.g. the file vanished)

  # These always count toward the total.
  # This counts a little differently than du for each image because
  # there could be shared files within an image in addition to shared
  # files across images.
  $subtotal_blocks_all += $blocks;
  $subtotal_size_all += $size;

  return if $seen{$inode}++; # Already-visited files do not contribute to totals

  $subtotal_blocks += $blocks;
  $subtotal_size += $size;
}

# Sort images newest first.
# Assumes that image-temp is used, so that the most recent image is named 'latest',
# and that the other image names sort chronologically.
sub recent_first {
  return -1 if ($a eq "latest");
  return +1 if ($b eq "latest");
  return $b cmp $a;
}

