[Dirvish] Utility to show incremental usage per image within a vault

Eric V. Smith eric at trueblade.com
Fri May 5 21:45:03 UTC 2006

Matthew Pressly wrote:
> I've looked for something to do this, but have not been able to find
> anything.  Is there a utility that will run on linux that
> can be used to determine the incremental disk usage of all images
> within a vault, starting with the most recent image and going back
> in time, counting each file's size only once, regardless of the
> number of hard links to it?
> I.e., something that would produce output similar to:
>   latest 1.2GB
>   20060429 23MB
>   20060428 25MB
>   20060427 26MB
>   ...
> Basically, what I'm looking for is a way to determine how much
> each image ties up, going backward in time from the most recent.
> 'du --max-depth=1' run at the vault level comes close, since by
> default, it adds file size into the total only once, but the sort
> order is backward, since the most recent image is alphabetically
> last rather than first.  On the other hand, using 
> 'du --max-depth=0 latest 20060430 20060429 ...' appears to treat
> each starting directory independently, so files with
> multiple hard links are counted in each directory's total, and the
> totals would look more like:
>   latest 1.2GB
>   20060429 1.2GB
>   20060428 1.1GB
>   20060427 1.1GB
>   ...
> which doesn't show the true incremental cost of each image.
> Is there some way to do this with 'du', or is there another tool
> that will do this?

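All of my image directories are named 'img*'.  I use this script, which
sums the sizes of the files in each image that have a link count of 1,
i.e. files that no other image shares: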

for dir in /path-to-vault/img* ; do sum=0
  for s in $(find "$dir" -type f -links 1 -print0 | xargs -0 -r stat -c '%s'); do
    sum=$((sum + s))
  done
  echo "$dir $sum"
done

The line breaks fall where the shell accepts them, so this can be pasted
as is or joined into one line.

stat's "%s" is the size in bytes; with lots of files you might want to
use the disk usage size instead.  I seem to recall stat only reports that
in blocks, so it's somewhat less convenient.  The output comes out in
glob (alphabetical) order, so you'll have to sort it in reverse order (by
piping through "sort -r") to meet your requirements.  But sort has to
read the entire output before it prints anything, so I prefer the
unsorted version.
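
Here is a rough sketch of the same idea split into shell functions, in case
it is easier to adapt.  It assumes GNU find, xargs, and stat.  The function
names (unique_bytes, unique_disk_bytes, report) are made up for this
example.  unique_disk_bytes multiplies stat's "%b" (allocated blocks) by
"%B" (bytes per block) to approximate disk usage, and report reverses the
list of directory names before measuring them, so lines appear newest
first without waiting on a final sort.

```shell
#!/bin/sh
# Sketch only: assumes GNU find/xargs/stat; function names are hypothetical.

# unique_bytes DIR: total apparent size (bytes) of files with link count 1.
unique_bytes() {
    find "$1" -type f -links 1 -print0 |
        xargs -0 -r stat -c '%s' |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# unique_disk_bytes DIR: same files, but allocated blocks * block size.
unique_disk_bytes() {
    find "$1" -type f -links 1 -print0 |
        xargs -0 -r stat -c '%b %B' |
        awk '{ sum += $1 * $2 } END { print sum + 0 }'
}

# report VAULT: one "dir bytes" line per img* directory, last name first.
# Only the short list of names is sorted, so results stream as computed.
report() {
    printf '%s\n' "$1"/img* | sort -r | while IFS= read -r dir; do
        [ -d "$dir" ] || continue
        printf '%s %s\n' "$dir" "$(unique_bytes "$dir")"
    done
}
```

Calling "report /path-to-vault" would print one line per image.  Swap
unique_bytes for unique_disk_bytes inside report if allocated size matters
more than apparent size.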

Hope that helps.
