[Dirvish] reiserfs or not?

foner-dirvish at media.mit.edu foner-dirvish at media.mit.edu
Sat Jan 14 23:12:12 UTC 2006

    Date: Sat, 14 Jan 2006 20:33:02 +0000
    From: Arjen Meek <arjen at xyx.nl>

    Can anyone give me any useful input as to whether ReiserFS is better for
    a Dirvish backup filesystem than, say, ext2/3 with lots of inodes and
    small blocks? The system in question was built for Dirvish so the
    filesystem will be used only for backups; in this case of a set of
    remote filesystems amounting to ~100 GB.

My advice is to stay as far away from either version (3 or 4) of
ReiserFS as possible.  For more information, see [1] and, for even
more technical details, [2] (which is referenced after the first few
paragraphs of [1], just before all the forwarded traffic).  Plenty of
people value it for its speed, but I find its speed advantages to be
negligible for a -real- workload that isn't a Usenet news server, and
its fragility (especially in the face of kernel panics or machine
hardware resets) to be far more important---in my opinion, the -first-
priority of a filesystem is to -store files-, and the -second- is
performance.  Hans Reiser believes differently.

I considered, e.g., JFS and XFS, but I wanted the ability to
dynamically resize the filesystem in -both- directions (larger
or smaller) depending on what I did to the underlying LVM2 partitions,
so I stuck with ext3fs, after spending a long time running tests and
convincing myself that the predominant fear everyone voiced about it
for this application (running out of inodes) was unlikely to be a
problem for me.  I configured my bank partition like so:

  # -j = ext3 journal; -T largefile tunes the defaults for a bulk-storage mix
  mke2fs -v -j -m 1 -T largefile -i 40960 /dev/the-partition

(Note in particular the ratio of inodes to blocks---neither as large
as it could be, nor as small, derived from a couple trials with my mix
of file sizes, which is probably a pretty typical mix.  Note also that
I only reserved 1% of the filesystem for root; since dirvish runs run
as root -anyway-, there seemed little point in building in lots of
reserved blocks, and having that partition fill up won't fill up the
root filesystem.)

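(As an aside, the resize-in-both-directions point above looks roughly
like this in practice; the volume group, logical volume name, mount
point, and sizes are all made-up placeholders, so adjust to taste:

  lvextend -L +50G /dev/vg0/bank     # grow the LV first...
  resize2fs /dev/vg0/bank            # ...then grow ext3 to fill it
  # Shrinking works in the reverse order, and offline:
  umount /backup/bank
  e2fsck -f /dev/vg0/bank            # resize2fs insists on a clean check first
  resize2fs /dev/vg0/bank 250G       # shrink the filesystem first...
  lvreduce -L 250G /dev/vg0/bank     # ...then the LV underneath it

Note that the shrink path requires unmounting, so it's not something
you'd do during a backup window.)
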
With these parameters and my workload, this means that I back up about
300GB of files (really, about 340GB, but 40GB of those are squeezed
out via faster-dupemerge) onto a 300GB disk; this is about 2 million
files.  The bank partition is about 100% full at the moment, and 50%
of its inodes are in use.  Since my rate of new-file creation is
relatively low, and since the files that are hardlinked between each
snapshot don't take up any extra inodes (which is, essentially, what
a hard link --means-), I'm fairly confident that I won't run out of
inodes until I'm out of disk space.
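
If you want to watch this yourself, the numbers to compare are block
usage versus inode usage; here "." stands in for wherever your bank is
mounted (e.g. /backup/bank):

  df -h .    # block usage
  df -i .    # inode usage -- this is what each newly-created file costs

If the inode percentage climbs faster than the block percentage, your
-i ratio was too coarse for your file mix.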

This is based on having on the order of 50 extant snapshots at any
given time, e.g., a month of dailies, and a few weeklies/monthlies/
yearlies.  If you plan to have hundreds or thousands of snapshots,
the math may work out a bit differently.
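
The back-of-envelope version of that math, using my numbers (the daily
churn figure here is an assumption -- measure your own; only new or
changed files cost fresh inodes, since hardlinked ones are free):

  files_per_image=2000000      # ~2 million files in one full snapshot tree
  new_files_per_day=5000       # assumed churn per daily snapshot
  snapshots=50                 # extant snapshots at any given time
  needed=$((files_per_image + new_files_per_day * snapshots))
  # What mke2fs -i 40960 provides on a 300GB partition:
  available=$(( (300 * 1024 * 1024 * 1024) / 40960 ))
  echo "need about $needed inodes; have about $available"

With thousands of snapshots, multiply the churn term accordingly and
see whether it still fits.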

The one pitfall:  If you happen to be unfortunate enough to have one
file with a -lot- of hardlinks to it (e.g., thousands), every time you
create a new snapshot, you'll add that many thousands of hardlinks to
that file (of course).  In my case, I had a single file with about
4000 hardlinks to it, because I had a directory with about 1000 of the
same file in it (courtesy of some installation that dropped 1000
identical files that all said, "this documentation is superseded;
please see this other place instead"), which then all got merged into
one file with 1000 hardlinks to it from a faster-dupemerge run, which
then wound up merging that directory with 4 identical copies elsewhere
on my filesystem.  (These were backups of 4 machines which all had this
directory in 'em.)  This saved lots of space, but meant that after
about 7 snapshots were created, the 8th blew out trying to create even
more hardlinks to that file, because ext3fs has a limit of 32000 (not
2^15!) hardlinks to any given file.  Some other filesystems don't have
this limit.  In my case, I solved this instead by tarring up that
single directory (which I never used anyway) everywhere it occurred,
and compressing it (which compressed extremely well, because it was
1000 copies of the same data), and then backing up -that-.  So instead
of 1000 hardlinks to that file (and thus 4000 across the 4 copies I
had) per snapshot, I had 4 hardlinks per snapshot---and I'm not going
to have 8000 snapshots on disk at any given time... :)
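
(A miniature of the workaround, with made-up paths and only 10 links
instead of 4000, just to show the mechanics:

  dir=/tmp/dupe-docs-demo
  mkdir -p "$dir"
  echo "superseded; see elsewhere" > "$dir/doc.0"
  for i in $(seq 1 9); do ln "$dir/doc.0" "$dir/doc.$i"; done
  links=$(stat -c %h "$dir/doc.0")
  echo "link count: $links"          # 10 here; ext3 caps any file at 32000
  tar czf /tmp/dupe-docs.tar.gz -C "$dir" .
  rm -rf "$dir"                      # back up the tarball instead

Each snapshot then adds one hardlink to the tarball rather than one
per file inside the directory.)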

[1] http://foner.www.media.mit.edu/people/foner/Sys/reiserfs-considered-harmful.txt
[2] http://zork.net/~nick/mail/why-reiserfs-is-teh-sukc
