[Dirvish] dirvish-expire loses on 'summary' directory

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Mon Jan 29 11:26:56 UTC 2007


Barton C Massey wrote:
> In message <20070128131059.GA2111 at msgid.wurtel.net> you wrote:
>> On Sat 27 Jan 2007, Barton C Massey wrote:
>>> The BUGS section of the dirvish-expire manpage says:
>>> "Dirvish-expire will walk the file hierarchy of all
>>> banks or the specified vault looking for summary
>>> files. Anything non-dirvish in there may cause excess
>>> file-walking." And it's right. I just got bit hard
>>> because the lost+found on my disk happened to contain
>>> "#12371245/career/summary" and it was a directory.
>> Any lost+found directory should be empty except for a
>> brief period after a crash where the fsck necessarily had
>> to move things there. Immediately after the fsck (and
>> certainly before using the filesystem in question for
>> production!) the contents of the lost+found directory
>> should be inspected and moved to appropriate
>> places. That's nothing to do with dirvish, but with good
>> system administration...
> 
> You should move the comment out of the BUGS section if it's
> not a bug.

The bug is the "excess file-walking", I believe.

>  However, if this dirvish behavior is truly the
> intent, I think it is a defect in the dirvish specification.
> 
> First, dirvish should try to behave safely even in the presence
> of "bad system administration".  Second, your idea of "good
> system administration" practice might be different than mine
> or some other user's.  Third, and perhaps most importantly,
> finding a defective vault during expiration shouldn't cause
> backups on perfectly good vaults to be silently failed,
> which appears to be what was happening.

I didn't understand this from your earlier message. You didn't explain
what was happening, except that dirvish-expire was descending into a
lost+found directory. What effect did dirvish-expire cause and how did
that affect dirvish itself?

>> A bank should contain just dirvish backups, and not also
>> be a general purpose storage location.  If you don't want
>> to dedicate a whole filesystem to dirvish, at least create
>> a subdirectory on the filesystem and use that as the
>> bank. I do that anyway on dedicated filesystems...
> 
> If keeping all other storage out of banks is a dirvish
> requirement, it would be nice if the setup documentation for
> dirvish be made explicit on this point.  I have a dedicated
> disk that I am using for backups; having the lost+found
> directory "in the bank" wasn't an obvious problem to me,
> since there was clearly a config file that knew what things
> in the bank were vaults.

I agree that the documentation could be clarified. It seems to me that
the problems you and Robert Sander have experienced could be avoided if
the documentation said:
 (1) Never use the root of a filesystem as a bank or vault. Make a
directory within the filesystem for the bank, and use directories within
that directory for vaults.
 (2) Don't call a bank or vault 'tree' or 'summary'. (Suggestion: use
host names, filesystem names, etc. for the names of banks and vaults.)
 (3) Never put anything other than dirvish vaults inside a dirvish bank.

I don't believe this would significantly restrict the usage of dirvish.
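
The three rules above could be checked mechanically before a bank is put
into use. What follows is only a minimal sketch, not part of dirvish: the
function name check_bank and the known_vaults argument are hypothetical,
and the vault names are supplied by the caller rather than read from
master.conf.

```python
import os

# Names the thread warns against using for banks or vaults.
RESERVED_NAMES = {"tree", "summary"}


def check_bank(bank, known_vaults):
    """Return a list of warnings for one bank directory.

    known_vaults is a set of vault names the admin expects to find
    (hypothetical input; nothing is parsed from master.conf here).
    """
    warnings = []
    # Rule 1: the bank should be a directory inside a filesystem,
    # not the filesystem's mount point itself.
    if os.path.ismount(bank):
        warnings.append(f"{bank}: bank is a filesystem mount point")
    for name in sorted(os.listdir(bank)):
        path = os.path.join(bank, name)
        if name in RESERVED_NAMES:
            # Rule 2: 'tree' and 'summary' must not be used as names.
            warnings.append(f"{path}: reserved name")
        elif name not in known_vaults:
            # Rule 3: anything that is not an expected vault is flagged.
            warnings.append(f"{path}: not a known vault")
    return warnings
```

Run against a bank that is the root of a dedicated disk, a check like
this would flag both the mount point and the lost+found directory.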

>>> A real fix, please? It doesn't look hard to do the right
>>> thing, i.e. only work with the vaults as defined in the
>>> config file, but I'm not a great Perl programmer and haven't
>> Actually, I find it very useful that dirvish-expire
>> traverses vaults inside a bank whether they're listed in
>> the config file or not (I suppose you mean the master.conf
>> config file?). That way, if a system is removed from the
>> dirvish config (because it doesn't exist anymore or
>> whatever) all the images are slowly expired as per usual
>> until one last one remains; then I get a cron email about
>> that and I can decide to remove it altogether or to
>> archive it.
> 
> On the contrary, I think this is a truly dangerous behavior:
> it totally violates the principle of least surprise, and
> could lead to important data loss.  If I take a vault out of
> master.conf, it's reasonable to expect it to be left
> completely alone by dirvish.

I think you're exaggerating. I don't think there is a 'reasonable'
expectation here. I wouldn't have been surprised to hear that dirvish
behaved either way. Again, it ought to be documented if it isn't already.

>  If, for example, I take a
> system out of service due to catastrophic failure and remove
> its vault from master.conf, I might reasonably expect that
> the year's worth of backups in its vault remain intact for
> future reference, not get slowly expired.

Now this does surprise me. Why would you suddenly want to keep all these
old backups that you'd already scheduled to expire? dirvish will
automatically keep the last one, so why have the older ones suddenly
acquired a value for you that they didn't have whilst the system was
working? In any case, presumably the first thing you'd be doing on that
day would be repairing the failed system, so the issue would be moot.

> If you want to have a vault expired but not backed up, it
> seems to me straightforward to add a master.conf
> "expire-only" option to that effect.
> 
> Having dirvish-expire wander around in the bank looking for
> things named "summary" and assuming that they're vault
> control files that it should expire seems error-prone and
> counterintuitive to me.  If you leave files lying around in
> the bank that dirvish didn't put there, you may lose.  If
> you name a vault "tree" or "summary", you will lose.  The only
> way to keep a vault from being expired is to move it out of
> the bank altogether.

I think these are simply matters to be emphasised more in the documentation.
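
To make the two behaviours under discussion concrete, here is a rough
sketch of the difference between scanning a whole bank for 'summary'
files and scanning only named vaults. It is an illustration only and is
not how dirvish-expire is implemented: find_summaries and its vaults
argument are hypothetical, and skipping directories named 'tree' is an
assumption made for the example.

```python
import os


def find_summaries(bank, vaults=None):
    """Yield paths of non-directory entries named 'summary' under bank.

    If vaults is given (a hypothetical list of vault names), only those
    top-level directories are walked; otherwise the whole bank is.
    """
    if vaults is None:
        roots = [bank]
    else:
        roots = [os.path.join(bank, v) for v in vaults]
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Assumption: do not descend into directories named 'tree',
            # so backed-up files that happen to be called 'summary' are
            # not matched. A vault named 'tree' would be skipped too.
            dirnames[:] = [d for d in dirnames if d != "tree"]
            # A directory named 'summary' appears in dirnames, not in
            # filenames, so it is never treated as a summary file here.
            if "summary" in filenames:
                yield os.path.join(dirpath, "summary")
```

With vaults=None the walk still descends into lost+found and anything
else left in the bank; passing an explicit list confines it to those
vaults, which is the trade-off argued over in this thread.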

> If the consensus among dirvish developers is that this is
> desirable behavior, I guess I'll work around it, at least
> for now.  Is this behavior really what folks want?

I think you're more likely to get a positive response from other Dirvish
users if you propose specific changes and provide patches and tests.
What I want, for example, is that nothing should break my existing
backup regime.

Cheers, Dave


