[Dirvish] dirvish-expire loses on 'summary' directory

Barton C Massey bart at cs.pdx.edu
Sun Jan 28 14:51:37 PST 2007

In message <20070128131059.GA2111 at msgid.wurtel.net> you wrote:
> On Sat 27 Jan 2007, Barton C Massey wrote:
> > The BUGS section of the dirvish-expire manpage says:
> > "Dirvish-expire will walk the file hierarchy of all
> > banks or the specified vault looking for summary
> > files. Anything non-dirvish in there may cause excess
> > file-walking." And it's right. I just got bit hard
> > because the lost+found on my disk happened to contain
> > "#12371245/career/summary" and it was a directory.
> Any lost+found directory should be empty except for a
> brief period after a crash where the fsck necessarily had
> to move things there. Immediately after the fsck (and
> certainly before using the filesystem in question for
> production!) the contents of the lost+found directory
> should be inspected and moved to appropriate
> places. That's nothing to do with dirvish, but with good
> system administration...

You should move the comment out of the BUGS section if it's
not a bug.  However, if this dirvish behavior is truly the
intent, I think it is a defect in the dirvish specification.

First, dirvish should try to behave safely even in the
presence of "bad system administration".  Second, your idea
of "good system administration" practice might be different
from mine or some other user's.  Third, and perhaps most
importantly, finding a defective vault during expiration
shouldn't cause backups on perfectly good vaults to fail
silently, which appears to be what was happening.

> A bank should contain just dirvish backups, and not also
> be a general purpose storage location.  If you don't want
> to dedicate a whole filesystem to dirvish, at least create
> a subdirectory on the filesystem and use that as the
> bank. I do that anyway on dedicated filesystems...

If keeping all other storage out of banks is a dirvish
requirement, it would be nice if the setup documentation for
dirvish were made explicit on this point.  I have a dedicated
disk that I am using for backups; having the lost+found
directory "in the bank" wasn't an obvious problem to me,
since there was clearly a config file that knew what things
in the bank were vaults.

> > A real fix, please? It doesn't look hard to do the right
> > thing, i.e. only work with the vaults as defined in the
> > config file, but I'm not a great Perl programmer and haven't
> Actually, I find it very useful that dirvish-expire
> traverses vaults inside a bank whether they're listed in
> the config file or not (I suppose you mean the master.conf
> config file?). That way, if a system is removed from the
> dirvish config (because it doesn't exist anymore or
> whatever) all the images are slowly expired as per usual
> until one last one remains; then I get a cron email about
> that and I can decide to remove it altogether or to
> archive it.

On the contrary, I think this is a truly dangerous behavior:
it totally violates the principle of least surprise, and
could lead to important data loss.  If I take a vault out of
master.conf, it's reasonable to expect it to be left
completely alone by dirvish.  If, for example, I take a
system out of service due to catastrophic failure and remove
its vault from master.conf, I might reasonably expect that
the year's worth of backups in its vault remain intact for
future reference, not get slowly expired.

If you want to have a vault expired but not backed up, it
seems to me straightforward to add a master.conf
"expire-only" option to that effect.
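
A minimal sketch of that selection rule, written in Python
rather than dirvish's Perl.  The function name, its
parameters, and the "expire_only" list are hypothetical; this
is not dirvish code and does not reflect any existing
master.conf syntax.  It only shows expiration being limited
to vaults that the configuration names explicitly.

```python
import os


def vaults_to_expire(bank, configured, expire_only=()):
    """Return vault directories under `bank` that expiration may touch.

    Only vaults named in `configured` or in the hypothetical
    `expire_only` list are considered.  Anything else found in the
    bank (lost+found, vaults removed from the config) is left alone.
    """
    allowed = set(configured) | set(expire_only)
    selected = []
    for name in sorted(os.listdir(bank)):
        path = os.path.join(bank, name)
        # Require both an explicit listing and an actual directory.
        if name in allowed and os.path.isdir(path):
            selected.append(path)
    return selected
```

With this rule, a vault taken out of the configuration is
simply never visited, and a vault that should keep expiring
without being backed up has to be listed as such on purpose.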

Having dirvish-expire wander around in the bank looking for
things named "summary" and assuming that they're vault
control files that it should expire seems error-prone and
counterintuitive to me.  If you leave files lying around in
the bank that dirvish didn't put there, you may lose.  If
you name a vault "tree" or "summary", you will lose.  The only
way to keep a vault from being expired is to move it out of
the bank altogether.

If the consensus among dirvish developers is that this is
desirable behavior, I guess I'll work around it, at least
for now.  Is this behavior really what folks want?
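
The difference between a name-based walk and a stricter check
can be sketched as follows.  This is Python, not dirvish's
Perl, and both function names are hypothetical.  The first
function models the behavior as described in this message,
not the actual dirvish-expire source; the second assumes a
bank/<vault>/<image>/summary layout and a known list of
vaults.

```python
import os


def find_summaries_by_name(bank):
    """Name-only search: anything called 'summary' matches, at any depth."""
    hits = []
    for root, dirs, files in os.walk(bank):
        for name in dirs + files:
            if name == "summary":
                hits.append(os.path.join(root, name))
    return sorted(hits)


def find_summaries_strict(bank, vaults):
    """Accept only regular files at bank/<vault>/<image>/summary."""
    hits = []
    for vault in vaults:
        vault_dir = os.path.join(bank, vault)
        if not os.path.isdir(vault_dir):
            continue
        for image in sorted(os.listdir(vault_dir)):
            candidate = os.path.join(vault_dir, image, "summary")
            # A directory or symlink named 'summary' is rejected here.
            if os.path.isfile(candidate) and not os.path.islink(candidate):
                hits.append(candidate)
    return hits
```

Given a bank that holds both a real vault image and a
lost+found entry ending in a directory named "summary", the
name-only search returns both paths, while the strict check
returns only the image's summary file.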

    Bart Massey
    bart at cs.pdx.edu
