[Dirvish] checksum=0 may be causing inaccurate backups!!!

Barton C Massey bart at cs.pdx.edu
Sat Jun 23 00:30:44 UTC 2007

In message <467B8A62.5060208 at mrc-lmb.cam.ac.uk> Dave wrote:
> Barton C Massey wrote:
> > It seems like it can be heavily optimized, is all I was
> > thinking.  If a new file is created with the same metadata
> > as some old file, you just need to checksum the old and new
> > files to see if they're the same?  In other words, the ctime
> > should save you here?
> I don't think it's to do with new files. If there's an existing file
> that appears from the metadata not to have changed, then it has to
> compute the checksums and compare them. That is, it has to compute
> checksums for almost every file in the archive, twice!

I guess I'm not sure what metadata checks are failing here.
In particular, if a file has the same ctime it had before
(and the other metadata hasn't changed), I think it's OK to
not back it up again even if the contents are now different?
As I understand it, that was the purpose of the inode ctime
field in the first place...  If the ctime of the inode at a
given path has changed, then a checksum would determine
whether we're still looking at the same file, just
harmlessly "touched", or a genuinely different one.  That's
surely not the common case, though, and thus not so expensive.
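
To make the idea concrete, here is a minimal sketch of the check described above. It is not how Dirvish or rsync actually decide what to transfer; the Snapshot record and the names take_snapshot and needs_backup are hypothetical, and SHA-256 stands in for whatever checksum a real tool would use. Files whose size, mtime and ctime all match the previous run are skipped without being read, and a checksum is computed only when the metadata differs.

```python
import hashlib
import os
from typing import NamedTuple


class Snapshot(NamedTuple):
    """What we assume was recorded for a file at the previous backup (hypothetical)."""
    size: int
    mtime_ns: int
    ctime_ns: int
    sha256: str


def file_sha256(path: str) -> str:
    """Checksum the file contents in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def take_snapshot(path: str) -> Snapshot:
    """Record metadata and checksum for later comparison."""
    st = os.stat(path)
    return Snapshot(st.st_size, st.st_mtime_ns, st.st_ctime_ns, file_sha256(path))


def needs_backup(path: str, old: Snapshot) -> bool:
    """Return True only if the contents differ from the previous snapshot."""
    st = os.stat(path)
    same_meta = (st.st_size, st.st_mtime_ns, st.st_ctime_ns) == (
        old.size, old.mtime_ns, old.ctime_ns)
    if same_meta:
        # Common case: trust the unchanged ctime and never read the file.
        return False
    # Rare case: metadata changed, so checksum to tell a harmless
    # "touch" apart from a real content change.
    return file_sha256(path) != old.sha256
```

With this arrangement the expensive read happens only for files whose metadata changed, which is the point of the argument: an untouched archive costs one stat() per file, not two checksums per file.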

