[Dirvish] A suggestion for dirvish-expire

Hernán J. González hgonzalez at gmail.com
Tue Dec 4 19:44:01 UTC 2012

I've been using Dirvish for a long time, and I really like it. Thanks.

A suggestion for dirvish-expire, whose current behaviour I consider
rigid and very dangerous.

Currently, the way of specifying which backups should be trimmed
(expired) is via the crontab-like "expire-rule" entries.
That seems quite dangerous and difficult to maintain. A crontab spec
is good for events in the future, not in the past.

In most typical scenarios, what the user wants is to keep all or
nearly all the recent backups, and a few of the older ones. That is,
we want to have high temporal granularity for recent backups and low
granularity for older ones, just like many apps do (e.g. MRTG).

Currently, that's difficult to specify, and much more difficult to change.
Suppose I run a full dirvish on Monday-Wednesday-Friday.
I could specify my expire rules like this:

     *     *     *     *           *    +10 days
     *     *     *     *           5    +2 months
     *     *     1-7   *           5    +9 months
     *     *     1-7   1,5,9       5    +4 years

This means:
- keep the backup for 4 years if made on the first Friday of Jan/May/Sep
(3 backups per year)
- keep the backup for 9 months if made on the first Friday of any month
(1 backup per month)
- keep the backup for 2 months if made on a Friday (1 backup per week)
- keep all backups younger than 10 days
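The reading above can be sketched in a few lines of Python. This is only
an illustration of the four rules as described in this message, not
dirvish's actual code: the names RULES, cron_dow and retention_for are
made up for the example, and it assumes that all fields of a rule must
match and that a later matching rule overrides an earlier one.

```python
from datetime import date

# Each rule: (days of month, months, days of week, retention).
# None stands for "*". Day of week follows the cron convention
# (0 = Sunday ... 5 = Friday). These tuples are a hypothetical
# encoding of the four expire rules quoted above.
RULES = [
    (None,             None,      None, "+10 days"),
    (None,             None,      {5},  "+2 months"),
    (set(range(1, 8)), None,      {5},  "+9 months"),
    (set(range(1, 8)), {1, 5, 9}, {5},  "+4 years"),
]

def cron_dow(d):
    """Convert Python's weekday (Monday = 0) to cron's (Sunday = 0)."""
    return (d.weekday() + 1) % 7

def retention_for(d, rules=RULES):
    """Return the retention of the last rule whose fields all match d.

    Assumption: every field must match, and later rules win.
    """
    chosen = None
    for dom, mon, dow, keep in rules:
        if ((dom is None or d.day in dom)
                and (mon is None or d.month in mon)
                and (dow is None or cron_dow(d) in dow)):
            chosen = keep
    return chosen

print(retention_for(date(2012, 12, 4)))   # Tuesday            -> +10 days
print(retention_for(date(2012, 12, 14)))  # a Friday           -> +2 months
print(retention_for(date(2012, 12, 7)))   # first Friday, Dec  -> +9 months
print(retention_for(date(2012, 9, 7)))    # first Friday, Sep  -> +4 years
```

Note that every long retention hangs on the single value 5 in the
day-of-week column: a backup made on any other day only ever matches
the first rule.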

This works, and might seem ingenious. Actually, it's terribly fragile
and a nightmare to maintain, a bomb waiting to explode.
Suppose someday you want to change your backup schedule, and you start
running dirvish on Sunday-Tuesday-Thursday. You're doomed. You can't do it.
And if you do it and naively change the '5' (week day) in
your expire-rules to '4' ... oops... you've lost all your old backups.
(And if you don't change them, the new backups will all expire after 10 days.)

I think this is unacceptable.

I propose then to change dirvish-expire so that it is based on more
natural rules, e.g.

 +4 y:   4 m
 +9 m:   1 m
 +2 m:   1 w

which means: for backups older than 4 years, keep at most 1 backup
every 4 months;
for backups older than 9 months, keep at most 1 backup every month; etc.
The expire algorithm would start from the most recent backup and trim
the following (older) backups
that are separated from the current one by less than the prescribed interval.
This is conceptually clear and has no such dangers.
(I've written some similar Perl scripts for trimming some old-style
tarball backups.)
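A minimal sketch of that thinning pass, in Python, may make the idea
concrete. It is an illustration under stated assumptions, not an
existing dirvish feature: TIERS, spacing_for and thin are hypothetical
names, months and years are approximated as 30 and 365 days, and the
"current one" is taken to be the most recently kept backup.

```python
from datetime import date, timedelta

# (minimum age, minimum spacing), oldest tier first. This encodes the
# three proposed rules; months and years are approximated as 30 and
# 365 days (an assumption made for this sketch).
TIERS = [
    (timedelta(days=4 * 365), timedelta(days=4 * 30)),  # +4 y: 4 m
    (timedelta(days=9 * 30),  timedelta(days=30)),      # +9 m: 1 m
    (timedelta(days=2 * 30),  timedelta(days=7)),       # +2 m: 1 w
]

def spacing_for(age):
    """Minimum spacing required for a backup of the given age."""
    for min_age, spacing in TIERS:
        if age > min_age:
            return spacing
    return timedelta(0)  # younger than every tier: keep everything

def thin(backups, today):
    """Split backup dates into (keep, expire), newest first.

    Walks from the most recent backup towards the oldest and expires
    any backup that is closer to the last kept one than its tier allows.
    """
    keep, expire = [], []
    last_kept = None
    for d in sorted(backups, reverse=True):
        spacing = spacing_for(today - d)
        if last_kept is not None and last_kept - d < spacing:
            expire.append(d)
        else:
            keep.append(d)
            last_kept = d
    return keep, expire

today = date(2012, 12, 4)
daily = [today - timedelta(days=n) for n in range(120)]
keep, expire = thin(daily, today)
print(len(keep), "kept,", len(expire), "expired")
```

Nothing here refers to a weekday or a day of the month, so changing the
backup schedule does not change which old backups survive.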

Another useful (orthogonal) option for dirvish-expire would be to
look for some special file inside the vault that works as a "don't ever
expire me" flag.

Hernán J. González
