[Dirvish] Tips for low-bandwidth backups

Keith Lofstrom keithl at kl-ic.com
Mon Aug 6 14:32:28 UTC 2007


Asheesh Laroia wrote:
> The initial image is done, it's just that the other site generated a lot 
> of data that should get backed up.  Most days there will be a tiny 
> differential, but some days there will be a huge differential - so big 
> that it could take around a week to push it all up.

Dirvish and Rsync are perhaps the best way to get the most backup data
over the slowest link, but there are still some jobs that are too big.
Sometimes you have to reduce your expectations, or increase capability.

That said, there are some things you can do to reduce the amount of
data that flows:

1)  Analyze the files that move.  What is causing most of the movement?
What can you change?

2)  Use excludes.  Yes, it would be nice to back up all those big
source downloads on the client, but you can always download them
again.  That might be faster than restoring over a slow link.

3)  Avoid renaming big files.  For example, use the dateext option for
logrotate, so each big log file is renamed only once, to a stable date
extension, instead of being renamed again every time logrotate runs.
Every time numeric extensions on logs move maillog.1 to maillog.2,
maillog.2 to maillog.3, etc., dirvish/rsync does a heck of a lot of
work: rsync matches files by name, so each shifted log is compared
against a different file than the one whose contents it now holds,
and most of it gets sent again.

4)  Avoid using dirvish on large, incrementally changing files.
Change your mail storage format to Maildir;  now only the new mail
messages are moved, as individual files, rather than big globs of
data as with mbox format.  Dirvish only moves the deltas, but rsync
still has to read each changed file on both ends and exchange block
checksums for it, which is a lot of work and communication that can
be avoided.  Databases often have their own backup schemes; those are
better for speed and for consistency.  You will find other big,
slowly changing files when you analyze.

5)  Consider a local backup server at the remote site.  A second disk
on the client machine is not as robust as an offsite disk, but you
can back up the client to the secondary disk once a day, and copy
from it to the offsite disk once a week.

6)  We do backups so we can do restores.  If your backups are very
slow, you will not be able to restore quickly over the same path.
So a solution that makes restore practical will probably help you
with backup, too.

7)  Buy more bandwidth.  If your client is very remote, there are
some satellite options that have decent upload speeds for $$$ (or
the local equivalent).  Or buy N channels.   (A local server is
starting to look pretty good ...).
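
To put numbers on tips 6 and 7, the arithmetic is simple enough to
sketch in Python.  The 80% efficiency factor and the 50 GB over
1 Mbit/s figures are assumptions for illustration, not measurements;
plug in the client's upload rate for backups and the backup server's
upload rate for restores.

```python
def transfer_days(gigabytes, megabits_per_second, efficiency=0.8):
    """Days to move `gigabytes` (10**9 bytes) over a link running at
    `megabits_per_second` (10**6 bit/s).  `efficiency` is an assumed
    fraction of the raw rate left over after protocol overhead."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 86400  # 86400 seconds per day


if __name__ == "__main__":
    # Hypothetical figures: a 50 GB differential over a 1 Mbit/s uplink.
    print(f"{transfer_days(50, 1.0):.1f} days")
```

With those assumed figures it prints "5.8 days"; doubling the rate,
say with a second channel, halves the time.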

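For tip 4, Python's standard mailbox module can read mbox and write
Maildir, so a one-time conversion might look like the sketch below.
It is a minimal sketch, not a tested migration tool: it assumes mail
delivery to the mbox is stopped while it runs, and the mail server
and mail clients still have to be pointed at the new Maildir
afterwards.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir directory.

    Returns the number of messages copied.  The mbox file is left
    untouched so it can be kept until the Maildir has been checked.
    """
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    source.lock()  # guard against concurrent writers to the mbox
    try:
        for message in source:
            # Wrapping in MaildirMessage maps mbox status flags onto
            # Maildir's new/cur subdirectories and info flags.
            target.add(mailbox.MaildirMessage(message))
            count += 1
    finally:
        source.unlock()
        source.close()
    return count
```

For example, mbox_to_maildir("/var/mail/alice", "/home/alice/Maildir"),
where both paths are hypothetical.  After that, each new message is a
small new file, and dirvish hard-links all the old ones.
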
Learn about what is actually happening, and think through what you
want to make happen, and you will find your solution.
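
One way to do that learning (tip 1) on a dirvish vault: dirvish has
rsync hard-link unchanged files against the previous image, so any
file in today's image that is not the same inode as its namesake in
yesterday's image was stored anew.  The sketch below assumes both
image trees sit on the same filesystem, and the paths in its usage
comment are hypothetical.  A file whose owner, permissions, or
modification time changed also breaks the link, so treat the report
as a rough guide to which files moved, not a measurement of the bytes
actually sent.

```python
import os
import stat
import sys


def changed_files(old_tree, new_tree):
    """Yield (size, relative_path) for each regular file in new_tree that
    is not a hard link to the same-named file in old_tree."""
    for dirpath, _dirnames, filenames in os.walk(new_tree):
        for name in filenames:
            new_path = os.path.join(dirpath, name)
            rel = os.path.relpath(new_path, new_tree)
            new_st = os.lstat(new_path)
            if not stat.S_ISREG(new_st.st_mode):
                continue  # skip symlinks, devices, sockets, etc.
            try:
                old_st = os.lstat(os.path.join(old_tree, rel))
            except (FileNotFoundError, NotADirectoryError):
                yield new_st.st_size, rel  # not present in the old image
                continue
            if (old_st.st_dev, old_st.st_ino) != (new_st.st_dev, new_st.st_ino):
                yield new_st.st_size, rel  # stored anew, not hard-linked


def report(old_tree, new_tree, top=20):
    """Print the count and total size of non-linked files, largest first."""
    changes = sorted(changed_files(old_tree, new_tree), reverse=True)
    total = sum(size for size, _rel in changes)
    print(f"{len(changes)} files, {total / 1e6:.1f} MB not hard-linked")
    for size, rel in changes[:top]:
        print(f"{size / 1e6:10.1f} MB  {rel}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python movers.py /backup/vault/20070805/tree /backup/vault/20070806/tree
    report(sys.argv[1], sys.argv[2])
```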

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

