[Dirvish] How I restored a laptop

Keith Lofstrom keithl at kl-ic.com
Sat Jun 24 05:45:30 UTC 2006


This is not a request for help, but a description of how I restored
a laptop hard drive - the hard way.  I thought it would be educational.


This morning, just before she left for work, my wife's Thinkpad
T30 started making a loud click-click, click-click noise, the sound
of a hard drive that has lost control of the servo.  

It was unresponsive, so I immediately shut it down, let it cool,
and rebooted.  Click-click again.   Oh joy, time to do a drive
replacement and a restore from dirvish.  

I have an identical spare drive, an IBM (now Hitachi) Travelstar
40GB IC25N040ATCS04.  It is good to keep an identical spare, so
the partition table copies exactly.  I had used it as a "dd spare"
before (on the road, I bring along the spare drive and an
ultra-bay drive holder, and use "dd if=/dev/hda of=/dev/hdc bs=1M"
for overnight backups).

I decided to use that drive to do an "almost complete" restore
using a Knoppix disk and rsync.  I could have put the drive into
a Vipower swap cage, plugged it into my backup server, and
done a disk copy restore, as I have done before.  However, I
decided instead to try a restore over the network, to see how
that worked out, and so I could share the results.  I made some
mistakes and it took far too long, but perhaps you can learn
from my mistakes.

The first problem:  I rotate backup target disks, approximately
daily.  The most recent drive was "J", and the drive from the day
before was "K".  When I went to restore the root partition from J,
I realized I had misconfigured dirvish on that drive to back up 
"/root" instead of "/".  Not very useful.  Fortunately, drive "K"
had a proper configuration backing up "/", so I had a 36 hour old
root partition.  That doesn't change very fast, fortunately. All
the rest of the partitions were fine.

So, boot the laptop from a live CD, first with Ubuntu 5.10.
Oops, no sshd on Ubuntu.  I will be needing sshd, so back to good
old Knoppix (I used an available 3.9 disk).  Knoppix makes a
slightly better recovery disk than Ubuntu;  Knoppix STD might
be even better, but I don't know that one very well.  Booted,
I do an "su - root" in a terminal window and start building
the new drive.  I decide to cheat and keep the old partitions,
so I can keep the GRUB setup and save some time.

I made a second blunder when I rebuilt the partitions using
straight "mkfs", which creates ext2 by default, instead of the
proper "mkfs -t ext3" or "mkfs.ext3".  Since the system is
configured for ext3, this leads to problems later on when I try
to boot, specifically these errors:

  ext3: No journal on filesystem on ide0(3,7)
  mount: error 22 mounting ext3
  ...
  Kernel panic:  No init found, try ...

I did not hit this problem until later in this narrative, but the
"mkfs.ext3" should have happened at this point.  If you make
the same blunder, you can fix it with a "tune2fs -j /dev/hdaXX"
for each partition, which adds a journal and so changes ext2 to
ext3.
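
A minimal sketch of that fix as a loop, for anyone with several
partitions to convert.  The function name is made up for this
example, and the partition list in the comment is hypothetical apart
from /dev/hda7.  It only prints the commands, so they can be checked
before anything touches the disk.

```shell
# Print one "tune2fs -j" command per partition given as an argument.
# Nothing is executed; pipe the output to "sh" as root once it looks right.
add_journal_cmds() {
    for part in "$@"; do
        printf 'tune2fs -j %s\n' "$part"
    done
}

# Example (partition names other than hda7 are hypothetical):
#   add_journal_cmds /dev/hda7 /dev/hda8
```

The partitions should be unmounted, or fsck'ed afterwards, as with
any filesystem-level change.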

So, at this point I have a drive ready for some restores, with
empty filesystems.  However, my "pull" backup server must "push"
the data back onto the laptop drive;  I cannot get at it to
"pull" files from the laptop.  But I don't have sshd configured
for the laptop yet, so I can't do a server push to it.  So, I scp
the files in /backup/.../*0622*/tree/etc/ssh into a temporary
directory on a third machine, then use:
    scp 'third:/tmp/sshfiles/*' /etc/ssh/
on the laptop to copy the files into the /etc/ssh directory in
Knoppix.  My DHCP server assigns the laptop a fixed address on
the network and a DNS name.  Since I was going to be using root a
lot, I set the root password on Knoppix to "a".  Ugly but quick;  I
assumed my internal network was secure for the duration of this
procedure.  At the end of all this, I could ssh in from the backup
server.

Ready to start moving files.  I set up a directory "/a" on the
Knoppix laptop, and mounted the /dev/hda7 root partition on it.
On the backup server, I did a cd to the laptop root tree directory
on the backup drive, and used:  "rsync -axc  * laptop:/a/" to copy
the files to the root partition on the laptop.
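
One thing to watch with the "*" in that command: the shell does not
expand "*" to names that start with a dot, so hidden files at the
top of the tree are never handed to rsync.  Here is a small sketch
for checking first.  The function name is made up for illustration,
and it assumes a find that supports -mindepth and -maxdepth (GNU
find does).

```shell
# Print how many top-level entries in directory $1 start with a dot,
# i.e. how many names a bare "*" would leave out.
hidden_top_level() {
    ( cd "$1" && find . -mindepth 1 -maxdepth 1 -name '.*' ) | wc -l | tr -d ' '
}

# Example, run from the backup tree directory:
#   hidden_top_level .
```

If it prints anything other than 0, using "./" as the rsync source
instead of "*" copies the hidden entries as well.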

Unfortunately, partway through the procedure, the network failed.
I had to do an "ifconfig eth1 down; ifconfig eth1 up" on the
laptop a few times before the files were moved.  The laptop was
connected on the other side of 3 linksys switches from the server; 
I moved it to the same switch and the problems seemed to go away.
It might have been associated with Knoppix also, because I can
rsync nightly backups through 3 switches without problems.
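
Since rsync skips files that already match on the destination,
rerunning it after a network drop is safe.  A rough sketch of a
retry wrapper follows; the name "retry" and its argument order are
my own invention, not part of rsync or dirvish.

```shell
# Run a command until it succeeds, at most $1 times, pausing $2 seconds
# between attempts.  Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1
    pause=$2
    shift 2
    n=1
    while [ "$n" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $max failed" >&2
        n=$((n + 1))
        # Only sleep if another attempt is coming.
        [ "$n" -le "$max" ] && sleep "$pause"
    done
    return 1
}

# Example, from the backup tree directory on the server:
#   retry 5 10 rsync -axc * laptop:/a/
```

This does not bring the interface back up on the laptop; it only
saves retyping the rsync once the link returns.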

I repeated the mounts and rsyncs for the /boot, /var, and /usr
partitions.  After each was complete, I did an "ls -R | wc"
and a "du -bxs ." for each partition, and compared results to
the same operations on the backup drive.  There were some
differences associated with not having a "lost+found" on the
backups, but otherwise they were file count and byte count
identical (after fixing the ext3 problem).  I did a "mkswap" on
the swap partition, and a "mkdir /proc" and a "mkdir /initrd".
I finally unmounted the new partitions, and did "fsck -f" on all
of them.
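
The comparison can be scripted so that lost+found does not show up
as a difference.  This sketch is not the "ls -R | wc" and "du -bxs"
pair used above: it counts regular files and sums their sizes with
find, so its numbers will not match du output.  The function names
are hypothetical, and it assumes GNU find for -printf.

```shell
# Print "<file count> <byte count>" for the tree rooted at $1, ignoring
# lost+found, which exists on a fresh filesystem but not in the backup.
# -xdev keeps find on one filesystem, like the x in "du -bxs".
tree_summary() {
    ( cd "$1" &&
      find . -xdev -path ./lost+found -prune -o -type f -printf '%s\n' ) |
        awk '{ bytes += $1; files++ } END { printf "%d %.0f\n", files, bytes }'
}

# Succeed only if two trees have the same file count and total size.
same_tree_summary() {
    [ "$(tree_summary "$1")" = "$(tree_summary "$2")" ]
}

# Example: run tree_summary on the backup tree on the server and on /a
# on the laptop, then compare the two lines by eye.
```

Matching counts and sizes are a sanity check, not proof that the
contents are identical.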

I did not copy my big partitions /home and /opt; instead I
rebooted from the partially restored hard drive.  After fixing
the ext2/ext3 problems noted before, the laptop came up, and
I logged in as root.  I could now finish the restore using the
laptop OS rather than slow Knoppix.  This also meant that I did
not have to enter a password each time I did an rsync.  I wrote
a shell script to do these last two rsync restores, and about 
two hours later all the remaining files were moved.  I took the
two hours off to go to a movie ("United 93" - makes restoring
a hard disk quite a trivial problem indeed).
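
A script along these lines might look like the sketch below.  It is
a reconstruction under assumptions, not the script actually used:
the function name, the DRY_RUN switch, and the HOME_TREE and
OPT_TREE variables are all made up, and it assumes each partition
has its own backup tree directory on the server.

```shell
# Push one backup tree to a mount point on a remote host.
# $1 = backup tree directory, $2 = destination host, $3 = mount point.
# With DRY_RUN=1 the rsync command is printed instead of executed.
push_tree() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rsync -axc $1/ $2:$3/"
    else
        rsync -axc "$1/" "$2:$3/"
    fi
}

# Example, run on the backup server; HOME_TREE and OPT_TREE stand for
# the tree directories of the two backups:
#   push_tree "$HOME_TREE" laptop /home && push_tree "$OPT_TREE" laptop /opt
```

Running it once with DRY_RUN=1 shows the exact source and
destination before two hours of copying begin.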

After doing this, I discovered a third blunder - I had improperly
configured the .gnome files to look for setups and icons in
/home/keithl rather than in /root, so I could not unmount /home
while X was running as root.  I fixed that.  I did the ls and
du again.  All the files were transferred.   I unmounted /home
and /opt, did an "fsck -f" on them, and rebooted the machine.

It looks like the machine is back where it was;  I will let
my wife decide.  She just got home, and has email to look at!

I hope this helps you, if you ever need to restore over the 
network.  After I get some feedback from you folks, I will copy
this to the wiki so the rest of you can improve the procedure.
Yes, we need a more automated approach.  Who wants to write
some perl?

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

