[Dirvish] Before doing checksum=1, read this

Keith Lofstrom keithl at kl-ic.com
Thu Jun 21 14:10:06 UTC 2007


On Wed, Jun 20, 2007 at 11:21:42PM -0700, Keith Lofstrom wrote:

> So, a request!   Could some of you try adding the line:
> 
> checksum: 1

After too little sleep, I woke from a nightmare about hard drives
overheating.  Checksum=1 /really/ thrashes both the source and the
target hard drives, with huge numbers of head seeks for a long time.

If either drive is poorly ventilated, something like this can
heat the whole drive quite a bit.  My Thinkpad laptop has the
hard drive located well away from the flow of cooling air, and
I observed that the "drive corner" of the machine was getting
rather warm.  I didn't think much of it.  Then I noticed that
I had a couple of unexplained system lockups on that machine
since yesterday's checksum=1.  I didn't think much of that,
either.  

But putting all that information together made me realize that
a hot hard drive may physically shrink or grow, and change mechanical
tolerances.  And that can mean that writes to the drive, after
reading all that data, may have errors.

If you have smartd drive monitoring enabled, and hard drives that
support it, you should run "smartctl -a /dev/hdX" (where X is the
drive letter) on your source and backup drives, paying particular
attention to the "Temperature_Celsius" line (the line for attribute
194 in my output).  Numbers like "100" and "253" are nonsense;
numbers like "30" and "51" are accurate.  This works for all my
Seagate drives, but not so well for the Maxtor, IBM, or Toshiba
drives.  My Seagate laptop drive reached 53C, while the backup
drive reached 44C.  Most of my backup drives are well ventilated
(in swap cages with individual fans).  So the backups should be
safe, but I will pay close attention to the source drives.
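
For anyone who wants to script the temperature check described above,
here is a minimal Python sketch.  It is only an illustration, not part
of dirvish or smartmontools.  It assumes the usual "smartctl -a"
attribute table layout, with the temperature in the RAW_VALUE column of
attribute 194.  The function names and the 10C to 70C plausibility
window are my own assumptions; the post only says that readings like 30
and 51 look real and readings like 100 and 253 do not.

```python
import re
import subprocess

# Assumed plausibility window in Celsius (not from the original post).
PLAUSIBLE_MIN_C = 10
PLAUSIBLE_MAX_C = 70


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194 as an int, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least ten columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def is_plausible(temp_c):
    """True if the reading falls inside the assumed sane window."""
    return temp_c is not None and PLAUSIBLE_MIN_C <= temp_c <= PLAUSIBLE_MAX_C


def read_drive_temperature(device):
    """Run 'smartctl -a DEVICE' and return the parsed temperature."""
    # smartctl uses nonzero exit codes as status bits, so do not raise on them.
    result = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return parse_temperature(result.stdout)


def report(device):
    """Return a one-line summary for a drive such as /dev/hda."""
    temp_c = read_drive_temperature(device)
    if is_plausible(temp_c):
        return "%s: %dC" % (device, temp_c)
    return "%s: no usable temperature reading (%r)" % (device, temp_c)
```

Calling print(report("/dev/hda")) as root, before and after a
"checksum: 1" run, would show how much the drive heated up.  Drives
that report nonsense values simply fall outside the window and are
flagged rather than trusted.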

After I send this message, I will do an fsck on all the drives
I thrashed yesterday, so I will be offline for a while.  

The bottom line:  don't do "checksum: 1" tests on poorly ventilated
drives until we have some more data about temperature consequences.
Please do "checksum: 1" on well ventilated drives.

Keith

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

