is my hard drive dying?
Dan Egli
pluglist at plug.org
Mon Apr 14 09:58:50 MDT 2003
Brian Beck wrote:
>On Saturday 12 April 2003 21:34, Dan Egli wrote:
>
>
>>Brian Beck wrote:
>>
>>
>>>On Saturday 12 April 2003 14:39, Dan Egli wrote:
>>>
>>>
>>>>Brian Beck wrote:
>>>>
>>>>
>>>>>the extended test is reporting that everything is good to go, it found
>>>>>no problems.
>>>>>
>>>>>Brian
>>>>>
>>>>>
>>>
>>>
>>>
>>>>Then you have two possibilities remaining:
>>>>1) Your kernel is imagining the errors (not likely)
>>>>2) Your controller is returning signals that the kernel interprets as
>>>>errors (more likely).
>>>>
>>>>Can you try a different hard drive?
>>>>
>>>>
>>>I can in a couple of weeks or so. The report I got back from IBM said
>>>that the file I sent them showed no signs of trouble; here it is
>>>below. I'll pick up a new cable this Friday and see what happens.
>>>
>>>Brian,
>>>
>>>This is in regards to the call you placed with Hitachi Global Storage
>>>Technologies. We have received and reviewed the log file created by the
>>>Drive Fitness Test. Your drive appears to be running within
>>>specifications. The drive had no errors on it, temperature was normal,
>>>and the drive does not appear to be suffering from excessive vibration.
>>>At this time, I would not recommend replacing the drive.
>>>
>>>
>>The cable is a possibility I forgot to consider. Since it's Linux, I doubt
>>it's incorrect DMA settings. Are you running in a hardware RAID
>>environment? Scott mentioned the RAID controller. Basically, I'd say try
>>a new cable; failing that, try a new drive (if you can; RAID 0 makes it
>>difficult to try a different drive).
>>
>>--- Dan
>>
>>
>>
>I am not using RAID, and the RAID connectors on the motherboard are not
>being used. I bought this board with the next upgrade in mind, when this
>board will become the server. The cable is pretty new, less than 3 months old.
>The tech support people were pretty surprised that I had logs of the
>hard drive errors, and that my wife's computer (also with an IBM drive)
>was showing no errors at all. Both systems are running a fresh install of
>Red Hat 9.
>I just flashed the BIOS to the newest and greatest, so to speak, and that
>always makes me sweat, but the errors are still there. In fact, you know
>the boot-up screen where you get the long list of "OK" messages? I am
>getting 3 to 4 errors scrolling in between those "OK" messages, something
>that never happened with Red Hat 8. This is the second time I have
>installed Red Hat 9 on this computer (I was trying to install SuSE via FTP
>and ended up giving up). With the first install of RH9, the errors were
>not showing up on the boot-up screen, and I had never even looked at the
>logs.
>The drive is formatted with ext3 on hda1 and hda2, with hda3 as a swap
>partition.
>
>I am thinking that the drive is at fault and I am trying to prove otherwise.
>
>Brian
>
>
>
I hate to say it, but it looks to me like your drive is at fault. The
problem with any HDD manufacturer is that they don't want to RMA the
drive if their diagnostic software returns 0 errors on it.
I seriously think the best bet to get IBM to replace the drive is to
test the system with a different drive; if the system works fine with
that drive, then that's your proof of a bad drive. It could be a
controller issue (the drive's integrated controller, not the controller
on the MoBo). I've seen the controller randomly send back lousy return
codes, which makes the monitoring portion of the kernel want to blow
white fluffy chunks, but the diagnostic software reads the drive fine.
Heck, I've got a 27GB Western Digital drive here that I just upgraded
from (they don't want the old one back now, so I keep it for
emergencies) that passes every DataLifeGuard test 100%, but randomly
will not detect, randomly locks the system, etc... The drive is
obviously bad, because I have hooked a different drive onto the same
controller port and the same cable connector, and it's fine. But it
still says no errors in the DataLifeGuard tools :>
--- Dan