1

I had an HDD failure in my Linux RAID (Flashing red light). I pulled it out and on reboot I was forced to run fsck manually and repair some errors in the remaining filesystem.

I wanted more information on why it failed, so I plugged it into my Windows PC. When I plugged it in, I got a message to initialize it in Disk Management. I was planning on using CrystalDiskInfo or HDTune to get the S.M.A.R.T. data. It showed up green, but it has a count of 1 under "Reported Uncorrectable Errors". I decided to format it to get more info. I did a full format, not a quick one, and didn't get any errors.

I then loaded up HDTune and did a full scan but didn't find any problems. I know this drive has a lot of hours, but I am more interested in the principle of the issue. Ignoring the hours, why would this drive fail in the RAID, but then operate normally?

After these scans, is there a reason not to return it to service?


Alan
  • 553

1 Answer

2

The SMART data contains a lot of proprietary information that can be difficult to decipher. The problem with SMART data is that unless something is "flagged", it's almost impossible to know what's really going on with the drive.

A failing drive does not necessarily mean it will have a faulty SMART status.

Most likely the drive is starting to develop bad sectors, and reallocating them is causing timing issues in your RAID. Or there are other problems that the RAID driver is detecting, like read failures, write failures, or long access/seek times.
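As a rough sketch of how you might screen a drive for those sector-related counters, the snippet below parses an attribute table in the layout printed by smartmontools' `smartctl -A` and reports the watched counters whose raw value is non-zero. The sample table is made up for illustration (it is not from the drive in the question), and the choice of attributes to watch is my assumption. A non-zero counter is only a hint, and an all-zero result does not prove the drive is healthy.

```python
import re

# Attributes that commonly point to media problems (names as printed by
# smartctl -A). Which ones to watch is an assumption, not a standard.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Illustrative sample only; not taken from the drive in the question.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       48211
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def watched_raw_values(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCHED:
            # RAW_VALUE is the 10th column; keep only its leading integer.
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found


def nonzero_counters(text):
    """Return only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in watched_raw_values(text).items() if raw > 0}


print(nonzero_counters(SAMPLE))  # {'Reported_Uncorrect': 1}
```

On a Linux box you could feed it the text from `smartctl -A /dev/sdX` (device name is a placeholder) instead of the sample string.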

To answer your question, NO, the drive should not be trusted and it should not be returned to service. I have personally seen this same behavior many times, and every time I returned the drive to service it was kicked out again within months. I no longer do that.

Ignoring any possibility of a wiring issue or a RAID/SATA controller problem, the drive is being kicked out because it has problems. Drives are cheap, and because RAID is often used in production systems, it's not worth taking the chance that you'll lose two or more drives and your data.

Appleoddity
  • 3,860
  • 2
  • 13
  • 35
  • I had not considered the timing issue, where the drive may not be spinning or seeking quite as fast as the others because of its age. That wouldn't be an issue for a solitary drive, but it may be an issue as part of a RAID. I won't be trusting it again, but I had a bunch of similar drives on the shelf and wanted to know what I could do to anticipate which were the best to use from the pile. – Alan Dec 20 '17 at 16:03