13

I have 8 KB in bad sectors. When I first found it I didn't do anything about it; when I checked again 2 months later the number hadn't gone up - it's still 8 KB.

Free space verification is complete.
Windows has checked the file system and found no problems.

 234326015 KB total disk space.
 156413580 KB in 198389 files.
    185808 KB in 38963 indexes.
         8 KB in bad sectors.
    637899 KB in use by the system.
     65536 KB occupied by the log file.
  77088720 KB available on disk.

      4096 bytes in each allocation unit.
  58581503 total allocation units on disk.
  19272180 allocation units available on disk.

I have also included the CrystalDiskInfo report. What should I do? I'd rather not replace the disk, but is it still necessary?

CrystalDisk information

einpoklum
  • 9,443
Mundpin
  • 139
  • 1
    For the bad sectors, run ChkDsk /R /OfflineScanAndFix (will take some time to run). To piggyback on @davidgo's answer, you may want to consider creating regular WIM backups of the partitions on that drive – JW0914 Dec 18 '21 at 16:03
  • How did you get the report you show at the beginning of your post, that reports 8k bad sectors? What command, how did you run it, etc.? – Swiss Frank Dec 19 '21 at 19:18

3 Answers

30

The SMART values (i.e. the CrystalDiskInfo report) say the drive is OK, but not great (the wear level indicator is at 33%). This is a lower-cost, older-model drive, and you are getting what you paid for (it was released 4-5 years ago).

I don't think it's necessary to change the disk yet, but if you don't already have a robust backup strategy, now is a good time to implement one.
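
If you want a concrete starting point for that, one option - along the lines of the regular WIM backups JW0914 suggests in the comments above - is a DISM image capture from an elevated PowerShell window. This is only a sketch: the target path and image name are placeholders, and capturing the Windows partition is best done from WinPE or recovery media so its files aren't in use.

    # Capture the contents of C: into a WIM image on a different physical disk.
    # E:\Backups\C-drive.wim is an example path - point it at your backup drive.
    Dism /Capture-Image /ImageFile:E:\Backups\C-drive.wim /CaptureDir:C:\ /Name:C-drive-backup /Compress:max

Any other image-based backup tool does the job just as well; the point is simply to have a restorable copy before stressing the drive any further.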

Glorfindel
  • 4,099
davidgo
  • 70,654
12

Make sure you have a good backup of any important files on that disk.

Then run chkdsk /R /OfflineScanAndFix, as suggested by JW0914 in the comments.
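
For example, from an elevated PowerShell window - assuming the affected volume is C:, so adjust the drive letter to match your disk:

    # /R looks for bad sectors and recovers readable data (it implies /F).
    # /OfflineScanAndFix forces a full offline check, so Windows will offer to
    # schedule it for the next reboot if the volume is currently in use.
    chkdsk C: /R /OfflineScanAndFix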

If that causes the disk to fail: now you know for certain it is bad, but at least you have a backup.

If the disk passes, check the SMART data in CrystalDiskInfo again. If those values got worse from the chkdsk repair, the disk is about to die. Replace it ASAP; it can die completely at any time. (And Murphy's Law says that will happen at a very inconvenient moment.) Until you have it replaced: BACKUPS!

If the SMART data hasn't gotten any worse from the repair, you're OK for now. Keep making frequent backups (just in case) and check the SMART numbers occasionally. If they start going up, consider the disk broken and replace it.
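
If you would rather watch those numbers from the command line than open the CrystalDiskInfo GUI each time, Windows exposes a subset of the SMART/reliability counters through PowerShell (run as Administrator). This is only a rough sketch, and many consumer drives leave some of these fields blank:

    # List the wear level and error counters for every physical disk in the machine.
    Get-PhysicalDisk |
        Get-StorageReliabilityCounter |
        Select-Object DeviceId, Wear, Temperature, PowerOnHours,
                      ReadErrorsUncorrected, WriteErrorsUncorrected

CrystalDiskInfo (or smartctl from smartmontools) still gives the more complete raw SMART attribute view, so use whichever you will actually check regularly.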

The main issue with SSDs is that they can go from OK to totally broken in a heartbeat. On a spinning disk you can usually still recover a large part of your files, but SSDs frequently go completely dead when they fail.
So I can't stress the importance of backups enough!

Tonny
  • 31,463
  • do you know if the "ok to totally broken in a heartbeat" is/was true as at 2017? I had understood it to be a firmware issue on first-gen drives - in theory the drive should slow down, then go read-only? (not that I'd trust it to...) – davidgo Dec 18 '21 at 18:08
  • 2
    @davidgo In my experience (dealing with an install base of approx. 3000 computers at work) an SSD going completely dead still happens 3 out of 4 times on cheaper drives. Expensive ones (e.g. Samsung Pro drives) do better, with only 1 in 4 going completely dead and a slowdown or going read-only on the rest. The biggest problem is that users don't realize a drive is starting to go and don't call IT until the computer doesn't boot anymore. Windows giving no warning whatsoever (even though it will put a disk error in the Event Log, it doesn't tell the user...) also doesn't help. – Tonny Dec 18 '21 at 20:11
  • 1
    thank you. That's really useful information to me. The install base I deal with is much smaller than that, and is mainly server hardware using prosumer drives or better in a RAID configuration with SMART and RAID logging monitored, with prosumer drives on the desktops as well. – davidgo Dec 18 '21 at 20:18
  • 2
    Why do SSDs fail all at once? One would expect them to gradually accumulate bad memory cells over time, just as hard drives gradually accumulate bad sectors. – Vikki Dec 19 '21 at 00:14
  • 1
    @Vikki That would make a good separate question, if one does not already exist. With the very early consumer SSDs (OCZ Vertex especially) the controller often ended up dying long before the flash memory. The mechanism may be different with more modern drives. – Bob Dec 19 '21 at 15:32
  • 1
    @Vikki: In HDDs with moving parts, wear and tear often results in gradual failure. But it can also be catastrophic, like not spinning up, or the electronics dying. In an SSD there aren't any moving parts, and wear leveling generally works well to avoid bad sectors until the write endurance is near end of life. If you're approaching that failure mode, then yes, bad sectors. But if not, the early failure modes are mostly catastrophic (e.g. some random transistor somewhere, or another component, fails. The controller won't "boot", or can't send data over SATA or NVMe, or can't talk to the flash.) – Peter Cordes Dec 19 '21 at 15:46
  • 1
    @Vikki a moving part can experience a failure, then "retry" and succeed. An electronic part experiencing a similar failure can attempt a retry, but it's far less likely to succeed, leaving the whole device dead. – Criggie Dec 20 '21 at 00:59
  • 1
    @Bob: Asked, and almost immediately closed as "opinion-based" (what?) – Vikki Dec 20 '21 at 01:01
  • 1
    @Vikki I've voted to re-open this. If enough people follow suit we may be able to get it reopened. – davidgo Dec 20 '21 at 03:00
2

Considering the wear-leveling layer in the SSD firmware, the only plausible reason for a bad block to appear is that it returned a read error.

How this can happen:

  • Buggy SSD firmware. It happens more often than not.
  • Very bad data retention, combined with insufficient error-correction data.
  • The SSD being a clone of an HDD that had a bad sector marked at the moment of cloning, with the cloning done at the block-device level rather than at the filesystem level.

The last option is relatively benign, but usually hard to confirm.

The other two mean that the data stored on the disk is at risk and (probably) already corrupted. Depending on how important the data is, the reaction may vary between "I don't care, the longer I wait the cheaper the replacement gets" and "back up immediately and transfer the data to another disk".

I have yet to see an SSD report a bad sector. SSDs go read-only when they exhaust their spare blocks. SSDs sometimes die for good because of firmware bugs or other controller problems.

But bad sectors? Really?

One cannot get a bad block while writing - in contrast with a traditional HDD, the block gets its physical place at the moment it is written. If there happens to be a bad block, the physical write is simply repeated elsewhere and the bad block is handled invisibly (it does not go back into the spare pool). It is reading that can reveal a block that doesn't match its checksum. And even then, a subsequent write to the same LBA address is as good as any other write. The OS would have to be absolutely unaware of the existence of SSDs in order to mark the block as bad at the filesystem level.

On the other hand, a read error from an SSD is really a bad thing.

The data is written to the flash media with a lot of error-correcting overhead. A few flipped bits in a 512-byte block are not a big deal, because the error-correcting code fixes them transparently.

A read error means that the error-correcting code was unable to rebuild the original data. That amounts to something like 1% or 2% of the bits flipped - roughly 40 to 80 of the 4096 bits in a 512-byte block. There may still be free spare blocks, but if the data retention of a particular block is that bad, one should not really expect anything good.

fraxinus
  • 1,212
  • 1
    I've seen SSDs that wanted the OS to implement wear leveling. You better believe those disks accumulated bad sectors. – Joshua Dec 20 '21 at 19:44
  • The OS cannot access the data structures needed to do the wear leveling. The only thing the OS can do as a favor to the SSD is to issue TRIM commands for blocks that are unused from the filesystem's viewpoint (a sketch of triggering this on Windows follows after these comments). This can happen either on file deletion or on a schedule for all of the empty space. It is up to the SSD firmware to move the static data around in order to use the less-used blocks. This happens invisibly to the filesystem and the OS, because the blocks are only physically moved; their LBA addresses stay the same. ... – fraxinus Dec 20 '21 at 19:57
    ... One cannot get a bad block while writing - the block gets its physical place at the moment it is written. If there happens to be a bad block, the physical write is simply repeated elsewhere and the bad block is handled invisibly (it does not go back into the spare pool). It is reading that can reveal a block that doesn't match its checksum. And even then, a subsequent write to the same LBA address is as good as any other write. The OS would have to be absolutely unaware of the existence of SSDs in order to mark the block as bad at the filesystem level. – fraxinus Dec 20 '21 at 20:07
  • 1
    Well if the SSD doesn't implement wear leveling at all the OS is theoretically capable of getting at said structures because they exist in the logical blocks on the disk and the SSD won't do remapping. – Joshua Dec 20 '21 at 21:23
  • 1
    @fraxinus, some very old, cheap, or specialized SSDs don't have a flash translation layer. Instead, logical block addresses map directly to physical block addresses, and two writes to the same block will always hit the same set of memory cells. In this situation, the OS is very much able to handle wear leveling itself. – Mark Dec 21 '21 at 01:45
  • @Mark in this case, wear leveling is pretty much impossible. – fraxinus Dec 21 '21 at 11:08
  • @Mark except in the case where the filesystem takes on the burden of wear leveling in the first place, as in JFFS(2) and friends. None of those filesystems are used in Windows. – fraxinus Dec 21 '21 at 11:35
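
For reference on the TRIM point in the comments above: on Windows, the filesystem-driven retrim that fraxinus describes can be triggered manually from an elevated PowerShell window. It is the same operation the scheduled "Optimize Drives" task performs; drive letter C is just an example.

    # Ask the filesystem to send TRIM for every cluster it currently considers unused on C:.
    Optimize-Volume -DriveLetter C -ReTrim -Verbose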