Digging into the drive in my NAS that faulted, I'm reminded that magnetic hard drives are preposterously magical technology.

Case in point: using Seagate's tools I can get the drive to tell me how much it's adjusted the fly height of each of its 18 heads over the drive's lifetime, to compensate for wear and stuff. The drive reports these numbers in _thousandths of an angstrom_, or 0.1 _picometers_.

For reference, one helium atom is about 49 picometers in diameter. The drive is adjusting each head individually, in increments of a fraction of a helium atom, to keep them at the right height. I can't find numbers for modern drives, but what I can find from circa ten years ago says the overall fly height had already been reduced to under a nanometer. So the drive head is hovering on a gas bearing that's maybe 10-20 helium atoms thick, and adjusting its position even more minutely than that.
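If you want to sanity-check those scales, the arithmetic is quick. This is just the figures from the posts above (the ~49 pm helium diameter and ~1 nm fly height are the numbers I'm quoting, not anything measured off this drive):

```python
# Scale check on the fly-height numbers above. All inputs come from the
# posts, not from the drive itself.

ANGSTROM_PM = 100.0                     # 1 angstrom = 100 picometers
ADJUST_STEP_PM = 0.001 * ANGSTROM_PM    # drive reports in 1/1000 angstrom
HELIUM_DIAMETER_PM = 49.0               # rough helium atom diameter
FLY_HEIGHT_PM = 1000.0                  # ~1 nm fly height, circa ten years ago

print(f"adjustment step: {ADJUST_STEP_PM:.1f} pm")
print(f"adjustment steps per helium atom: {HELIUM_DIAMETER_PM / ADJUST_STEP_PM:.0f}")
print(f"helium atoms across the gas bearing: {FLY_HEIGHT_PM / HELIUM_DIAMETER_PM:.0f}")
```

So each reported step is roughly 1/500th of a helium atom, hovering in a gap about 20 atoms tall.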

This is _extremely_ silly. You can buy a box that contains not just one, but several copies of a mechanism capable of sub-picometer altitude control, and store shitposts on it! That's wild.

Anyway, my sad drive looks like it had a head impact: not a full crash, but I guess it clipped a tiny peak on the platter and splattered a couple thousand sectors. Yow. But I'm told this isn't too uncommon, and isn't the end of the world? Which is, again, just ludicrous to think about. The drive head that appears to have bonked something has adjusted its altitude by almost 0.5 picometers over its 2.5 years in service. Is that a lot? I have no idea!

Aside from having to resilver the array and the reallocated sector count taking a big spike, the drive is now fine and both SMART and vendor data say it could eat this many sectors again 8-9 times before hitting the warranty RMA threshold. Which is very silly. But I guess I should keep an eye on it.
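"Keep an eye on it" mostly means watching whether Reallocated_Sector_Ct keeps climbing. A minimal sketch of that, parsing the standard `smartctl -A` attribute table (the sample text and the baseline number here are made up for illustration; run `smartctl -A /dev/sdX` against the real drive):

```python
# Sketch: pull Reallocated_Sector_Ct out of `smartctl -A` output and
# compare it to a saved baseline. Sample output and baseline value are
# hypothetical, but the 10-column table layout is smartmontools' standard.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       7248
"""

def reallocated_count(smart_output: str) -> int:
    """Return the raw Reallocated_Sector_Ct value, or raise if it's absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])  # last column is the raw value
    raise ValueError("Reallocated_Sector_Ct not found")

baseline = 7248  # count recorded right after the incident (hypothetical)
current = reallocated_count(SAMPLE)
if current > baseline:
    print(f"reallocations grew: {baseline} -> {current}, time to worry")
else:
    print(f"stable at {current} reallocated sectors")
```

A stable-but-high count is the "survived a bonk" story; any growth from here is the "cascading failure" story.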

@danderson Personally I'd retire the drive under the assumption that particulate generated from the impact will likely contaminate it and make future damage more likely.

I aggressively replace drives at the first sign of trouble. Any increase in failed sector count is enough for me to no longer trust it.

@azonenberg Yeah, after more thought: it seems okay right now, but even if I accept that a few sector reallocs are a fact of life on modern drives and broadly fine, >7k reallocated is outside my comfort zone.

Then I got carried away, and so now the new server is getting a fresh 160TB worth of drives. Once the data's migrated over, the current pool will get dismantled and its remaining healthy drives redistributed to the two backup NAS pools, so that they vaguely keep up with the growth in the primary.

@danderson I retire drives after any increase in reported bad sectors after deployment.

The rationale is that you'll have some number of factory defects that are relatively stable and not going to worsen, but new defects appearing later on are concerning: they suggest particulate contamination, some sort of electrical fault, ESD damage, age-related flash bitcell damage, etc. Any of these could potentially affect many other storage locations in the future.

@azonenberg yeah it's fair. I'm somewhat blessed that this is the first time I've had to think about my policy, because all my drives to date have had a perfect 0 defects, or went from "fine" to "dead" basically instantly, which made the decision easy.

This event, where the drive took a big hit but survived, and where the SMART data says even that concerning amount of sector reallocation only consumed 13% of the factory spares (the RMA threshold is 90% of spares used), made me wonder. Especially since a replacement is $400: if I could persuade myself that I understand what the issue was, and feel good that it's not a predictor of future sadness, ...

But yeah, a friend with deeper knowledge of hard drives (from working on them at cloud scale, where you get more insight from the vendors), who is usually somewhat sanguine about having a few reallocated sectors here and there as the drive ages, said they'd consider replacing this drive because that drive head feels like it might not be long for this world under load. So... yeah.

@danderson Yeah, especially in a case like this, where you may have sustained a high-energy impact, it's a major concern that there could be abrasive particles scattered all over the disk surface waiting for you to hit them, causing a cascading failure.

@danderson It also depends on how much redundancy and resilience you have in your infrastructure.

I'd be a lot more willing to run a questionable drive in Ceph BlueStore with 3N replication and end-to-end checksumming than as, say, a laptop HDD with no fault tolerance.

@azonenberg @danderson This is exactly my strategy.

I have a handful of HDDs scattered about in low- or no-redundancy applications. If any of them sneeze or cough, they get sent to a Ceph cluster instead, where most of my HDDs already are. Many go on to live long service lives! Others do not. Regardless, it's no longer a data loss concern.

@azonenberg Yeah, I have quite a lot of redundancy: the drive's in a raidz2 pool, and all the data within gets replicated to a separate onsite backup NAS (also raidz2), and a critical subset also gets replicated to an offsite NAS (raidz1 - subset just because I haven't gotten around to shipping more drives there and growing the pool yet).

So, in terms of risk, I have a _lot_ of redundancy to go before this suspect drive causes data loss. But otoh, for Reasons, I discovered the other day that my primary had a faulted drive for a month before I noticed, that my onsite backup brained its UEFI NVRAM and won't boot any more, and that the offsite backup had filled up and wasn't replicating recent changes any more 😬 So, in practice I'm spending a fair bit of my redundancy on "I can neglect the system for a while due to mental health and even if it all bitrots it's still like N+1.8 overall".

I did do several full ZFS scrubs and also an extended SMART test, and they all came back clean (no increase in reallocations, no motion in other early-failure indicators)... But without knowing a lot more about modern drive physics and firmware, I don't feel I have enough information to make a confident call to risk it, so to speak.