Of course, after the power outage, one of my nodes decided not to come back. Another WD 120GB SSD bites the dust.

I've ratted the SSD out of the now-dud `fluorine` node… it was set up identically aside from network config, hostname and cryptographic keys. I put that SSD in the failed node and it booted… I've renamed the node and updated `/etc/network/interfaces`.

Put it back on the rack, re-added it as a MON/MDS node, re-activated the OSD… and it seems all is well again.
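
For the record, the re-add went roughly like this. A sketch only: the hostname is a placeholder, and the exact steps depend on how the cluster was deployed (this assumes a manual, non-cephadm setup):

```shell
# "renamed-node" is a placeholder, not the actual node name.
# Bring the monitor and MDS daemons back up under the new identity:
systemctl enable --now ceph-mon@renamed-node
systemctl enable --now ceph-mds@renamed-node

# Re-activate the existing OSD from its intact on-disk metadata:
ceph-volume lvm activate --all

# Then watch recovery until the cluster reports HEALTH_OK:
ceph -w
```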

Might need to get some new 120GB SSDs in stock since these old WD ones look to be reaching the end of their lives.

#SolarCluster #ceph

```
2025-11-01T16:30:16.382783+1000 mon.hydrogen [WRN] Health check update: Degraded data redundancy: 47/2157581 objects degraded (0.002%), 1 pg degraded, 1 pg undersized (PG_DEGRADED)
2025-11-01T16:30:19.585820+1000 mon.hydrogen [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 5/2157581 objects degraded (0.000%), 1 pg degraded, 1 pg undersized)
2025-11-01T16:30:19.585879+1000 mon.hydrogen [INF] Cluster is now healthy
```
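
As a sanity check, the degraded fraction in that warning is consistent: 47 objects out of ~2.16 million is about 0.002%:

```shell
# 47 degraded objects out of 2,157,581 total, as a percentage:
awk 'BEGIN { printf "%.3f%%\n", 47 / 2157581 * 100 }'   # prints 0.002%
```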

Let's see how much M.2 SSDs are worth these days. There have been two recent failures, and the others will likely fail at some point, so I'll get some stock in early.

Just had a look at the local suppliers… seems M.2 SATA is going the way of the dodo.

This is being used as a boot drive… data is on 2.5" SATA for which I've got 2TB Samsung 870 EVOs (and a couple of 870QVOs for emergencies).

The boot drive does not need to be big… even 120GB is heaps. It will see a fair number of writes though. Years ago I tried 16GB USB drives (SanDisk Ultra Fit)… worked great for about 6 months, then they went read-only!

Umart seems to only have Silicon Power drives in M.2 SATA… $25 for a 120GB SSD, but the comments suggest Silicon Power SSDs should not be trusted: lots of reports of reliability issues after months of mild use.

I'm thus hearing the siren song of NVMe… use a 2.5" SSD for the boot drive, and buy a 2TB NVMe SSD to store data. I'll have to think about that.

Performance comparison… this is the 2.5" 2TB SSD (Samsung 860 EVO) on its 6Gbps SATA interface, measured with `hdparm -tT`:

```
Timing cached reads: 4124 MB in 2.00 seconds = 2064.64 MB/sec
Timing buffered disk reads: 1258 MB in 3.00 seconds = 418.68 MB/sec
```

and an NVMe disk over USB 3.0 (Samsung 970 EVO Plus 500GB):

```
Timing cached reads: 3880 MB in 2.00 seconds = 1941.74 MB/sec
Timing buffered disk reads: 1110 MB in 3.00 seconds = 369.91 MB/sec
```

…not promising. These servers' motherboards do not support NVMe, so I'd either have to find a way to shoehorn a PCIe extender in and connect it up to a drive in a 2.5" enclosure somehow, or connect it over USB 3.0 — which we've just identified is slower than the SATA route.
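
To quantify that: taking the buffered-read figures above, the plain SATA path comes out about 13% faster than NVMe over USB 3.0:

```shell
# Buffered reads: 418.68 MB/s (2.5" SATA) vs 369.91 MB/s (NVMe over USB 3.0)
awk 'BEGIN { printf "%.2fx\n", 418.68 / 369.91 }'   # prints 1.13x
```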

Possible contender… https://au.rs-online.com/web/p/hard-drives/2711617

A bit on the pricey side for a 128GB SSD, but it's from a trustworthy vendor and supplier. However, NVMe seems to be equally expensive if I want to go with a decent manufacturer. Being an industrial device, it might tolerate server use better. I run an mSATA 512GB version of one of these in the tablet (Panasonic FZ-G1).

Umart sells a Crucial E100 480GB NVMe for $68, but 480GB is massive overkill for this job; I barely fill the drive as it is now:

```
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1       223G   11G  211G   5% /
```

By comparison, I bought the WD Green drives for $25 a piece in 2021.
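
In per-gigabyte terms, though, the Crucial actually works out cheaper than what the WD Greens cost in 2021:

```shell
# Price per gigabyte, using the prices quoted above:
awk 'BEGIN {
    printf "Crucial E100 480GB: $%.3f/GB\n", 68 / 480   # ~$0.142/GB
    printf "WD Green 120GB:     $%.3f/GB\n", 25 / 120   # ~$0.208/GB
}'
```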

OM8P0S3128Q-A0 | Kingston Design-In Industrial M.2 (2280) 128 GB Internal SSD

@stuartl Given the recent failure mode, one of the qualities you may care about in any future SSD purchase is, "Power loss protection."

@dwm Maybe. It was gracefully powered off in this case, I didn't just yank the power.

I know sudden power-off is dangerous for anything flash-based, but that isn't what was done here.

These are drives I bought 5+ years ago, so they might just be reaching their end-of-life, having pretty much run every single hour of those 5 years.
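
Putting a number on that runtime, and where to check the drive's own tally (`smartctl` is from smartmontools; the device path is illustrative):

```shell
# Five years of 24/7 duty, in hours. Compare against SMART attribute 9
# (Power_On_Hours), e.g.: smartctl -A /dev/sdb | grep Power_On_Hours
echo $(( 5 * 365 * 24 ))   # prints 43800
```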

@stuartl Some makes of SSD also had firmware bugs that caused failure after a certain runtime. You may be unlucky enough to have one of those, but lucky enough that it's fixable post-failure with a firmware upgrade.

@dwm Yeah, I could give that a try I guess… not sure how to do a firmware update on these things.

I'm running Linux machines, so whatever the process is, it'll have to be done from either that… or Windows 7; I don't have anything else.

@dwm Heh, looks like `hdparm` knows how to push a firmware blob to a drive…

I picked apart the Python script at https://github.com/not-a-feature/wd_fw_update/blob/main/src/wd_fw_update/main.py to figure out where to get the download blob from.

```
hdparm --fwdownload-mode3 /tmp/UI450000.fluf --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdb

/dev/sdb:
fwdownload: xfer_mode=3 min=16 max=16 size=8192
........................................................................... Done.
```

Now, the $25 question: is it bricked?

```
[3364110.062863] usb 3-2: new SuperSpeed USB device number 12 using xhci_hcd
[3364110.076783] usb 3-2: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
[3364110.076791] usb 3-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[3364110.076793] usb 3-2: Product: USB3.0 External M.2 SSD
[3364110.076795] usb 3-2: Manufacturer: ASMedia
[3364110.076797] usb 3-2: SerialNumber: 20160500011E
[3364110.081446] scsi host6: uas
[3364110.088821] scsi 6:0:0:0: Direct-Access WDC WDS1 20G2G0B-00EPW0 0 PQ: 0 ANSI: 6
[3364110.090187] sd 6:0:0:0: Attached scsi generic sg2 type 0
[3364110.091182] sd 6:0:0:0: [sdb] 234455040 512-byte logical blocks: (120 GB/112 GiB)
[3364110.091297] sd 6:0:0:0: [sdb] Write Protect is off
[3364110.091302] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
[3364110.091461] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[3364110.136287] sd 6:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[3364110.136294] sd 6:0:0:0: [sdb] Optimal transfer size 33553920 bytes
[3364110.153896] sdb: sdb1 sdb2 < sdb5 >
[3364110.156509] sd 6:0:0:0: [sdb] Attached SCSI disk
```

So far so good…

```
rikishi /tmp/wd_fw_update # mkfs.btrfs /dev/sdb1
btrfs-progs v6.16
See https://btrfs.readthedocs.io for more information.

Label:              (null)
UUID:               15bd80a5-5581-4414-8209-983e859f4af4
Node size:          16384
Sector size:        4096    (CPU page size: 4096)
Filesystem size:    111.80GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Zoned device:       no
Features:           extref, skinny-metadata, no-holes, free-space-tree
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1   111.80GiB  /dev/sdb1
```

… it might actually be working now!
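
Before trusting a reflashed drive, it's worth a full-surface read to shake out latent errors. A sketch, assuming the drive is still `/dev/sdb` and nothing on it is mounted:

```shell
# Non-destructive full-surface read; any bad sectors will show up here
# and in dmesg:
dd if=/dev/sdb of=/dev/null bs=1M status=progress

# Or with per-block reporting (badblocks defaults to read-only mode):
badblocks -sv /dev/sdb
```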


@dwm … or maybe not.

```
[3364206.670387] sd 6:0:0:0: [sdb] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[3364206.670394] sd 6:0:0:0: [sdb] tag#15 Sense Key : Hardware Error [current]
[3364206.670397] sd 6:0:0:0: [sdb] tag#15 Add. Sense: Internal target failure
[3364206.670399] sd 6:0:0:0: [sdb] tag#15 CDB: Read(10) 28 00 00 7d 96 00 00 00 80 00
[3364206.670400] blk_print_req_error: 154 callbacks suppressed
[3364206.670402] critical target error, dev sdb, sector 8230400 op 0x0:(READ) flags 0x800 phys_seg 16 prio class 0
[3364206.687903] sd 6:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[3364206.687912] sd 6:0:0:0: [sdb] tag#8 Sense Key : Hardware Error [current]
[3364206.687924] sd 6:0:0:0: [sdb] tag#8 Add. Sense: Internal target failure
[3364206.687952] sd 6:0:0:0: [sdb] tag#8 CDB: Read(10) 28 00 00 7d 96 28 00 00 80 00
[3364206.687955] critical target error, dev sdb, sector 8230440 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 0
[3364206.687987] sd 6:0:0:0: [sdb] tag#9 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[3364206.687990] sd 6:0:0:0: [sdb] tag#9 Sense Key : Hardware Error [current]
[3364206.687993] sd 6:0:0:0: [sdb] tag#9 Add. Sense: Internal target failure
[3364206.687995] sd 6:0:0:0: [sdb] tag#9 CDB: Read(10) 28 00 00 7d 96 a8 00 02 00 00
[3364206.687997] I/O error, dev sdb, sector 8230568 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
```

No worry… let's try the other failed disk. It wouldn't even boot earlier today.

@dwm

```
rikishi /tmp/wd_fw_update # fdisk /dev/sdb

Welcome to fdisk (util-linux 2.41.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

fdisk: cannot open /dev/sdb: No medium found
```

…ookay, second failed disk is proper stuffed now. Never mind, they were going to the e-waste bin anyway.