SD cards are the literal worst.

they've expanded to be the size of small hard drives, and devices like the rpi keep using them as boot media, but they:

- use garbage tier low endurance flash cells internally
- have little to no overprovisioning for wear
- perform only the most basic wear levelling
- have no protocol level integrity checking
- have few internal error correction features, if any
- decay comparatively quickly without patrol scrubs
- do not perform patrol scrubs
- cannot do PLP

I've been mad about this pretty much forever. Don't use SD cards for stuff where there's literally any other option.

something I've been thinking about a lot is engineering panic signals into embedded stuff.

put a rail supervisor on the incoming supply, and leave enough capacitance on the low side of your buck(-boost) reg to keep the device running for a few milliseconds when the power is pulled. treat the supervisor IC's PG (power good) output as a panic signal and have it trigger an IO interrupt on your MCU/SoC, which tells the filesystem code to ensure integrity, flush, and halt IO.
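
For a feel of how much capacitance "a few milliseconds" takes, here is a rough sizing sketch. It assumes an ideal capacitor on the regulator output feeding a constant-power load, and ignores ESR, leakage, and whatever energy the input side still delivers. The function names and the example values are hypothetical, not taken from any particular design.

```python
def holdup_time_s(capacitance_f, v_start, v_min, load_w):
    """Seconds an ideal capacitor carries a constant-power load
    while its voltage droops from v_start down to v_min."""
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    if v_min >= v_start:
        return 0.0
    # Usable energy is the difference in stored energy: 0.5 * C * (V1^2 - V2^2)
    usable_energy_j = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return usable_energy_j / load_w


def capacitance_for_holdup_f(holdup_s, v_start, v_min, load_w):
    """Inverse of holdup_time_s: farads needed for a target hold-up time."""
    if v_min >= v_start:
        raise ValueError("v_min must be below v_start")
    return 2.0 * load_w * holdup_s / (v_start**2 - v_min**2)


if __name__ == "__main__":
    # Hypothetical example: 470 uF on a 3.3 V rail, logic tolerates 2.7 V, 0.5 W load.
    t = holdup_time_s(470e-6, 3.3, 2.7, 0.5)
    print(f"hold-up: {t * 1e3:.2f} ms")  # about 1.69 ms
```

The panic handler's flush has to finish comfortably inside that window, so in practice you would size from the worst-case flush time back to the capacitance, with margin for capacitor tolerance and ageing.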

we think of PLP as being a fancy enterprise feature on PCs, but on embedded stuff it's actually legit important 'cos the user often just yanks the power.
@gsuberland it’s true! I know about this and I still yank the power sometimes before I realize it
@mirabilos @gsuberland Power Loss Protection. Basically a flush and sync now panic signal with a few ms (or more) energy buffer.

@gsuberland That and ECC RAM.

Doesn't help that, as laptops have gotten popular, devs have gotten lazy about filesystem power loss support.

@gsuberland Yep this is something I've started to design into my architecture, prototyping on the trigger crossbar and then scaling to a next-generation system with more capabilities on my next project.

Luckily there's not a whole lot to worry about in terms of data integrity because MicroKVS is a) seldom written and b) designed to not corrupt data if interrupted at any point.

At some point I need to build a board where I can test that assumption: have one MCU that controls power to another and the target is just constantly booting up, doing KVS reads and writes, and being randomly reset/power cycled.

And see if I can make it break.
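
Before building that board, the "interrupted at any point" assumption can be exercised in software. The sketch below is not MicroKVS's actual design. It assumes a hypothetical append-only log where every record carries a CRC32, and it models a power cut as a clean truncation of the record being written. The names `encode_record`, `replay`, and `survives_every_cut` are made up for illustration.

```python
import struct
import zlib

# Hypothetical record layout: key length, value length, CRC32 of key+value.
HEADER = struct.Struct("<HHI")


def encode_record(key: bytes, value: bytes) -> bytes:
    body = key + value
    return HEADER.pack(len(key), len(value), zlib.crc32(body)) + body


def replay(log: bytes) -> dict:
    """Rebuild the key/value map, stopping at the first torn or corrupt record."""
    store = {}
    pos = 0
    while pos + HEADER.size <= len(log):
        klen, vlen, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + klen + vlen
        if end > len(log):
            break  # body was cut short by the power loss
        body = log[start:end]
        if zlib.crc32(body) != crc:
            break  # checksum mismatch, treat as end of valid data
        store[body[:klen]] = body[klen:]
        pos = end
    return store


def survives_every_cut(old_log: bytes, key: bytes, new_value: bytes) -> bool:
    """Cut the write at every byte offset; the visible state must always be
    either the full old state or the full new state, never a mix."""
    before = replay(old_log)
    after = dict(before)
    after[key] = new_value
    record = encode_record(key, new_value)
    for written in range(len(record) + 1):
        state = replay(old_log + record[:written])
        if state != before and state != after:
            return False
    return True


if __name__ == "__main__":
    log = encode_record(b"ip", b"10.0.0.1") + encode_record(b"host", b"a")
    print(survives_every_cut(log, b"ip", b"10.0.0.2"))
```

This only covers clean truncation. Real flash can also leave half-programmed pages, disturb neighbouring data, or reorder writes, which is exactly why the two-MCU power-cycling rig is still worth building.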

@gsuberland Why low side? High side seems better wrt brownout detection.
@dascandy rail supervisor on incoming supply is high side?
@gsuberland What if you used ZFS? Honest question.
@apicultor someone else asked the same. it would give you better error detection and correction, and scheduled scrubs would help resolve the decay issues, but those scrubs also increase the write wear rate so you're still left using media that will die pretty quickly.
@gsuberland Are you sure that scrubbing causes writes (other than to update the last scrub timestamp)?

@apicultor what I mean is that if the goal is to get around bitflips and bitrot and write wear issues with ZFS, the correction itself will cause more write wear.

it's better than not using ZFS, but it accelerates the overall inevitability of complete failure.

@gsuberland Ah, legit, yes.

What about SD cards that don't suck, as in ones that have their flash configured in SLC mode so you get bonkers durability like 5K TBW (for the largest ones, reducing proportionally by capacity):

https://www.transcend-info.com/embedded/product/embedded-memory-cards/usd230i

https://www.transcend-info.com/embedded/product/embedded-memory-cards/usd240i

Not quite as impressive, but still ~2.8K TBW if I did my math right (it's expressed in hours of HD video at 26 Mbit/s):
https://www.transcend-info.com/product/memory-card/usd350v
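
The conversion from "hours of video at 26 Mbit/s" to TBW is simple enough to sketch. This assumes decimal units (1 Mbit = 10^6 bits, 1 TB = 10^12 bytes) and a constant bitrate. The function names are hypothetical, and no rated-hours figure from the datasheet is assumed here.

```python
def tbw_from_recording_hours(hours, bitrate_mbit_s=26.0):
    """Terabytes written by recording `hours` of video at a constant bitrate."""
    bytes_written = hours * 3600 * bitrate_mbit_s * 1e6 / 8
    return bytes_written / 1e12


def recording_hours_from_tbw(tbw, bitrate_mbit_s=26.0):
    """Inverse conversion: hours of recording that add up to `tbw` terabytes."""
    return tbw * 1e12 * 8 / (bitrate_mbit_s * 1e6 * 3600)


if __name__ == "__main__":
    # One hour at 26 Mbit/s is 11.7 GB, i.e. 0.0117 TB.
    print(tbw_from_recording_hours(1))
    # Hours of recording that would correspond to roughly 2.8K TBW.
    print(round(recording_hours_from_tbw(2800)))
```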

@apicultor sure, but the better solution is just to not engineer systems that need you to very specifically pick special cards. an SD with long write longevity still only solves a few items off the list.
@gsuberland I wish they shipped with good eMMC onboard.
@gsuberland
Even then, if the SD cards lie when you send the flush command, it won't help. We've had this problem for years with ATA drives trying to win benchmarks.
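
For context, this is roughly all the host side can do: ask for a flush and trust the answer. A minimal sketch, assuming a POSIX system where a directory can be opened and fsynced. `durable_replace` is a hypothetical helper name, and nothing here can detect a card that acknowledges the flush without actually committing the data.

```python
import os


def durable_replace(path: str, data: bytes) -> None:
    """Write data to a temp file, fsync it, then atomically rename over path."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to the device
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    # fsync the containing directory so the rename itself is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the media ignores the flush, the old-or-new guarantee of the rename is only as good as the card's honesty.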