would it be rude to use a little atom board as a bitflip testbench with a wee mote of sealed radium nestled betwixt its sodimms
little brown smudge that makes memtest86+ angry
if i set up a 24/7 livestream of the side effects of a measured & controlled elevated ambient ionizing radiation field on a machine running continuous stability tests would you watch that
i would absolutely check that out
27.8%
yeah maybe idk
23.2%
wouldn't watch but findings would be interesting
31.9%
i do not care
0.8%
i actively do not want this to happen
1.5%
qualia what the fuck
14.8%
Poll ended.

@qualia Unless your dose rate is extraordinarily high I would expect the live stream would be super boring.

But let it run for a year or two with good logging and I'd absolutely read a blog or paper summarizing the results

@azonenberg i'm halfway expecting to see failures almost immediately with it in as direct contact as i can muster, which would validate the whole idea. at that point i'll back it off, run a lap or two to make sure that excursion hasn't unduly damaged the dimms, and then pick some fixed distances of measured intensity and start walking it in

if it does turn out to be extremely boring, i'll definitely see about arranging a longer-term test strategy, since i have wondered about this for a long while
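the detection half of that plan is simple enough to sketch (a hypothetical minimal version; a real rig would lock physical pages, disable swap, and rotate patterns the way memtest86+ does):

```python
# Hypothetical minimal sketch of a bit-flip scanner: fill a buffer with a
# known pattern, then rescan it and count flipped bits. Only shows the
# detection logic; a serious testbench would pin physical pages, disable
# swap, and cycle through multiple patterns like memtest86+.

PATTERN = 0xAA  # 10101010 -- alternating bits

def make_buffer(size: int) -> bytearray:
    """Allocate a test region filled with the known pattern."""
    return bytearray([PATTERN]) * size

def count_flips(buf: bytearray) -> int:
    """XOR the buffer against the expected pattern and popcount the result."""
    expected = int.from_bytes(bytes([PATTERN]) * len(buf), "big")
    observed = int.from_bytes(buf, "big")
    return bin(expected ^ observed).count("1")

region = make_buffer(1024 * 1024)   # 1 MiB demo region
print(count_flips(region))          # -> 0 on healthy hardware

region[512] ^= 0b00000100           # simulate a single-bit upset
print(count_flips(region))          # -> 1
```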

@qualia Well the other question is what you're irradiating (ram vs CPU vs chipset etc).

You're gonna see different natures of failures hitting caches, logic, main RAM, etc

@azonenberg true! i was specifically thinking ram, since it is, afaik, the largest and most prominent domain in consumer hardware where error detection (let alone correction) is not ubiquitous. except storage, maybe? correct me if i'm wrong

this idea is acutely engendered by the firefox error report bit flip thread, if you've seen that floating around

@qualia @azonenberg
Mass storage devices have universally had error correction since the early 1990s. Disk drive error correction was introduced by IBM with the 3330 drive (1970) though error detection was used by earlier IBM disk systems such as the IBM 2311 (1964).
As density has tremendously increased since the late 1980s, it has become technically infeasible to make reliable disk drives without error correction. The same is true of solid state drives.
1/
@qualia @azonenberg
The error correction is internal to the drive. The drive attempts to present itself to the host as an entirely reliable device. Uncorrectable errors will of course be reported as such, but correctable errors are hidden, and only reported by diagnostic commands (e.g., SMART).
Both magnetic disk and solid state drives are dependent on "coding gain", where the error correction is used to achieve higher storage capacity than would be possible without it.
2/
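As a toy illustration of the correction machinery, here is a textbook Hamming(7,4) code, which stores 4 data bits in 7 and can repair any single flipped bit. (Real drives use far stronger codes such as Reed-Solomon and LDPC; this only shows the principle.)

```python
# Textbook Hamming(7,4): 4 data bits + 3 parity bits, correcting any
# single-bit error. Illustrative only -- real drives use much stronger
# Reed-Solomon / LDPC codes.

def encode(data):
    """data = [d1, d2, d3, d4] -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(cw):
    """Recompute parity; the syndrome gives the 1-based position of a flipped bit."""
    p1, p2, d1, p3, d2, d3, d4 = cw
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    cw = list(cw)
    if syndrome:
        cw[syndrome - 1] ^= 1        # repair the flipped bit in place
    return [cw[2], cw[4], cw[5], cw[6]]  # extract the data bits

cw = encode([1, 0, 1, 1])
cw[4] ^= 1                  # flip one bit in transit
print(correct(cw))          # -> [1, 0, 1, 1], the original data
```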
@qualia @azonenberg
Unfortunately, the error correction internal to a drive is not sufficient for system-level reliability. Errors can occur at any point between the drive interface and system memory. Modern interfaces such as PCIe, SAS, and SATA have error detection across the link, but that also is not really sufficient.
3/
@qualia @azonenberg
End-to-end error control requires that the host file system include error detection and correction in the data sent to/from the drive, as part of what the drive sees as opaque payload.
The ZFS filesystem, originally developed by Sun for Solaris, is an example of a filesystem that has strong data integrity checks for true end-to-end error control.
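The end-to-end idea can be sketched in a few lines (a hypothetical illustration in the spirit of ZFS, not its actual on-disk format, which keeps checksums in parent block pointers): the host attaches a checksum to each block as part of the opaque payload and verifies it on read, so corruption anywhere on the path is caught.

```python
# Hypothetical sketch of host-side end-to-end integrity checking, in the
# spirit of ZFS (not its actual on-disk layout): a checksum travels with
# each block as opaque payload, so corruption anywhere on the path --
# drive, link, controller, or host DMA -- is detected on read.

import hashlib

def wrap_block(data: bytes) -> bytes:
    """Prepend a SHA-256 checksum; the drive sees the whole thing as payload."""
    return hashlib.sha256(data).digest() + data

def unwrap_block(stored: bytes) -> bytes:
    """Verify the checksum on read; raise if the block was corrupted in flight."""
    digest, data = stored[:32], stored[32:]
    if hashlib.sha256(data).digest() != digest:
        raise IOError("end-to-end checksum mismatch")
    return data

block = wrap_block(b"important payload")
assert unwrap_block(block) == b"important payload"

damaged = bytearray(block)
damaged[40] ^= 0x01  # flip one bit somewhere in the data portion
try:
    unwrap_block(bytes(damaged))
except IOError:
    print("corruption detected")
```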
4/