@qualia Unless your dose rate is extraordinarily high I would expect the live stream would be super boring.
But let it run for a year or two with good logging and I'd absolutely read a blog or paper summarizing the results
@azonenberg i'm halfway expecting to see failures almost immediately with it in as direct contact as i can muster to validate the whole idea, at which point i'll back it off, run a lap or two to make sure that excursion hasn't unduly damaged the dimms, and then pick some fixed distances of measured intensity and start walking it in forward
if it does turn out to be extremely boring, i'll definitely see about arranging a longer-term test strategy, since i have wondered about this for a long while
@qualia Well the other question is what you're irradiating (ram vs CPU vs chipset etc).
You're gonna see different natures of failures hitting caches, logic, main RAM, etc
@azonenberg true! i was specifically thinking ram, since it is, afaik, the most prominent/physically large domain where hardware error detection (if not correction) in consumer hardware is not ubiquitous. except storage, maybe? correct me if i'm wrong
this idea is acutely engendered by the firefox error report bit flip thread, if you've seen that floating around
@qualia yeah i'm not sure of the details of how to get correctable-error counts out of the ram, maybe over IPMI or something?
For the most part it just works and has been stable, although I'm not irradiating my hardware lol
@astraleureka @azonenberg I got my work laptop's NVIDIA card to throw a bunch of "fell off the bus" errors yesterday while trying to get Optimus switching going + it really not liking having its pstates kicked around
the error makes sense but is still amusingly evocative. no "lp0 on fire" but i'll take it
@whitequark @azonenberg @qualia there is indeed, you can find it by looking up what the Linux EDAC driver is. e.g. screenshot on this old dual socket Ivy Bridge system.
(EDAC: Error Detection And Correction)
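For anyone wanting to log those counts over a long run, here's a minimal sketch of polling them from sysfs. It assumes the usual Linux EDAC layout, where each memory controller shows up as an `mc0`, `mc1`, ... directory containing `ce_count` (corrected errors) and `ue_count` (uncorrected errors). The function name and the `root` parameter are made up for illustration, and whether the directory exists at all depends on the platform having an EDAC driver loaded.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} per mc* dir.

    `root` is a parameter only so the function can be pointed at a test
    directory; the default is the usual sysfs location (assumption: an
    EDAC driver is loaded for this platform).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not Linux

    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / fname).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # file missing or unreadable
        counts[mc.name] = entry
    return counts
```

Calling `read_edac_counts()` on a timer and appending the result to a log with a timestamp would be enough to see whether the corrected-error count climbs as the source gets closer.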
@brouhaha @azonenberg makes sense. i had suspected as much -- i know SSDs have a whole extra region of spare blocks for wear management, but I got to thinking about the phenomenon of silent data corruption and second-guessed myself
but on reflection, with how aggressively disk I/O gets cached in the free RAM of non-ECC consumer hardware.. that must be the substantial source of most of it
this might make for another interesting test, if I can wind up the single-event-upset events to a practically noticeable level -- get a small ZFS mirror going and redline it with checksummed/deterministic garbage writes & reads; see if/how often it manages to catch-and-correct itself in spite of the RAM's unreliability
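A rough sketch of what "checksummed/deterministic garbage" could look like: every block is derived from a seed and a block index, so the reader can regenerate the expected bytes and count flipped bits without storing a reference copy anywhere. All names here (`make_block`, `count_bit_errors`, `BLOCK_SIZE`) are hypothetical, and SHA-256 in counter mode is just one convenient way to get reproducible noise.

```python
import hashlib

BLOCK_SIZE = 4096  # arbitrary choice for the sketch


def make_block(seed: bytes, index: int, size: int = BLOCK_SIZE) -> bytes:
    """Deterministically generate `size` pseudo-random bytes for block `index`."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(
            seed + index.to_bytes(8, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    return bytes(out[:size])


def count_bit_errors(seed: bytes, index: int, observed: bytes) -> int:
    """Regenerate the expected block and count the bits that differ."""
    expected = make_block(seed, index, len(observed))
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))
```

Writing `make_block(seed, i)` to the pool for each `i` and later feeding the bytes read back into `count_bit_errors` gives an application-level error count to compare against whatever the filesystem itself reports.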
@qualia @azonenberg
If the error hits non-ECC RAM written by the application before a write, and before ZFS computes the hash, then of course ZFS won't detect any error.
Similarly, if data read from the drive into non-ECC RAM gets an error after ZFS has validated the hash, then no error is detected.
I'm amazed the commodity mass-market computers have successfully* ignored this issue for so long, as DRAM error rates have constantly increased.
*for some value of "successfully"
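The two blind spots described above (a flip before the hash is computed on write, and a flip after it has been validated on read) can be shown with a toy model. This is not ZFS code: SHA-256 is only a stand-in checksum, and `write_path`, `read_path`, and `flip_bit` are hypothetical helpers that simulate a RAM upset at a chosen point.

```python
import hashlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of `data` with one bit inverted (simulated upset)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def write_path(app_data: bytes, flip_before_hash: bool = False):
    """Simulate a write; returns (stored_data, stored_checksum)."""
    buf = app_data
    if flip_before_hash:
        buf = flip_bit(buf, 0)  # RAM error before the checksum exists
    return buf, checksum(buf)   # checksum faithfully covers the bad data


def read_path(stored: bytes, stored_sum: bytes, flip_after_verify: bool = False):
    """Simulate a read; returns (data_seen_by_app, checksum_ok)."""
    ok = checksum(stored) == stored_sum
    buf = stored
    if flip_after_verify:
        buf = flip_bit(buf, 0)  # RAM error after validation already passed
    return buf, ok
```

In both flagged cases the checksum comparison reports success while the application ends up with data that differs from what it intended; only corruption that happens between the two checksum computations is caught.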
@brouhaha @qualia I think commodity computer users are just expected to tolerate some level of instability and not complain too loudly because "that's how computers are".
People who demand serious reliability use ECC.
All of my Ceph cluster nodes and endpoints use ECC ram, and BlueStore does E2E checksumming of data blocks to storage media and back, so there should be no way for a single SEU to cause data corruption; you'd need multiple bitflips
@brouhaha @qualia Yeah I was talking to @dlharmon a while back and he had some ideas for an RS-FEC that would give you something like 520 or 530? bits of payload per 576-bit DRAM burst (8 word burst * 72 bit bus).
Not too useful for general purpose computing where you expect power of two cache line sizes but if you're building a router or oscilloscope or something and just making huge FIFOs, it lets you buy a few percent more bandwidth at the same PHY speed without completely throwing out ECC
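Quick arithmetic on the "few percent" figure, as a sketch. It assumes the baseline is a conventional ECC layout carrying 64 data bits in each 72-bit word (so 512 payload bits per 576-bit burst), and treats 520 and 530 as the rough payload sizes quoted above rather than exact code parameters.

```python
BUS_BITS = 72        # ECC DIMM bus width
BURST_WORDS = 8      # words per burst
BURST_BITS = BUS_BITS * BURST_WORDS      # 576 bits on the wire per burst
BASELINE_PAYLOAD = 64 * BURST_WORDS      # 512 bits with 64-of-72 ECC (assumed baseline)


def code_rate(payload_bits: int) -> float:
    """Fraction of the burst that carries payload."""
    return payload_bits / BURST_BITS


def payload_gain(payload_bits: int, baseline_bits: int = BASELINE_PAYLOAD) -> float:
    """Relative payload increase over the baseline at the same PHY speed."""
    return payload_bits / baseline_bits - 1.0


for payload in (520, 530):
    print(f"{payload} bits: rate {code_rate(payload):.3f}, "
          f"gain {payload_gain(payload) * 100:.2f}%")
```

Under those assumptions the gain works out to roughly 1.6% for 520 bits and 3.5% for 530 bits.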
@Voidhorn we are an exempt quantity household here thankyouverymuch. NRC licenses & spiceses are expensive and i have enough liabilities as it is
i've been a nuclear nerd since about second grade but actually started stewardship of responsibilities-extending-beyond-my-lifespan about ten years ago or so now
@qualia "fish plays pokemon" but it's "radium plays memtest86+"
... just me?
@whitequark would
i have an hdmi capture dongle somewhere too eee i love this bad idea
@vikxin no but i have been eyeing one. those are CsI or GAGG(Ce) (the new ones) but don't have the best resolution or stopping power, and i've learned are also prone to a lot of backscatter noise in the spectrum just due to the small size of the thing and closeness of readout electronics
i have a, iirc, 1.25" NaI(Tl) on a 3" PMT (bit excessive) on an HP preamp base, and then either a Canberra Model 35+ MCA or a Canberra 556 AIM MCA system plus three other bins of miscellaneous signal processing legos. the sample, scintillator, PMT, and preamp live inside a graded Al-Pb pig
i haven't thrown a Ba133m line at it but iirc it kinda ballparks in that 7% FWHM resolution range. it's fun. it does not fit in my pocket
@vikxin well sure but i mean you do possess one way to find out
marble countertop, container of salt substitute, operating HEPA air filter; pick your NORM
@vikxin yeah the filter has to be running several hours, your house has to be pretty sealed up, and some areas simply don't have high radon. which is fortunate really. get your CO2 sensor up for a while and try again
i can pick up the ∆CPM on a metal β/γ GM probe so that little guy surely can under the right conditions