Dan Kaminsky once said I know how computers work.
Pronouns: he/him

@dalias @gloriouscow SSDs stored in a controlled climate will generally also retain data for decades. My point is the data loss concern for unattended flash is overblown (much like concern about flash wear), and other media types are also risky to leave unattended.

Basically anything but burnable optical media will last at least two years stored in a climate which doesn’t kill people. Store it with climate control, and it’ll last a decade, maybe two. Tape might last twice as long as an SSD, but it won’t reliably make it to the next bracket up.

@dalias @gloriouscow Even tapes don’t survive well unattended. The base gets brittle over time and fails. They last better than flash or spinning drives, but not *that much* better. I’ve been asked to digitize 30-year-old VHS and reel-to-reel tapes which just crumbled when I tried to examine them.

Long-term data preservation takes active work. If you want something to last decades, you need to build it to be fault-tolerant, you need to test it regularly to catch failures before they exceed the fault tolerance, and you need to fix the failures.
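A minimal sketch of the “test it regularly” step, assuming the archive is a plain directory tree and that a SHA-256 manifest was recorded while the data was known to be good. The names `build_manifest` and `scrub` are hypothetical, not from any particular tool. It only detects failures; repairing them still needs a second copy or parity data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    failures = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

Run `scrub` on a schedule and treat any non-empty result as a failure to fix from a redundant copy before a second failure lands on the same file.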

@mttaggart @glyph @mikix It may be useful to consider music licensing. Rearrangements, remixes, and records which sample other records are considered derivative works of the original record. Covers and parodies are considered unique records derived to varying extents from the original songwriting.

The Kickstarter launches this October. The book is in your hands December. Sign up for pre-launch notification now and join me in the long game...

Khumalo Kickstarter pre-launch link!

https://www.kickstarter.com/projects/obsidiansky/khumalo-tales

Coming soon: KHUMALO TALES

Book 3 of the Khumalo Trilogy

@mttaggart @fhekland @cwebber This is accurate, yes. Illicitly acquired code works the same way: you don’t hold the copyright, so you don’t have the ability to license it to others.

There is an open question of what happens when the LLM emits a verbatim chunk of code against that code’s license terms. For example, if you told an LLM to implement ZFS’ spa_activate, it’s extremely likely to emit verbatim chunks of CDDL code without the attribution required by the license. A tool can’t be liable for the infringement, but does the liability rest with the company which included CDDL code in the training corpus, or does it rest with the user who didn’t verify that the output doesn’t infringe preexisting copyright?

@nasser @mcc They’re both terrible, but only one is eating through a straw right now.

@jenniferplusplus One thing doesn’t seem to line up for me: the mass layoffs. Those inherently leave the labor idle, as layoffs explicitly cede control over the laborers.

@nazokiyoubinbou @gloriouscow Mostly agreed, but simply powering an SSD doesn’t help for the majority of devices. They’re not refreshed like DRAM. Instead, SSDs made since roughly the advent of wear leveling store data with some error correction data. As blocks are read, the controller measures how much of the error correction capacity is used to clean them up. Above a certain threshold, the data is rewritten (probably to a different page, as decided by wear leveling).

*Some* SSDs have a sort of patrol scrub which reads the whole drive in the background over the span of a few days. Most don’t do this.
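As a toy model of that behavior (not any vendor’s firmware), the sketch below treats each page as a count of accumulated bit errors. `ECC_CAPACITY` and `REFRESH_THRESHOLD` are made-up numbers for illustration, and the rewrite is modeled by resetting the error count rather than moving data to a new page. The point it shows: a page is only checked when something reads it, so a powered drive with no reads and no patrol scrub never refreshes anything.

```python
from dataclasses import dataclass

ECC_CAPACITY = 40         # hypothetical: max correctable bit errors per page
REFRESH_THRESHOLD = 0.75  # hypothetical: fraction of capacity that triggers a rewrite


@dataclass
class Page:
    bit_errors: int = 0   # errors accumulated since the page was last written


def read_page(page: Page) -> str:
    """Model one read: correct errors, and rewrite the page if ECC usage is high."""
    if page.bit_errors > ECC_CAPACITY:
        return "uncorrectable"   # decay exceeded what the error correction can fix
    if page.bit_errors >= ECC_CAPACITY * REFRESH_THRESHOLD:
        page.bit_errors = 0      # stands in for rewriting the data to a fresh page
        return "refreshed"
    return "ok"


def patrol_scrub(pages: list[Page]) -> dict[str, int]:
    """Read every page once, as a background scrub would, and tally the outcomes."""
    counts = {"ok": 0, "refreshed": 0, "uncorrectable": 0}
    for page in pages:
        counts[read_page(page)] += 1
    return counts
```

On a drive without a patrol scrub, the equivalent from the host side is simply reading every block periodically, which gives the controller the chance to notice weak pages.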

@gloriouscow Most current hard drives will also be bricks over that time span. And most tapes. Outliers will exist, but that’s survivorship bias.

I think the prevalence of encryption is also likely to reduce the recoverable data from today.

@gloriouscow If you’re planning long-term enough that flash data retention is an issue, then it’s not like spinning drives are long-term storage either. The only substantial step up from flash is a live system which you regularly scrub and service.