Mastodawn

Baldur Bjarnason May 12, 2024

One of the things that the Stack Overflow brouhaha demonstrates is that it doesn’t matter if a service was founded by people trusted by the community (Atwood and Spolsky) and was broadly community-led. If it’s a VC-funded startup, they will sell out their users at some point.

▚ pixelistik ▗▚▗▘May 12, 2024

@baldur Trust is all fine, but what counts is the license. In this case, our content is under CC-BY-SA. What seems relevant to me:

- using the content for AI training does (unfortunately?) not trigger the attribution requirement ("fair use", bla bla)
- it should be feasible to pull off a fork of Stack Overflow, with a legal copy of all existing content

Daniel Lakeland May 12, 2024

@pixelistik
If you can get the content. There's no obligation for them to transfer it to you. They could say you're violating terms of service for your bot crawling their site and shut you down no problem. They've enclosed a commons and will defend it.
@baldur

▚ pixelistik ▗▚▗▘May 12, 2024

@dlakelan It seems that there is a dump file maintained by archive.org - enabled by exactly the CC license. https://archive.org/details/stackexchange

Stack Exchange Data Dump : Stack Exchange, Inc. : Free Download, Borrow, and Streaming : Internet Archive

This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files...

Internet Archive
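
For anyone wanting to poke at the dump: the description above says each site is an archive of XML files. A minimal sketch of streaming one such file in Python follows. It assumes the posts file is a flat list of `<row>` elements carrying fields as attributes; the attribute names (`Id`, `PostTypeId`, `Title`, `Body`, `ParentId`), the helper `iter_rows`, and the sample data are illustrative assumptions, not taken from this thread, so check them against the actual dump.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield the attributes of each <row> element as a dict, one at a time.

    `source` is a filename or a binary file object. Parsing is incremental,
    so a large file does not have to be loaded into memory at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # release the element once its data is copied out


# Made-up sample in the assumed layout (not real dump content).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="Example question" Body="&lt;p&gt;Hi&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="&lt;p&gt;Answer&lt;/p&gt;" />
</posts>
"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
# Assumption: PostTypeId "1" marks a question and "2" an answer.
questions = [r for r in rows if r.get("PostTypeId") == "1"]
```

With a real file you would pass its path to `iter_rows` in place of the in-memory sample.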

Daniel Lakeland May 12, 2024

@pixelistik

Awesome, now we just have to figure out how to defend archive.org from the statist bullshit when the FBI comes for them for violating copyright law. 😩

Jason Punyon

@dlakelan @pixelistik This'll be easier to use. https://seqlite.puny.engineering

SEqlite