One of the things that the Stack Overflow brouhaha demonstrates is that it doesn’t matter if a service was founded by people trusted by the community (Atwood and Spolsky) and was broadly community-led. If it’s a VC-funded startup, they will sell out their users at some point.

@baldur Trust is all fine, but what counts is the license. In this case, our content is under CC-BY-SA. What seems relevant to me:

- using the content for AI training does (unfortunately?) not trigger the attribution requirement ("fair use", bla bla)
- it should be feasible to pull off a fork of Stack Overflow, with a legal copy of all existing content

@pixelistik
If you can get the content. There's no obligation for them to transfer it to you. They could say you're violating terms of service for your bot crawling their site and shut you down no problem. They've enclosed a commons and will defend it.
@baldur
@dlakelan It seems that there is a dump file maintained by archive.org - enabled by exactly the CC license. https://archive.org/details/stackexchange

This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files...

@pixelistik

Awesome, now we just have to figure out how to defend archive.org from the statist bullshit when the FBI comes for them for violating copyright law. 😩

@dlakelan @pixelistik @baldur I believe they freely offer full downloadable dumps of all the content on the site, so I don’t think that, at least, would be a problem.

@pixelistik @baldur

“using the content for AI training does (unfortunately?) not trigger the attribution requirement”

No matter what Creative Commons are hallucinating, SO/SE are sidestepping this entirely by relying on the commercial dual licence in the ToS, which they sneakily did not limit to what is needed to run the platform itself.

@mirabilos @baldur Yes, that's true, so my first point may be irrelevant. I still think that the second point (an open license as an "escape hatch") is a valid option. And I somehow prefer the legal perspective over the emotional discussion.

@pixelistik @baldur I don’t understand what you’re asking/saying?

There’s a public data dump of SO/SE under CC-BY-SA which people can use.

(Codidact imported some sites early on but later found that untenable; apparently, active Q&A sites tend to work better if they only have their own, active content. @amin has done something to search these dumps, and I’d not mind having that separate.)