A tiny joy from the latest Stack Overflow footgunorama was when the mouthpiece chimed in to question what I meant by "you" (and stridently affirm his role as mouthpiece)...
Stack Overflow needs to differentiate their product. So they're changing the public data dumps to be objectively worse. Step 1 is "Yeah, you can still download the same dump, you just have to do it in hundreds of separate pieces from hundreds of separate websites."
It's purely to make sure OpenAI and Google keep paying for the streamlined version of the dump.
It won't work, because Stack Overflow is dying (and this incident will accelerate it). As of the last data dump there were some 24 million questions. According to recent data from the API, they get around 2,000 new questions on a weekday and 1,000 on a weekend day.
Right now the data dumps likely contain ~95% of all the data they will ever contain. There's no value in subscribing to such a data product long term.
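Those per-day figures are easy to sanity-check against the public Stack Exchange API (v2.3): the `/questions` endpoint accepts `fromdate`/`todate` epoch timestamps and the built-in `total` filter returns only a count. A minimal sketch in Python, assuming the `requests` library and arbitrary example dates:

```python
# Count Stack Overflow questions created in a given window via the
# Stack Exchange API v2.3. The "total" filter returns only {"total": N},
# so this stays well under rate limits. No API key is used here; one
# would raise the quota.
from datetime import datetime, timezone
import requests

def questions_between(start: datetime, end: datetime) -> int:
    resp = requests.get(
        "https://api.stackexchange.com/2.3/questions",
        params={
            "site": "stackoverflow",
            "fromdate": int(start.timestamp()),
            "todate": int(end.timestamp()),
            "filter": "total",  # built-in filter: response body is just {"total": N}
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["total"]

# Example dates are placeholders: one weekday vs. one weekend day.
weekday = questions_between(
    datetime(2024, 6, 12, tzinfo=timezone.utc),
    datetime(2024, 6, 13, tzinfo=timezone.utc),
)
weekend = questions_between(
    datetime(2024, 6, 15, tzinfo=timezone.utc),
    datetime(2024, 6, 16, tzinfo=timezone.utc),
)
print(weekday, weekend)
```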
Aw, you made that! I owe you a thank-you, then. I used it to whip up a simple interface for searching those dumps locally (it handles all but the largest dumps at the moment; see the sketch after the links below).
https://codeberg.org/Clew/MetaStack
(and here's a demo site) https://stack.clew.se/
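For anyone curious what "searching those dumps locally" involves at the simplest level: each dump's Posts.xml is just a flat list of `<row>` elements with attributes like `Id`, `PostTypeId`, and `Title`. A minimal sketch of a streaming title search in Python (purely illustrative; this is not how MetaStack is implemented, and the file path and search term are placeholders):

```python
# Stream a data-dump Posts.xml and yield question titles matching a term.
# iterparse keeps memory flat, so even large dumps can be scanned.
import sys
import xml.etree.ElementTree as ET

def search_titles(posts_xml_path: str, term: str):
    term = term.lower()
    for _, elem in ET.iterparse(posts_xml_path, events=("end",)):
        if elem.tag == "row":
            # PostTypeId 1 = question; only questions carry a Title attribute
            if elem.get("PostTypeId") == "1":
                title = elem.get("Title", "")
                if term in title.lower():
                    yield elem.get("Id"), title
            elem.clear()  # free processed elements as we go

if __name__ == "__main__":
    path, term = sys.argv[1], sys.argv[2]
    for post_id, title in search_titles(path, term):
        print(post_id, title)
```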
Happy to!
The story is basically that I saw your dumps and thought they were so cool that I just had to do a project with them. ;)