@bazkie @nixCraft
Wikipedia offers free copies of all available content! Yes, you can legally download the entire Wikipedia in your language for offline usage here:
https://en.wikipedia.org/wiki/Wikipedia:Database_download
Feed this to your locally running LLM or use other free tools to make it conveniently usable.
@tjunker @bazkie @nixCraft and yet still the detachment between source and answer is real.
I know LLMs are useful for this, *BUT* I personally fear we will lose our sources when we separate where the source is edited and curated from where it is read.
Having a clear pointer to Wikipedia and its edit button invites upkeep, additions and changes. Without it, people will consume the knowledge but never feed anything back.
@nixCraft I don't agree with that narrative. ChatGPT didn't kill Stack Overflow: SO's managers did.
After ChatGPT launched, the SO team decided that from then on, every question and answer on the site was going to be used to train a local LLM. Everybody was angry about this, and everybody left. Including me.
I posted questions and answers on SO for more than ten years. After that decision from the team, I just logged out for the first time in years and never came back.
@rusty @nixCraft Similar story. I joined in 2009, wrote over 5,700 answers, received over 300k reputation, currently (still) ranked #215 on the site.
I haven't answered a single question since 2024, and it was because of how the team treated contributors.
It might be true that AI killed SO. Maybe nothing could have saved it. The way people are treated when they ask questions is terrible.
But the reason *I* left was because of how SO leaders reacted to AI, not because AI replaced it.
@rusty @nixCraft The funny thing is that the actual, specific thing that caused me to leave wasn't directly about AI. They were desperate to do *something* and kept messing with the UI, moving things around and "experimenting" till I couldn't find interesting questions any more. It broke my habit of going to the site every time I had a few minutes to kill. And breaking my habits made me rethink whether I even wanted to be there.
Folks should remember that when they feel they must A/B something.
I still don't get why everyone keeps repeating the false "AI killed Stack Overflow" framing.
Stack Overflow was already slowing because most RTFM questions had already been asked, so anyone able to search didn't need to ask a new one; AI just made finding those existing answers much more likely.
SO (and the other SE sites) are simply left with a higher proportion of real questions.
@Computeum @nixCraft The graph includes deleted questions but does not include any context to contrast with answered, duplicated, or spam questions. It's like it's designed to carry only one narrative.
Stack Overflow was never meant to be where every single person asks the same "How do I add an item to an array in [language]?" except maybe one person, once per language. It's where you go to find the answer, but that involves surfacing existing answers via search, not asking the question again.
@nixCraft There'll be few tears shed by people who were starting out with a new technology, wanted a place where they could ask a question to clarify their understanding, and then got told:
They could have remained relevant if they provided genuine value and true knowledge, not rewarded gatekeeping.
@rumbles @brendan @nixCraft or my personal favorite:
the top answer from 5 years ago has 637 upvotes because it solved a widespread but very temporary bug that happens to also show the same error message, without explaining anything.
below, there are 50+ "same here, tried X, idk" "answers" scattered through the years with various scores.
two thirds of the way down, the correct answer sits with 19 points, one comment that says "this is the answer", and the other says that it doesn't work on their hair dryer's WiFi.
Stack Overflow was killing itself through a decade of extreme stagnation and user hostility. do not mourn its passing.
@brendan @nixCraft I contributed to Stack Overflow so much I even got a free T-Shirt from them. On the other hand, I stopped contributing because their culture was annoying me.
It makes sense that people loved LLMs: even though they were trained on those toxic comments, RLHF trims the toxicity out, leading to a "somehow better" perceived UX than SO.
@nixCraft Not exactly to your point. I'm just wondering: wouldn't the numbers before ChatGPT be exactly what to expect from a knowledge-base platform? As soon as there is a solid stock of answered questions, the need to ask new questions diminishes.
I created my account 12 years ago and asked exactly one question, but found many others already answered.
@nixCraft I hated asking questions on SO. Every time I asked a question along the lines of "How do I do X in Y without Z?" a bunch of smartarses would pile on saying "You should really do Z or Q instead".
I'll be happy to see it go. The brigading trolls can go moderate reddit or some other shithole.
Plus the block wardens downvoting within seconds if you used wrong vocabulary that could remotely be interpreted as "off topic". And downvoted questions are mostly dead.
But this developed over the years.
@nixCraft Idk the wack mod stuff came in once the questions got pretty niche. For years you could answer obvious questions and score huge rep, which, imo, helped drive the site’s actual growth — the growth of its knowledge base.
Then once the knowledge base was laid down and questions became more user- or use-case-specific (or, yes, unfortunately, also super simple ones that were often poorly written dupes, probably because there's a correlation between bad search skills and bad composition), it felt like the good-natured, selflessly motivated answerers lost interest and moved on to more rewarding tasks.
TL;DR: It wasn’t the moderation that was killing SO before AI. It was SO’s natural conclusion.
(And, sure, b/c Joel & Co. sold the company, but don’t get me started on that disaster.)
@nixCraft this is going to backfire spectacularly, or things will evolve.
Who knows. There are more accessible self-hosting solutions than ever before. Maybe we just end up creating a more decentralized version of the web.
Imagine getting rid of having to pay for a registered domain and having a system of automated and endlessly federated domains.
@jackemled @nixCraft
From what I understood, the public URL of Wikipedia was getting hammered by AI agents scraping its content, to the point where the servers were going down and users had trouble accessing it. So Wikimedia provided a high-bandwidth content API for LLMs, for a fee, relieving the pressure on the public URL. Given that the LLMs were going to extract the content one way or another, that seemed like the best approach for Wikimedia.
https://futurism.com/artificial-intelligence/wikipedia-ai-deal
One can clearly see that the trend had started much earlier; COVID then gave it a shot in the arm, and afterwards it just continued. LLMs only hastened that.