#HabsburgAI* turns out to be a Thorne-y problem https://www.404media.co/elias-thorne-chatbots-llms-chatgpt-lighthouse-keeper-story/

LLMs including ChatGPT, Gemini and Claude are obsessed with telling stories about lighthouse keepers and clockmakers, and one character named 'Elias Thorne' has made his way from chatbots to Amazon books. Researchers are trying to discover why.
@ApostateEnglishman This spelling appears in some dictionaries because it was used in 1646. As far as I can tell it was used only _once_. This misspelling of "loyalty" was probably a typo or mistranslation by the original author. Or, it may be an error introduced in more recent times during scanning/OCR. I haven't seen a photo of the original page so I can't confirm, but I have seen this sort of glitch happen.
Nevertheless, it's bizarre to include such a rare and archaic word in spell-check dictionaries!
How did this happen? I think it may be a consequence of LLMs scraping content from online sources, using what they find without the intelligence to discern between quality and slop, and of negligent humans failing to review machine-generated content before declaring "LGTM, ship it!" Next, that LLM's output gets scraped by other LLMs, which indiscriminately incorporate the errors into their own training corpora in an ever-worsening "Habsburg AI" feedback loop.
Thus, it seems one person's typo nearly 400 years ago has resurfaced and is contributing to AI Model Collapse.
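For anyone who wants to poke at that feedback loop, here is a toy sketch, not a model of any real LLM. It assumes each "generation" is nothing more than the word counts of the previous generation's output, and that a fresh corpus is sampled from those counts. The function name `simulate_collapse` and all its parameters are made up for illustration.

```python
import random
from collections import Counter


def simulate_collapse(vocab_size=500, corpus_size=2000, generations=10, seed=0):
    """Toy feedback loop: each generation is 'trained' only on the
    previous generation's sampled output (hypothetical setup)."""
    rng = random.Random(seed)

    # Generation 0 draws from a Zipf-like distribution over word ids,
    # standing in for human-written text with many rare words.
    words = list(range(vocab_size))
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]

    distinct_per_generation = []
    for _ in range(generations):
        # "Generate" a corpus from the current model.
        corpus = rng.choices(words, weights=weights, k=corpus_size)
        counts = Counter(corpus)
        distinct_per_generation.append(len(counts))

        # The next model is just the empirical counts of this corpus.
        # A word that was never sampled has no way of coming back.
        words = list(counts.keys())
        weights = list(counts.values())

    return distinct_per_generation


if __name__ == "__main__":
    # Number of distinct words surviving in each generation.
    print(simulate_collapse())
```

The list it returns can never go up from one generation to the next, because a word that drops out of one corpus is absent from every later one. Real training pipelines are far messier, but the direction of travel is the same one the post describes.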
#AI #LLM #LLMs #AISlop #HabsburgAI #AIModelCollapse #ModelCollapse #AutoCarrot
Today I learned two new 'words':
- #HabsburgAI
- #BoomerBrowsing
@davidgerard, among others, has long been warning that the #GenAI giants are running out of content to consume for “training”, and they're getting desperate for more.
No surprise at all: #SamAltman, being the kind of person who's so contemptuous of human #culture they think an #LLM can do it, proposes the robots will just generate more training data themselves! What could go wrong?
“Prof. Sammut says generative AI systems have big limitations because chatbots lack critical thinking.
““These systems are based on doing pattern matching, they are very good at that, but they can’t do any sort of logical sequential reasoning.”
“A chatbot can only tell you 1+1=2 because someone told it so, not because it learned how to do arithmetic.”
A leading UNSW computer scientist says a touted solution to a big problem for generative AI is better suited for other forms of artificial intelligence. AI chatbots, like ChatGPT and Google Gemini, are running out of data to eat. Generative AI models have swallowed up most of the data they’re legally allowed to process that
From the post I just retooted:
"I don't think anyone has reliable information about post-2021 language usage by humans.
The open Web (via OSCAR) was one of wordfreq's data sources. Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies."
Source: "Why wordfreq will not be updated" - https://github.com/rspeer/wordfreq/blob/master/SUNSET.md
I'm sorry, how is any of this surprising? If you put a bucket of mixed veggies through a blender & then pour the result + maybe one fresh cherry tomato into some other blenders, what would you expect & if your answer isn't "the same veggie slushie, re-blended" what's wrong with your brain?
What, exactly, did people think would be coming out of these blenders - a bucket of different, even more fresh & delicious vegetables?
https://futurism.com/ai-slowly-killing-itself
#HabsburgAI #ModelAutophagyDisorder
Finally, another episode of the #grumpyoldgeeks podcast! The only podcast that actually makes me walk around with a wide grin on my face 😄
https://podcasts.apple.com/de/podcast/grumpy-old-geeks/id626471856?i=1000667243884
People are calling output from an inbred generative AI trained on a corpus that accidentally includes AI-generated inputs "Habsburg Art" or "Habsburg AI", and I approve.
I didn't expect this to become a problem already.