AI Translations Are Adding ‘Hallucinations’ to Wikipedia Articles
AI Translations Are Adding ‘Hallucinations’ to Wikipedia Articles
I remember someone sharing a “pro tip” about how you could follow up an LLM generated response with “and what was wrong about that response?”
My eyes rolled out of their sockets and I’ve been unable to find them ever since.
Yeah, part of the usual “it’s not bad, you’re using it wrong” arsenal. Definitely not the clever hack they think it is.
This probably has as much potential to create new errors as to find old ones. LLMs are trained to be “helpful”, if you tell it with total confidence something is wrong, it will answer like there is something to correct, and anything will do.
So even if it had something about right to begin with, now it will thank you for your “insightful” question and output some bullshit to please you.
then what’s the point? a simple translator software may have a couple of mistakes to correct but it would never cite fake sources or add fake information… we are better off NOT using AI in this and most other cases.
but one of the worst things about AI is how it trains everyone to be too lazy to bother.
That’s what the AI peddlers are peddling… if all outputs need to be supervised, reviewed, verified… what are we using this crap for? just to burn through electricity harder?
“Following the recent discussion, we have strengthened our safeguards,” [OKA’s] Zimmerman told me. “We are now rolling out a second, independent LLM review step. Translators must run the completed draft through a separate model using a dedicated comparison prompt designed to identify potential discrepancies, omissions, or inaccuracies relative to the source text. Initial findings suggest this is highly effective at detecting potential issues.”
Ah yes; when LLMs don’t work, just add more LLMs. Genius.
They say it’s been “highly effective” but somehow, I doubt that.
Nah bruh it’s cool just run the same prompt again and again and again, surely sooner or later it will be right. In no way is it going to do the opposite and just keep degrading with each output reading from the last one.
Semi related - You know, honestly, all AI is showing me is how absolute bullshit so many jobs are that we have in the corpo world. Like, at some point we’re gonna have an AI write a thing for an AI to read and file and there will be a little loop and then what the hell is the point of the job in the first place if it’s just machines sending things back and forth that’s just business class white noise.
As I understand it, the models used by browsers like Firefox for local translation are built different - much smaller, worse at generating readable structure, probably worse at parsing intent, but not prone to generating fully incorrect thoughts.
This smaller translation models were notably never sold to the public as "AI" and generally not something I’ve ever seen people complain about. While they technically are, the marketing term is basically devoted to the server-side behemoths.
Google Translate’s backend has been moved to Gemini since December 2025, and is vulnerable to prompt injection. Have a foreign phrase to translate, then input some meta instructions in English underneath it, and it’ll follow the possibly malicious meta instructions.
Google states that this move was to introduce more features, such as conversational mode.
Google Translate’s Gemini Mode is Vulnerable to Prompt Injection - winbuzzer.com/…/google-translate-gemini-prompt-in…
Google Translate gets new Gemini AI translation models - blog.google/…/gemini-capabilities-translation-upg…
Maybe the technical term is “bullshit” because it returns something meant to appease the user regardless of truth value
But “lie” is definitely a less inaccurate interpretation than “hallucinate,” because a “hallucination” implies the generation of something not there, despite the fact the data is equally present for things deemed non-hallucinations.
This was the one thing in thought LLMs would be good for Wikipedia, there is such a wealth of knowledge on non-english wikis.
It sounds like it’s confidence makes it worse than traditional translation software which messes up the style but at least gets the facts right.
All you have to do is ask for direct translation and it does it fine. This is plain incompetence.
That being said, I’ve noticed there are wild difference between articles depending on the language. Mostly, it will be added content in the home language (so the article in French about a French city will have much more info) but sometimes, especially when it comes to Hebrew and Israel, you will get different conflicting information.
They should have implemented checks for this a long time ago.
“Just tell it to not make mistakes.”
Yeah, right.
Whatever is used, I think nothing is going to be 100% and everything should be verified by a native speaker. It is Wikipedia afterall, not some blog.
Non-LLM services are worse in my opinion but it probably depends on the language (LLMs probably struggle with certain languages as well).
There’s a huge difference between “Creates intelligible single-use text that’s good enough that I can understand what the text is roughly about” and “Creates text at a quality high enough to work as a quotable source”.
For the first use case, infrequent hallucinations are no problem. I read it, if I understand a bit about the topic I might catch it, if not it probably doesn’t matter too much either. Especially if it’s about non-critical topics.
For the second use case, infrequent hallucinations are a massive problem. Most people who use Wikipedia use it like a primary source. Even though sources are linked, they don’t go hunting for sources but instead rely that the information in the article is accurate. Every article is read not only once by one person, but thousands or hundreds of thousands of times. That means every single line is read and believed. You can bet that if there’s a hallucination in there, someone will read it and believe it. That’s requires a completely different level of accuracy, and doing that kind of crap translation work on such a large scale as OKA is a massive disservice.
That’s why I specify that everything should be verified in a later comment. My point is that LLMs when properly guided are better than other automatic translation service, while hallucination can easily be avoided with proper prompting.
Also worth mentioning that there’s massive difference in user generated translations already, some of it is well meaning while other, like in Israel’s case, isn’t.
I translate a lot of stuff for my work, and I don’t have any problems when I instruct it properly. I’m also there to verify. I don’t have to deal with hallucinations ever, mostly just changing a word or two because I don’t like how it sounds (it uses overly complex words at times).
This is more about certain users being shit and either not checking their work or doing work they have no place doing. They would exist no matter what they use, it’s not the tools fault.
Tbh, I work in research and we would never use Wikipedia for anything. We can’t quote it and anytime I find a good tidbit on it and try and find the source, I usually get dead link or just something altogether false which doesn’t represent what the user wrote. Probably highly dependent on the subject though but the sourcing isn’t very rigorous.
Bless them though, it’s an amazing site and they are still doing an stellar job considering how big it is.
We are now rolling out a second, independent LLM review step. Translators must run the completed draft through a separate model
LOOOLOL what a bunch of morons
If you can’t translate it properly, you have no business translating it, you’re just making Wikipedia worse and eroding the trust users place in it.