LLM translation models are going great
The word for finger isn't even in this sentence

I think people really underestimate how fragile LLMs are for auto-translation. You can put complete garbage into one, where none of the words are real words, and still get out plausible-sounding "translations" just because the LLM sees it as "close" to a real sentence, and then translates what it thinks that "close" sentence is based, once again, on what seems "close".

The whole benchmarking approach really does not help with this since benchmarks rarely include testing for failures. You need to test that garbage-in is recognized as garbage, otherwise you get garbage-out too.
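
A minimal sketch of what such a negative test could look like. Everything here is a stand-in: the `Translator` interface (return `None` to refuse), the toy lexicon, and both toy "translators" are hypothetical and only illustrate the shape of the test, not any real MT system or benchmark.

```python
import random
import string
from typing import Callable, Optional

# Hypothetical interface: a translator returns a string, or None to refuse.
Translator = Callable[[str], Optional[str]]


def make_nonsense(n_words: int, rng: random.Random) -> str:
    """Build a 'sentence' out of random lowercase letter strings."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 9)))
        for _ in range(n_words)
    )


def refusal_rate(translate: Translator, inputs: list[str]) -> float:
    """Fraction of inputs the translator declined to translate."""
    refused = sum(1 for text in inputs if translate(text) is None)
    return refused / len(inputs)


# Toy lexicon, for illustration only.
TOY_LEXICON = {
    "vos": "your", "jeux": "games", "au": "at the",
    "bout": "tip", "des": "of the", "doigts": "fingers",
}


def always_answers(text: str) -> Optional[str]:
    """Stand-in for a system that never says no."""
    return "Your games at your fingertips"


def lexicon_gated(text: str) -> Optional[str]:
    """Stand-in for a system that refuses when most words are unknown."""
    words = text.lower().split()
    known = [w for w in words if w in TOY_LEXICON]
    if not words or len(known) / len(words) < 0.5:
        return None
    return " ".join(TOY_LEXICON.get(w, w) for w in words)


rng = random.Random(0)
garbage = [make_nonsense(6, rng) for _ in range(50)]
print("always_answers refusal rate:", refusal_rate(always_answers, garbage))
print("lexicon_gated refusal rate:", refusal_rate(lexicon_gated, garbage))
```

The point is only that "refusal rate on known-garbage input" is a number you can track next to the usual quality scores. A real test set would need much nastier garbage than random letters.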

@endrift @joe that reminds me of a recent example: i came across a quote in someone's bio and wanted to know what it meant. google translate confidently decided it was italian and gave this response. i don't know italian, but i was pretty sure this looked nothing like italian, so i ended up looking harder and found that it's a quote in a made-up language from a song in a game.

if u turn off gemini then it more sensibly determines that it can't find a translation, but it's on by default now

@endrift

This is a fundamental problem with the "solve everything" model.

Just from a high-level perspective, how do you test "Everything"? And the follow-up question: how do you regression test "Everything"?

Folks have been developing AI for decades, and testing even a model meant to solve one narrow hard problem requires heroic amounts of work. Testing "Everything" is basically impossible.

@endrift EXPUNGED (I'd posted a link to a private post)
@noodle link doesn't seem to work for me
@noodle @endrift The ID looks like it might be a post from 2025-01-17, but if it exists, it is not public.
@endrift I find they can generally be much better than previous approaches at translating "human" errors and nonstandard language, but yeah they have the standard LLM issue where if you feed in garbage they don't know how to say "no" and they just hallucinate whatever is statistically most likely.

@endrift Not that previous approaches did any better though. Google Translate has been doing stupid stuff since its inception and it has never had a "this doesn't make any sense" flag.

The main difference is LLMs are more likely to come up with grammatically correct, plausible sounding text instead of something clearly broken, when given clearly broken input.

@endrift I honestly don't know why they didn't have, like, a confidence flag in old models that could actually refuse to translate or warn.

"Standard" LLMs (the text completion kind), if that's what they're using now, probably can't implement that reliably, but you can probably architect a translation model that can (I forget what it's called but I think there's a model architecture better suited to translation).

@lina @endrift
I'd like it if machine translations added (?) markers and footnotes to explain areas of low confidence in the translation. That'd help the person using them to know they should look for more context, and deepen their understanding.

Answers without showing the working are just *much less useful*, in any field of study. Other machine translation methods could show their working. LLMs can't, there isn't really any.
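
A rough sketch of the (?) marker idea, assuming some translation system hands back a confidence score per output token. That assumption is the hard part: the `annotate` helper, the scores, and the 0.6 threshold below are all made up for illustration.

```python
def annotate(tokens, confidences, threshold=0.6):
    """Mark low-confidence tokens with (?) and collect numbered footnotes.

    `confidences` is a hypothetical per-token score in [0, 1].
    """
    if len(tokens) != len(confidences):
        raise ValueError("need exactly one confidence per token")
    marked = []
    notes = []
    for token, confidence in zip(tokens, confidences):
        if confidence < threshold:
            # Footnote numbers count only the flagged tokens.
            notes.append(f'[{len(notes) + 1}] "{token}": confidence {confidence:.2f}')
            marked.append(f"{token}(?)[{len(notes)}]")
        else:
            marked.append(token)
    return " ".join(marked), notes


text, notes = annotate(
    ["Your", "games", "at", "your", "fingertips"],
    [0.95, 0.90, 0.90, 0.80, 0.40],  # invented scores
)
print(text)
for note in notes:
    print(note)
```

With those invented scores, only the last word gets flagged, which is about what a reader would want to see for a guessed sentence ending.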

@petealexharris @endrift LLMs do actually have the ability to show alternate terms/synonyms, that's like the one bit of data you can get out of them relatively easily.

But that doesn't really help with overall confidence in the translation, just identifying word alternatives.

@lina the real problem is people accepting that answer as though it's accurate. But then, I've seen pictures of stores named "Translation Error" or similar 25 years ago, so I guess that's not new.

@endrift Yeah it's kind of always been a thing. See also the endless posts on Reddit of people getting ridiculous Japanese/Chinese tattoos that don't make any sense ^^;;

Of all things I think better MT is important for society and something LLMs are actually arguably good at (though there's still plenty of room for improvement), so I'm reluctant to clown on them for this use case. Plus I'm pretty sure MT-optimized models don't need to be of the huge energy-guzzling variety. And translation, *in general*, doesn't have significant copyright concerns.

Of course, they in no way invalidate the need for human translators, but that market had already been going downhill for decades before LLMs because... people simply don't care, and primitive MT was already good enough for them not to care.

I think with LLMs it's important to focus on the ways in which they can be uniquely dangerous and wasteful. In just the sense of "they're a crappy machine replacement for something humans do better" they aren't unique and we've gone through such technologies before.

@lina @endrift
*syntactically correct

Yeah, never failing to give an answer destroys confidence that *any* apparently suitable translation came from what was in the source text. If an output can come from literally nowhere, how do we know when that's happening as part of translating real input? The algorithm is the same in both cases.

@petealexharris @endrift Edited (whoops, that could certainly be read wrong).

But yeah, machine translation has always been like this, so this is nothing new.

This does not make MT (LLM or not) useless, it's just not a crystal ball that works in every case. You need to either be okay with the error rate, or have enough familiarity with the source language to at least be able to catch glaring errors or sense when something is off (and obviously you should not use it to produce professional output without a professional translator in the loop).

@endrift I discovered this personally recently when I tried to translate the nonsense Chinese characters that the "bush hid the facts" bug produces and it came up with a perfectly sensible sentence
@endrift yuuup

the previous (2016+) google translate approach already handled grammar well, and also didn't have this problem, at least not as much. if you fed it nonsense, it would usually just transliterate it or sometimes keep it as is

it's noticeably worse since they switched to an llm, now it will just happily output a wrong answer instead

"llms are good for translation" my ass
@alice @endrift i feel like llms are “good” at translation only because companies have thrown billions at those systems. if we spent as much on actual MT systems, we’d have better results, because they’re specialised systems that don’t try to do absolutely everything
@endrift I was positively surprised by how well LLMs could translate text to and from Interlingua, a language less popular than Esperanto. The translations were mostly correct, but when I started arguing with the LLM about a mistake it made, I had to spend half an hour before getting it to admit it had been wrong.
@endrift garbage in -> garbage out
@TheOneDoc @endrift except even a basic 3€ calculator will recognize when you enter something invalid and enter an error state instead of just tossing a random number at you. Software just making some garbage up, that isn't even meaningfully related to the garbage you entered, is a very recent invention and we should not just accept it.
@ratsnakegames @endrift yes I want my garbage generator to be at least deterministic.

@endrift this is the problem with LLMs for transcription too. They do ok, but they MAKE SHIT UP. A more specific transformer-based model does a great job! But the chatbot is more convenient.

I'm disconcerted they're putting the LLM "I know what you mean" thing into Google Translate, the original showcase for transformers. I mean, it's obvious they think that's helpful. But still, urgh.

@davidgerard @endrift

My favourite example was actually you! A post containing pivot-to-ai.com was translated (I can't remember the source language) and it decided to replace the domain name with 'pineapple.com'.

It wasn't allowed to touch the HTML, so this ended up with a link that showed 'pineapple.com' but went to 'pivot-to-ai.com'.

I strongly suspect that there are some neat ways of sneaking malicious links past existing email filters that rely on this.

@david_chisnall @endrift LLMs really are the sharp end of "convenience is king"
@david_chisnall @davidgerard @endrift No serious spam filter doesn’t already treat that sort of thing as suspect. Spammers have been trying it for >25 years. There’s a constant stream of new spammers trying it because they lack anything like technical lore. Each new idiot reinvents his own set of the same old stupid tricks.

@grumpybozo @davidgerard @endrift

Current spam filters notice when the link target doesn’t match the text. The attack I’m proposing is where they do match as they go through the filter, but then local translation makes the target the victim sees appear innocuous.
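
To make that concrete, here is a minimal sketch of the kind of text-vs-target check being described, run once on the HTML as a filter would see it and once after translation has rewritten only the visible text. The `mismatched_links` helper and its "looks like a domain" heuristic are illustrative assumptions, not how any particular spam filter works; the two domains are the ones from the anecdote above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(value):
    """Lowercased hostname of a URL or bare domain, or '' if there is none."""
    parsed = urlparse(value if "://" in value else "//" + value)
    return (parsed.hostname or "").lower()


def mismatched_links(html):
    """Links whose visible text looks like a domain but differs from the target host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        (href, text)
        for href, text in collector.links
        if "." in text and " " not in text and host_of(text) != host_of(href)
    ]


as_filtered = '<a href="https://pivot-to-ai.com/">pivot-to-ai.com</a>'
# After a translation step that rewrote the text but not the HTML attributes:
as_displayed = '<a href="https://pivot-to-ai.com/">pineapple.com</a>'

print(mismatched_links(as_filtered))
print(mismatched_links(as_displayed))
```

The first call finds nothing to flag and the second one does, but the second version only exists on the reader's machine, after the filter has already let the message through.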

@davidgerard my doctor asked me if she could use AI summaries for the visit when I went to my last annual physical. I declined. I'd like the person writing summaries to actually be able to fact check it while writing it instead of finding errors once they've forgotten whether or not that actually happened
@endrift If you do any serious QC on its output, you violate the prime tenet of the LLM cultists.
@endrift
I must try this out ... working on fake English sentences ...
@endrift Fun recent example: Google casually "translating" the name (edit: actually acronym) of a security service of one country to a (non-existent) security service of another https://mastodon.social/@reedmideke/116292877233143239
@endrift
What's the full sentence in French ?
@gdupont Vos jeux au bout des doigts; it's written on the Steam Controller box. So it guessed correctly, but critically, it guessed. It didn't have enough information to know for sure.

@endrift
I see

I played with it a little bit, trying to stop at different steps, revert, retry, change the end. The translation self-corrects immediately if the guessed ending is wrong.

Could it be that the system has a very high confidence that 1) your sentence is not finished yet, but you will finish it, and 2) the possible endings are so close to what it expects... that in that position, it's proactive?

Not saying it's "THE good" choice. More like there is a trade-off between fast and accurate.

@endrift premature prediction
@endrift It's almost like this technology is fundamentally an autocomplete.
@endrift context: google translate announced their switch to the PaLM2 LLM on 27 June 2024 (mentioning it because that’s later than people expect)