I’m glad people are finally noticing LLM translators will just make plausible sentences the fuck up when they’re fed anything but a perfect source to translate, which makes them exhausting and damaging for language learning and a variety of other situations where you’re expected to actually, you know, use the fucking language for anything but a highly inaccurate skim

thank fuck all the language learning companies didn’t jump onto the LLM train, right?

…right?

@zzt Machine translation has never been good enough for prod. It was only for personal use. I use websites in English because the Chinese translations are usually awkward and baffling. What blows my mind is that LLM translations are worse, but now companies are bragging about using them. They didn't brag about using machine translation because that was embarrassing.

@robinsyl @zzt

Most "freely" available machine translation engines have never been good enough, but purpose-built language pair engines (like say DeepL's "classic" backend, RWS, Lionbridge) have been "good enough" for years (well before this current LLM craze).

Yes, I wouldn't trust ChatGPT to translate one of my technical documents into Chinese, that would be a bunch of gibberish. But the good engines are the ones you still pay comparable-to-humans $/word for.

1/

@robinsyl @zzt

And we DO check, IF we have the luxury of $ and time for a human proofreader.
If we ARE lucky enough to get downstream review in the target language, we sometimes do a blind test: translate the same thing with the machine and with a human. Frequently the two are at a similar level of "not perfect, but good enough; technically correct, conveys the right information." They just make different mistakes.
2/

@robinsyl @zzt

The Machine tends to fall back on a literal translation, so it fumbles on colloquialisms and commonly understood phrases. Also, if we're using dictionaries or translation glossaries to overrule default literal translations, it tends to mangle the surrounding grammar, like tenses or genders. A human wouldn't do that, but in particularly technical content we have to explain to the humans what a whidgaflam is.
3/

@robinsyl @zzt

And we may not always get the same human next time so we have to explain _again_ (this too can be assuaged by glossaries, but those come with their own problems.)

But yeah, fundamentally, whether you use cheap LLMs, GOOD LLMs or humans, if you're not proofreading with a target language specialist in your field, you're gonna get questionable results no matter what.

4/

@robinsyl @zzt

Sometimes we don't care: if a customer's country has language laws saying you MUST have a manual in French, but everyone knows the operators all know English, maybe we'll skip the proofreaders because no one is ever gonna read that manual. But manual_fr-FR.pdf checks a box.
IF we're trying to land our first Japanese customer and it's a big system sale, you bet your ass we're having a proofreader review all 400 pages.
5/

@robinsyl @zzt

One thing MT is really good at is speed. I can rinse 20 new/changed pages through MT and back into my CMS in an hour. With humans I gotta get a quote, cut an invoice, and then another PM schedules me for two weeks from now... ugh.

In any case, if any of these learn-to-speak services have switched to MT instead of human-curated content, ugh, that is gonna suck. The "classic" MTs were pretty good, but you get what you pay for $$. The "new" LLM MTs: probably a bunch of hot garbage.