LLM translation models are going great
I think people really underestimate how fragile LLMs are for auto-translation. You can put complete garbage into it where none of the words are real words and still get out plausible-sounding "translations" just because the LLM sees it as "close" to a real sentence, and then translates what it thinks that "close" sentence is based, once again, on what seems "close".
The whole benchmarking approach really does not help with this since benchmarks rarely include testing for failures. You need to test that garbage-in is recognized as garbage, otherwise you get garbage-out too.