@baldur @wikiyu Color me really surprised; paragon and derivatives (at least for German) were definitely worse; I remember the (2008-ish?) surprise when ANN-based translations started to achieve higher rankings than purely Bayes-based/statistical methods.
(Oh, and I do personally remember things like the Babylon spyware thing, which wasn't really good. IBM Watson didn't work as well as Google Translate when that came out, for German<->English at least. I had played with Apertium in its earlier …
@funkylab @wikiyu So, around the time the LLM bubble first began, there was a sharp, noticeable decline in the performance of publicly available translation services (i.e. Google Translate and the like) when it came to translating most Nordic languages, and it's generally gotten worse, not better, over time. It's become a running joke.
An important note here is that there is much, much less text available for these languages in machine-readable form than even for German or French.
@UkeleleEric @baldur @wikiyu I don't know whether that's a good example, because the difference is clear even devoid of context, PLUS existing LLMs have no problem with that difference at all. The two phrases are only similar to a human reader. You're projecting mistakes that are easy for humans to make onto machine translation! (See attached DeepL.)
I'm also not sure rule-based vs. Bayesian translation makes much difference when it comes to sarcasm. That's sentiment detection!
@abucci @wikiyu @baldur I feel like we're arguing based on perceptions here – I certainly am, and can only vaguely remember the press echo when neural (not LLM) translators came out. So I might need to shut up here and say: I don't have enough data to base my claims on. Do you?
Do we have any qualitative analysis in the literature that I could read? So far we've got four people claiming things; that's not a great discussion :)
@qgustavor @wikiyu That's what I'd argue, too, but theory and reality, especially the reality of actually available implementations, might diverge here.
Thing is, @baldur is actually someone from the field, so his word does weigh heavily with me, even if it doesn't reflect my own experience with translation quality.
(EDIT: way->weigh. Human in-mind translations are not perfect, either :D)
So, AFAICT, LLMs are in general sensitive to the size of the training data set. Only a few languages have a collection of machine-readable texts big enough for these models.
IIRC, in the pre-LLM days they used to compensate for this with language-specific adjustments.
Once everybody began to migrate to approaches that require large data sets, performance on all of those tasks (translation, summarisation, correction) began to suffer, especially for smaller languages.
Though it should be noted that in a lot of third-party, neutral testing, specialised models outperform LLMs on many language tasks such as summarisation, even in English. And even where they underperform, they're at least in the same ballpark, while costing orders of magnitude less.