We used to have working spelling and grammar checkers. Why does everybody in tech pretend you need a whole-ass LLM to check for typos?
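(For the point that typo checking never needed an LLM: a toy sketch of the classic approach, comparing each word against a dictionary by edit similarity, using only the Python standard library. The tiny word list here is a stand-in; real checkers like hunspell use full dictionaries plus affix rules.)

```python
import difflib

# Stand-in dictionary; a real spell checker would load a hunspell-style word list.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def suggest(word, dictionary=DICTIONARY):
    """Return the word itself if it is known, else up to 3 close matches."""
    w = word.lower()
    if w in dictionary:
        return [w]
    # difflib ranks candidates by SequenceMatcher ratio; 0.6 is its default cutoff.
    return difflib.get_close_matches(w, sorted(dictionary), n=3, cutoff=0.6)

print(suggest("quikc"))  # → ['quick']
```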
@baldur
And translations
And text-to-speech worked well in most cases
@wikiyu @baldur no offense, but LLMs are really, really good at translation compared with the prior state of the art. (And e.g. Google Translate was a lot more LLM-style AI for years than people think.)
@funkylab @wikiyu @baldur People forget that the transformer architecture LLMs use was originally created for translation, so of course LLMs should be at least as good at it as other solutions; they're built from the same building blocks.

@qgustavor @wikiyu That's what I'd argue too, but that basic theory and reality, especially the reality of the actually available implementations, might diverge here.
Thing is that @baldur is actually someone from the field, so his word does weigh heavily with me, even if it doesn't reflect my own experience of translation quality.

(EDIT: way->weigh. Human in-mind translations are not perfect, either :D)

@funkylab @qgustavor @wikiyu

So, as far as I can tell, LLMs are in general sensitive to the size of the training data set. Only a few languages have a collection of machine-readable texts big enough for these models

IIRC, in the pre-LLM days they used to compensate for this specifically for each language.

@funkylab @qgustavor @wikiyu

Once everybody began to migrate to approaches that require large data sets, performance on all of those tasks (translation, summarisation, correction) began to suffer, especially in smaller languages

Though it should be noted that in a lot of neutral third-party testing, specialised models outperform LLMs on many language tasks, such as summarisation, even in English. And even where they underperform, they're at least in the same ballpark while costing orders of magnitude less

@baldur @qgustavor @wikiyu I can fully see how this is a problem esp. for Icelandic, but surely not for French (probably the third or fourth most-written Indo-European language in available text corpora)!
@funkylab @qgustavor @wikiyu Yeah, I can't explain why the French translations are so often garbage when I use these tools.
@baldur @qgustavor @wikiyu that's really interesting!
By the way, as I said, I hadn't used dedicated translation tools since ~2010, and picked them up again only when language-model translation (not necessarily what usually qualifies as a "large" LM) became commoditised, esp. when it became available as an add-on to Firefox.
If that level of translation was possible before at similar or lower effort, I'm kind of disappointed with the world for not shipping it with browsers earlier; I maintain the opinion that translation …
@baldur @qgustavor @wikiyu … is an essential assistive technology for most humans, who otherwise don't get as much out of the very English-centric scientific, technical, cultural and geopolitical parts of the internet.