We used to have working spelling and grammar checkers. Why does everybody in tech pretend you need a whole-ass LLM to check for typos?
@baldur
And translations
And text to speech was working well in most cases
@wikiyu @baldur no offense, but LLMs are really really good at translations, compared to the state of the art before. (and e.g. Google Translate was a lot more LLM-style AI for years than people think)
@wikiyu @baldur (I'd argue that's probably the thing their design lends itself to rather well – analyzing which tokens fit which context. It will certainly never reach human-translator quality, but saying "machine translation was good before", um, no, it really really wasn't.)
@funkylab @wikiyu I'm Icelandic and I know a bit of Danish and French, and I can tell you right now that for the languages I'm familiar with, LLM translators are worse, less accurate, and much more prone to fabricating nonsense than the non-LLM tools they are replacing. Maybe they're great for other languages, but they're horrible for the ones I know.

@baldur @wikiyu Color me really surprised; paragon and derivatives (at least for German) were definitely worse; I remember the (2008-ish?) surprise when ANN-based translations started to achieve higher rankings than purely Bayes-based/statistical methods.

(Oh, and I do personally remember things like the Babylon spyware thing, which wasn't really good. IBM Watson didn't work as well as Google Translate when that came out, for German<->English at least. I had played with Apertium in its earlier …

@baldur @wikiyu days, wonder how it does these days. I imagine that's the technology you've got in mind when you think of non-LLM translation?)

@funkylab @wikiyu So, around the time the LLM bubble first began, there was a noticeable, sharp decline in the performance of publicly available translation services (e.g. Google Translate and the like) when it came to translating most Nordic languages, and it's generally gotten worse, not better, over time. It's become a running joke.

An important note here is that there is much much less text available for these languages in machine-readable form than even German or French.

@baldur @funkylab @wikiyu The trouble is that so much of language is as much about what you don't say, and the words you don't use, as what you do. And LLMs are very bad at spotting sarcasm, innuendo, slang, dialect, and specific turns of phrase. For example, there is a world of difference in (American) English between a butt dial and a booty call. Even as an Englishman, I know that.

@UkeleleEric @baldur @wikiyu I don't know whether that's a good example, because the difference is clear even devoid of context, PLUS existing LLMs have no problem with that difference at all. The two phrases are only similar to a human reader. You're projecting mistakes that are easy for humans to make onto machine translation! (see attached DeepL)

I'm also not sure rule-based & Bayesian translation makes a lot of difference when it comes to sarcasm. That's sentiment detection!

@funkylab They are not. This is a commonly-held view that, unfortunately, is ultimately chauvinistic and does not hold up to scrutiny. These Google-style translators might have achieved state-of-the-art performance on benchmarks translating between English and other dominant Latinate languages, but outside of that they are fairly poor. Furthermore, LLM use gets in the way of learning the detailed linguistic features that would allow someone to design a significantly more performant--in all senses of that word--non-LLM translator that would be of general use. So LLM-based translators are poor in this respect as well. @wikiyu @baldur

@abucci @wikiyu @baldur I feel like we're arguing based on perceptions here – I certainly am, and can but vaguely remember the press echo when neural (not LLM) translators came out. So, I might need to shut up here and say: I don't have enough data to base my claims on. Do you?

Do we have any qualitative analysis in the literature that I could read? So far we've got four people claiming things; that's not a great discussion :)

@abucci @wikiyu @baldur (btw, not amazed at being called chauvinistic for doubting the generalized claim that older translators were better. But I assume you mean well.)
@funkylab I referred to a particular viewpoint as chauvinistic, not you personally nor any other person. I took pains to suggest this is a widespread view, which again has the particular consequence that I am not pinning it to you personally. If I meant to call you a chauvinist I would have said "you are a chauvinist".

Furthermore, it is a significant mischaracterization of my post to say I was claiming "older translators are better". This is not what I said; nor is it an implication of what I said. I stated that LLM-based translators have shortcomings. None of the shortcomings I pointed out or gestured towards is particularly controversial; they have been written about many times. Making such statements is a basic part of any engineering practice: in order to select the right tools for a particular job, one has to take an honest look at the tradeoffs involved.

I assume you mean well also, and so I will share with you that to my ear both your responses sounded like they were written in bad faith. If that was not your intention, be aware that your actual intentions are not coming through when you post like this, at least not to me.

@wikiyu @baldur
@funkylab I am not going to do your homework for you on Mastodon. I do teach computer science classes for pay from time to time and would be happy to consider helping you in that capacity. A fair-minded (i.e., not biased towards supporting one's prior assumptions) scan through the Association for Computational Linguistics publications isn't a bad place to start. @wikiyu @baldur
@funkylab @wikiyu @baldur People forget that the transformer model LLMs use was created first for translation, so of course LLMs should be at least as good at it as other solutions; they are based on the same building blocks.

@qgustavor @wikiyu That's what I'd argue, too, but: that very basic theory and reality, especially the reality of actually available implementations, might diverge here.
Thing is that @baldur is actually someone from the field, so his word does weigh heavy to me, even if it doesn't reflect my own experience with translation quality.

(EDIT: way->weigh. Human in-mind translations are not perfect, either :D)

@funkylab @qgustavor @wikiyu

So, as far as I can tell, LLMs are in general sensitive to the size of the training data set. Only a few languages have a collection of machine-readable texts big enough for these models.

IIRC, in the pre-LLM days they used to compensate for this with work specific to each language.

@funkylab @qgustavor @wikiyu

Once everybody began to migrate to approaches that require large data sets, performance for all of those tasks (translation, summarisation, correction) began to suffer, especially in smaller languages.

Though it should be noted that in a lot of third-party, neutral testing, specialised models outperform LLMs for many language tasks, such as summarisation, even in English. Or they're at least in the same ballpark, even when they underperform, while costing orders of magnitude less.

@baldur @qgustavor @wikiyu I can fully see how this is a problem esp. for Icelandic, but certainly not for French (probably around the third or fourth most-written Indo-European language in available literature sets)!
@funkylab @qgustavor @wikiyu Yeah, I can't explain why the French translations are so often garbage when I use these tools.
@baldur @qgustavor @wikiyu that's really interesting!
By the way, as I said, I hadn't been using dedicated tools since ~2010, and only came back when language-model (not necessarily what usually qualifies as "L"LM) translation became "commoditized", esp. when it became available as an addon to Firefox.
If that level of translation was possible before at similar or lower effort, I'm kind of disappointed in the world for not shipping it with browsers earlier; I maintain the opinion that translation …
@baldur @qgustavor @wikiyu … is an essential assistive technology for most humans, who don't get as much out of the very English-centric scientific, technical, cultural and geopolitical parts of the internet.
@baldur aside from the most simple dictionary-based spell checkers, I also do not understand why we're pushing so hard to replace all the algorithms and machine learning tools we have with LLMs. Like, we have AI tools that served us for many years before this recent AI craze, and they're being replaced with a more marketable model that doesn't fit those use cases as well.
@Xtrems876 @baldur
A common theme for me over the last few months is people using this stuff for document ingestion, indexing, and basic NLP/manipulation, then discovering that the output is bullshit. There are good, validated approaches for doing this stuff that are much cheaper and more deterministic in their behaviour. It's honestly very weird.
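
To make that concrete, here's a sketch of one such boring, deterministic approach: a plain inverted index for basic document search. The document names and tokenization are simplified stand-ins, not any particular library's API.

```typescript
// Minimal inverted index: maps each token to the set of documents containing it.
function buildIndex(docs: Map<string, string>): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const [id, text] of docs) {
    for (const token of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token)!.add(id);
    }
  }
  return index;
}

const docs = new Map([
  ["a.txt", "Spell checkers predate LLMs."],
  ["b.txt", "Grammar checkers are deterministic."],
]);

// Same input, same output, every run; no hallucinated documents.
console.log(buildIndex(docs).get("checkers")); // Set(2) { "a.txt", "b.txt" }
```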
@baldur Because it bumps up the valuation of any grammar checker company 100x.

@baldur A friend told me he uses it to do tasks that are easy to validate, like renaming variables. I was like... did you previously not have ways to rename variables? Is this not something that you've done a million times before?

He mumbled some excuses about how he couldn't get some LSP server to work or something. This man has close to 20 years of professional experience, and chatbots have absolutely ruined his brain.

@AndrewRadev I have a bit more sympathy for this type of "instructional" use case. Depending on your language and editor, it can be a right pain to get tooling to work well. I still struggle with Neovim config and getting things working!

I wanted to move an Elixir module yesterday and thought I'd try it with Gemini to save me some drudge work. It worked, and found references I probably would have missed on the first go. Though I'm not sure it saved me any time in the end, with all the "thinking" required.

As much as I dislike these things, it would be non-factual to deny they have some utility.

@Odaeus Of course they have some utility compared to doing nothing. What is their utility compared to other things? What is the utility of a non-deterministic text generator compared to a deterministic compiler or LSP server? What is the *cost* of doing this if it wasn't subsidized by astronomical losses by these companies?

@AndrewRadev I was comparing its utility to the work involved in learning or choosing an editor/IDE and having it perform the same task... self-evidently not comparing it against doing "nothing" or a compiler! And I was pointing out how, from the perspective of a user/developer, there is benefit there. And sadly, it is reliable for many such tasks.

I do not need to be lectured about the numerous clear externalised costs.

@Odaeus You mention moving a module in Elixir. I don't have professional experience with Elixir. In my last job, I wrote React with TypeScript.

I always had `tsc --watch` running in a terminal window. When I needed to move some code, refactor components, etc, I would make the change, then follow the tsc errors one by one. This was a fairly straightforward and 100% reliable process. Tsc had its issues, but it was deterministic.

This is why I compare an LLM "move this module" task to a compiler. I guess maybe you can ask the LLM to move the module and also run the compiler? But in either case, you won't miss anything. Maybe it'll fail and you'll have to spend the same time fixing issues as you would have doing things the direct way. Maybe it will succeed, but it'll introduce unrelated changes that compile, but introduce issues. I don't see how you could ever know what you're going to get and why this would be a desirable workflow for a professional software developer. I would always prefer a consistent, reliable workflow, rather than a roll of the dice.
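
For concreteness, a minimal sketch of what that determinism buys you; the module and function names here are hypothetical:

```typescript
// src/lib/format.ts: the module after the move (hypothetical names).
export function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Any file still importing the old path fails to compile with a precise,
// repeatable error:
//
//   error TS2307: Cannot find module '../utils/format'
//   or its corresponding type declarations.
//
// `tsc --watch` lists every such site. Fix them one by one; when the error
// list is empty, the refactor is done. No dice roll involved.
```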

The METR study had people estimate they were faster by 20% using AI, but they were 19% slower. Their estimations were completely off, because it's impossible to reliably know whether you would have, in fact, missed that one reference. I agree that there is *perceived* utility compared to the alternative. I agree there *might* be real utility *sometimes*. I don't believe that most developers have actually designed experiments where they've measured whether it's beneficial for them on average, or not.

@AndrewRadev @baldur buy my new app which is an LLM that does find and replace 🤪

@scott @baldur Buy my new app, I call it a "search engine". Benefits over searching for information via LLMs:

- Much faster
- No need to write full sentences, just type keywords
- 100% Hallucination-free, all results are guaranteed to point to the actual sites the results were taken from.

@AndrewRadev @baldur this sounds so radical, it couldn't possibly work! I mean, it would still need a data center, but it couldn't possibly use enough water and electricity to be… oh yeah, never mind, good idea
@baldur rich guys spent trillions saying it's the next big thing, so it has to be good for something (or else the ideology of rich guys not being incompetent becomes suspect)

@Dangerous_beans @baldur yo grammar checkers aint got nothing on me:-)

video: syntax - pride (music) - word play
https://www.youtube.com/watch?v=HkpXGMPwf2c

@baldur And in many cases, those new systems are really doing a lousy job
@baldur If you've got a big enough hammer...
@baldur
We still do.
LO Writer & plug-ins.
Over 30 novels written.
@baldur There are lots of errors spell checkers don't catch.

@lemgandi @baldur

Even full-on grammar checkers and reading level metrics do not require world-burning LLMs.
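
Case in point: the classic Flesch Reading Ease score is plain arithmetic over word, sentence, and syllable counts. A rough sketch (the vowel-group syllable counter is a crude heuristic, not dictionary-grade):

```typescript
// Crude syllable estimate: count groups of consecutive vowels.
function countSyllables(word: string): number {
  return Math.max(1, (word.toLowerCase().match(/[aeiouy]+/g) ?? []).length);
}

// Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
// Higher scores mean easier text.
function fleschReadingEase(text: string): number {
  const sentences = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const words = text.match(/[A-Za-z']+/g) ?? [];
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const n = Math.max(1, words.length);
  return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n);
}

// Computed without a single matrix multiplication.
console.log(fleschReadingEase("The cat sat on the mat. It was happy."));
```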

@lemgandi Twenty years ago I worked for a company that sold a full-featured grammar and spelling checker for Icelandic, a considerably more troublesome language than English, and the overall tech has only improved since. This is a thoroughly researched problem and not a task that should require a large language model.
@baldur there are so many things people use LLMs for where it would make sense to build a dedicated tool after the first few uses (maybe the LLM could be used to code the tool). Especially since ordinary tools use energy so much more efficiently.
@64kb That was literally one of the suggestions I made three years ago. Basically, even if you do disagree with the issues with LLMs that people are pointing out, the cost alone means it makes economic sense to switch to specialised, low-cost models for the specific use cases you discover, because not only are they cheaper, they tend to be more effective.
@baldur that's possible too. What I really meant was get the LLM to code a traditional tool and then use that subsequently.
@baldur As a production editor who has spent some time proofreading and copyediting, I am not entirely sure we've ever had working spelling and grammar checkers. I've never lacked for work.
@maccruiskeen The LLMs aren't an improvement, though.

@baldur Oh, definitely not. The old checkers would make mistakes and miss stuff, but they didn't just make shit up. At work we've started using a non-AI Word plugin called PerfectIt that can actually check against Chicago Manual of Style rules, and it's useful, but it has its limits. I don't expect to be out of a job any time too soon.

eta: We've also started playing with an AI-based system to provide alt-text for remediating backlist ebooks. It's very hit or miss. It kind of highlights that the main limitation of any automated system is failure to understand context.

@baldur gives more weight to their insistence that you need an LLM (generally)
@baldur A solution desperately looking for a problem.

@baldur From my own experience working with classic spelling and grammar checkers (I even open-sourced one, subtitle-linter on my GitHub), you are overselling them.

Once I even got into an internet fight because I said spelling checkers should reject rare words, as, in my experience, most instances of rare words in the texts I reviewed were typos — those were Portuguese texts, but if I were to give an English example it would be like finding lots of "fain" as typos for "pain" — in Portuguese there were LOTS of "maça" (mace) for "maçã" (apple).
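
For what it's worth, that heuristic is simple enough to sketch. The frequency table and threshold below are made-up stand-ins for a real corpus:

```typescript
// Hypothetical corpus frequencies; only "fain" is rare.
const corpusFrequency = new Map<string, number>([
  ["he", 900_000], ["felt", 40_000], ["a", 990_000], ["sharp", 20_000],
  ["fain", 12], // archaic and rare: in most modern text, a typo for "pain"
  ["in", 950_000], ["his", 700_000], ["side", 60_000],
]);

// Flag words the dictionary knows but the corpus rarely uses.
// (Words missing from the dictionary are ordinary spell-check territory.)
function flagRareWords(text: string, threshold = 100): string[] {
  const words = text.toLowerCase().match(/[a-zà-ÿ]+/g) ?? [];
  return words.filter((w) => {
    const freq = corpusFrequency.get(w);
    return freq !== undefined && freq < threshold;
  });
}

console.log(flagRareWords("He felt a sharp fain in his side.")); // ["fain"]
```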

But, to be fair, even LLMs have issues with this due to how they are designed. On the other hand, it's more likely an LLM would recognise "mace" as correct if it's being held by a king, but as wrong if someone is eating it. And all of this is just spelling!

@baldur Spell/grammar checkers now routinely harass me to change something completely correct into something incorrect. This timeline is glorious.
@baldur IMHO LLMs are doing a worse job of it
@baldur So many of us being called out here 😂 and you’re not wrong. This is a wildly inefficient use of tokens, compute and all of the things that come with it.
@baldur we also had very good dictionaries. And I know people who simply wouldn't use them. No idea why. Using AI, however? No problem at all. Totally beats me.