A lot of the current hype around LLMs revolves around one core idea, which I blame on Star Trek:

Wouldn't it be cool if we could use natural language to control things?

The problem is that this is, at the fundamental level, a terrible idea.

There's a reason that mathematics doesn't use English. There's a reason that every professional field comes with its own flavour of jargon. There's a reason that contracts are written in legalese, not plain natural language. Natural language is really bad at being unambiguous.

When I was a small child, I thought that a mature civilisation would evolve two languages. A language of poetry, that was rich in metaphor and delighted in ambiguity, and a language of science that required more detail and actively avoided ambiguity. The latter would have no homophones, no homonyms, unambiguous grammar, and so on.

Programming languages, including the ad-hoc programming languages that we refer to as 'user interfaces', are all attempts to build languages like the latter. They allow the user to unambiguously express intent so that it can be carried out. Natural languages are not designed, and end up being examples of the former.

When I interact with a tool, I want it to do what I tell it. If I am willing to restrict my use of natural language to a clear and unambiguous subset, I have defined a language that is easy for deterministic parsers to understand with a fraction of the energy requirement of a language model. If I am not, then I am expressing myself ambiguously, and no amount of processing can possibly remove the ambiguity that is intrinsic in the source. The only thing that could is a complete, fully synchronised model of my own mind that knows what I meant (and not what some other person saying the same thing at the same time might have meant).

The hard part of programming is not writing things in some language's syntax, it's expressing the problem in a way that lacks ambiguity. LLMs don't help here; they pick an arbitrary, nondeterministic option for the ambiguous cases. In C, compilers do this for undefined behaviour and it is widely regarded as a disaster. LLMs are built entirely out of undefined behaviour.

There are use cases where getting it wrong is fine. Choosing a radio station or album to listen to while driving, for example. It is far better to sometimes listen to the wrong thing than to take your attention away from the road and interact with a richer UI for ten seconds. In situations where your hands are unavailable (for example, controlling non-critical equipment while performing surgery, or cooking), a natural-language interface is better than no interface. It's rarely, if ever, the best.

@david_chisnall ...that's also one of its strengths, language is a completely different "beast" than math. Comparing it is useless. Language fulfills different functions than math. But just as important for human beings.

@ErikJonker @david_chisnall

That's the point. Giving directions to a machine requires math. Using the term "language" to refer to machine instructions has led people down the wrong path over and over again and led to monstrosities like COBOL and Perl and "Congratulations, you have decided to clean the elevator!"

@resuna @david_chisnall COBOL is not a monstrosity as a programming language, it's of course legacy

@ErikJonker @david_chisnall

My first job was programming in COBOL before it was legacy. It is terrible. It always was terrible. It's not a natural language, it's not ambiguous, but trying to make it look like a natural language was an unmitigated disaster at every level. The same is true of Perl's "linguistic" design. Even just pretending to be a natural language spawns monstrosities.

Edit: see also AppleScript.

@resuna @david_chisnall it is extremely stable and durable for sure, ask any financial institution 😃
@ErikJonker @resuna @david_chisnall OK, but that's not because of COBOL. You could write something durable and stable in any programming language. Financial software is written in COBOL because that was the language of the mainframe at the time. The fact that it's still largely in COBOL is because it's expensive to rewrite, the returns on a rewrite are hard to quantify, and the risks are huge.

@ErikJonker @david_chisnall

Have you ever written any code in COBOL? Everything in COBOL takes longer to write, the fundamental operations are simplistic and verbose, the program structure is stilted and restrictive, the way you define data structures is horribly antiquated, and a huge number of the problems that make writing COBOL so slow and painful are due to its mistaken "language-like" design.