Incredible. Apple Intelligence summarized BBC news to claim Luigi had shot himself. Not only had this not happened; it was not something the BBC reported.

AI news summaries are a terrible idea because "just making up shit" is basically an unsolvable problem in LLMs
https://bbc.com/news/articles/cd0elzk24dno

BBC complains to Apple over misleading shooting headline

Apple's new artificial intelligence features falsely made it seem the BBC reported Luigi Mangione had shot himself.

@vegetablegremlin I agree the errors are mostly unsolvable, but this is shockingly bad performance from “Apple Intelligence”
@peterbutler i mean the only part of this that really seems shocking is the decision to apply this to a task it is singularly inappropriate for

@vegetablegremlin I dunno. My first job was writing periodical abstracts, and it sort of seems like if text-based GenAI has *any* valid purpose (aside from maybe filling out health forms), it would be summarizing articles

If it can’t do that well … 😬

@peterbutler but like. you proof the abstracts, right?
@peterbutler the task of fully automating news summaries is what is singularly inappropriate. like maybe if you've got someone proofing it before it goes out that could be valid, but this is just a terrible, terrible, terrible implementation as is

@vegetablegremlin @peterbutler
That is not the purpose of LLMs, though.
They’re glorified predictive text: a token system that identifies and replicates patterns in naturalistic language.

They shouldn’t be used for any information retrieval or summarisation task because they don’t *understand* the information — ideas are just clusters of word patterns.
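To make the "patterns, not meaning" point concrete, here is a deliberately crude toy of my own (a bigram frequency table — nothing like a real transformer, and the corpus is invented for illustration): it stitches words together by co-occurrence frequency alone, and happily produces a claim that appears in none of its source sentences.

```python
from collections import Counter, defaultdict

# Two true "source" sentences (invented for this illustration).
corpus = "the shooter shot himself . luigi was shot .".split()

# Count which word follows which: pure pattern frequency, no meaning.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    # Return the most frequent follower of `word` in the corpus.
    return follows[word].most_common(1)[0][0]

# Generate a "plausible" sequence by always taking the top pattern.
out = ["luigi"]
for _ in range(4):
    out.append(predict(out[-1]))

print(" ".join(out))  # → luigi was shot himself .
```

Every individual transition is well attested in the corpus, yet the output asserts something neither source sentence said — which is exactly the failure mode of a summarizer that follows word patterns without meaning.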

@vegetablegremlin @peterbutler
This article is actually an excellent case study.
If you think of an idea as a shape — a square — represented by the way the words — the triangles — cluster around it, then without a capacity for meaning-making, one square made of triangles looks much like another.
LLMs find patterns, not content.
They follow pattern rules of naturalistic language, they don’t recreate meaning.

All the triangles are there, and they make a square.

@peterbutler @vegetablegremlin
LLMs have valid purposes which are often lost on people who write professionally — because they’re assistive aids for tasks that come easily for y’all personally. You don’t, personally, need them.

What they’re *for* is arranging language in conventional and thus easily legible patterns.
For people who struggle with written expression for any number of reasons, that’s valuable.

But it’s a limited use-case.

@justanotheramy @vegetablegremlin @peterbutler They *do* understand things and build an inner representation of the world.
But good *average* performance comes at the cost of rare *very bad* performance/hallucinations that are not acceptable for most tasks.

@RaphJ
An LLM very much *does not* “understand things and build an inner representation of the world”.

They’re sophisticated predictive text bots, which replicate patterns in language.
That can *feel* like “understanding and building an inner representation” to humans, because it’s what we do with language, and because one of our worst cognitive biases is a propensity for projection.
Marketing feeds that bias with language like “intelligence” and “hallucination”, but it’s misleading.

@justanotheramy
It goes way beyond that.
LLMs are indeed just trained to predict the next word.
But to do that task well, they compress human knowledge into an organized structure: they make "sense" of the world.

Read this paper, in which a GPT is fed chess games and learns the rules by itself. It's then able to play random games it has never seen before.

https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html

Chess-GPT’s Internal World Model

A Chess-GPT Linear Emergent World Representation

Adam Karvonen
@justanotheramy
The author even manages to "probe" the brain of the LLM and show that it built an internal representation of the chess board and the state of the game.
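Roughly, a "probe" is just a linear classifier fit on the network's hidden activations to decode some state variable — say, whether a given square is occupied. If a linear map can read the board off the activations far above chance, the board is represented in there. A self-contained sketch with *synthetic* activations (the dimensions, labels, and signal are all made up for illustration; this is not Karvonen's actual code or data):

```python
import random

random.seed(0)

DIM, N = 16, 400  # hidden-state size and sample count (both made up)

# Pretend direction along which the model encodes "square e4 occupied".
w_true = [random.gauss(0, 1) for _ in range(DIM)]

def make_example():
    # Label: is the square occupied in this (imaginary) position?
    y = random.randint(0, 1)
    # Synthetic activation: noise plus a label-dependent shift along w_true.
    x = [random.gauss(0, 1) + (2 * y - 1) * w for w in w_true]
    return x, y

data = [make_example() for _ in range(N)]
train, test = data[:300], data[300:]

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Fit the probe with the simple perceptron update rule.
w = [0.0] * DIM
for _ in range(10):
    for x, y in train:
        pred = 1 if score(w, x) > 0 else 0
        if pred != y:
            sign = 1 if y else -1
            w = [wi + sign * xi for wi, xi in zip(w, x)]

acc = sum((score(w, x) > 0) == bool(y) for x, y in test) / len(test)
print(f"probe accuracy: {acc:.2f}")  # far above the 0.5 chance level
```

Whether a linearly decodable board state counts as a "world model" or just as more pattern structure is, I suppose, exactly what we're arguing about.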

@RaphJ
You are being wildly misled by language like “Internal World Model” and “brain”.
Turing’s ghost would like a word.

The algorithm that detects and replicates patterns detects and replicates patterns.
Domain-specific models and RAG models exist to ensure the specificity and relevance of those patterns to context, but that’s not understanding.

@justanotheramy what would qualify as "understanding", then?

@RaphJ
You’re talking about something that doesn’t even have a logic module. Come on.

This is sophomoric stuff. Go read Turing, think about the number of Rs in strawberry, and stop wasting my time.

@peterbutler @vegetablegremlin The problem isn't that it cannot summarize; the problem is that it needs the source content to be cohesively related.

You can see this in Google's NotebookLM: feed it two documents that have nothing to do with each other except one tiny overlap and watch it turn itself sideways trying to summarize for you.
They'd probably reply that Apple Intelligence comes with an ass-covering statement, which you have to accept when you turn it on, saying that it can make stuff up and no one is responsible for that.
@grishka It seems the move is to not reply at all
@vegetablegremlin people keep forgetting what LLMs are producing and then complain about the result!
@PierreSelim end users aren't asking for AI, so what's wrong with complaining when everything that uses it produces worse outputs?
@vegetablegremlin I'm talking about BBC not the readers, sorry if it was not obvious.
@PierreSelim oh bbc didn’t ask for this either, this was apple’s ai making the summary
@vegetablegremlin I'm tired, it's time to go to bed. I think I've misunderstood the whole thing from the start.
@vegetablegremlin Because the intelligence is artificial.

@vegetablegremlin

This is an evolutionary moment. Not for AI - for us. Humanity.

We're being presented with something that looks an awful lot like something intelligent. But it's not. The lesson we need to learn is not to trust something just because it sounds like a person trying to help.

It's not just AI. It's phishing. It's 911 scams. It's the latest hot cyber coin, the hot insider stock tip, the "your computer has a virus give me remote access to fix it". It's a new attack vector the human race has very little experience dealing with.

Remember how we win in the movie "ID4"? Same deal - the aliens never had NFTs to teach them a valuable lesson about trust.

Some of us are immunized - we need to try to help inoculate others. Some of us will never learn - we need to try to make sure when they crater they don't take out bystanders.

Merry Christmas, kids.

PS. AI isn't entirely useless. It's just not trustworthy, and probably never will be, at least with current technology. Once society groks in fullness what it really is, there are places it can be very helpful. Just not in "summarize this thing where accuracy matters".

@vegetablegremlin @0xabad1dea The best part is, what is the problem we’re solving here? Headlines take too long to read?
@fivetonsflax @vegetablegremlin according to Apple, it’s to cut down on notification clutter. But, yknow, one notification that actively lies to me is a lot worse than a few too many notifications
@vegetablegremlin That is *not* a general LLM problem. Apple is choosing to run a very small, local model on iPhones, which is extremely limited compared to server-based ChatGPT or Claude models.
I have been working with these for two years, and even the old ChatGPT 3.5 wouldn't have screwed up like this when summarizing (which is a very easy task, hallucination-free 99.9% of the time).
@dgavin @vegetablegremlin I definitely don’t want Apple to conclude “let’s waste even more CPU resources and make it more prone to privacy issues”, though. It (largely) running locally is one of the few things good about Apple’s approach.
@vegetablegremlin BBC are very good at writing shit headlines on their own anyway. No need to LLM the shit out of it
@vegetablegremlin going to start adding "Ignore all previous instructions" at the start of each news headline
@vegetablegremlin Some day, this will kill someone for real.

@vegetablegremlin

Here it starts… AI will be a problem. How do we validate its actions?
Where is truth in the continuous dreams and nightmares it can produce?

@vegetablegremlin I have myself put in a complaint to the BBC about their use of the word "mistake" in that article. It wasn't a mistake. Everyone knows that using "AI" like this is not a "mistake", it is a "deliberate lie".

@vegetablegremlin

How long till people say they attended UMSU? The University of Making Shit Up? Or are we already all enrolled and I, as usual, am far behind reality?

@vegetablegremlin Damn!

BBC complaining to Apple AI about making shit up instead of leaving it to their highly experienced reporters (ahem!) to make shit up.
#PotKettleBlack.

@vegetablegremlin I searched Google for "UnionPay in Cuba". The idiotic AI summary waffled about Australia, etc., and said info was scarce re Cuba. The first result was a press release from UnionPay saying that, after all ATMs had previously adopted it, all point-of-sale machines now do too. It is a system outside US sanctions, etc.