One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick.

All they do is predict the next word based on previous context. It turns out that when you scale the model above a certain size it can give the false impression of "intelligence", but that's a total fraud.
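A toy sketch of that "predict the next word" idea, assuming nothing about how real transformers work: a bigram counter that always picks the most frequent follower it saw in training. The training sentence and function names here are made up purely for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count, for each word, which words followed it in the training text.
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(model, word):
    # Greedily return the most common continuation seen after `word`.
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # prints "cat": it followed "the" twice, "mat" only once
```

Scale that idea up by many orders of magnitude (and swap the counter for a transformer) and you get the party trick.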

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns.

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that, on the surface, they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves.

Figuring out what they're genuinely good for is a very interesting challenge.

@simon

The way I look at it: machine learning in general (including these large language models) is great when you have the following problem criteria:

#1: You need to build a pattern matcher.
#2: You don't know what to look for.
#3: When the pattern matcher is finally built, you don't care to know what it actually looks for.
#4: The results are allowed to be hilariously, insanely wrong some % of the time.

And there are actually a lot of things that match those criteria.
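For what it's worth, criteria #1–#4 describe even the tiniest learned classifier. A minimal sketch (made-up features and data, not any real system): a nearest-centroid "spam" matcher whose learned internals we never inspect, and which will cheerfully be wrong on inputs unlike its training data.

```python
def train_centroids(examples):
    # examples: list of (feature_vector, label) pairs.
    # Learn one average ("centroid") vector per label.
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, vec):
    # Predict the label whose centroid is closest (squared distance).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], vec))

# Hypothetical features: (ratio of all-caps words, number of links).
data = [([0.9, 3], "spam"), ([0.8, 2], "spam"), ([0.1, 0], "ham"), ([0.2, 1], "ham")]
centroids = train_centroids(data)
print(classify(centroids, [0.85, 2]))  # prints "spam": nearest to the spam centroid
```

Nobody looks at the centroids themselves (#3), and anything far from the training data gets a confident, possibly absurd answer (#4).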

@ncweaver @simon I'm totally on board with ML if we acknowledge #4, which is also why I'm uncomfortable with ML being used for 1:1 therapy situations at this point.

@pamelafox @ncweaver the 1-1 therapy thing is terrifying to me - imagine trying to get therapy from your iPhone keyboard!

But that said, I do use ChatGPT as an alternative to rubber duck programming sometimes: if I'm stuck on something I'll kick off a conversation purely as a thinking aid, and it's often effective

That feels pretty different to me from the therapy thing, but not a million miles away from it

@simon @ncweaver Yeah, to be fair, I'm totally gonna try ChatGPT for therapy-light, like social situation advice, but I'm emotionally stable enough and aware that ChatGPT is just an LLM. My concern is for folks who may be close to harming themselves or others, and an overly humanized chatbot tells them something that leads them astray.
@pamelafox @simon
And I'm a security person, so most of my applications can't stand #4...

@ncweaver I really like that list of criteria.

It would be really useful if there were a solid, easy-to-understand list of use cases and anti-use cases to point people to.

@simon I think for the anti-use case, #4 itself captures it: "Is it OK if you are hilariously, outrageously, gobsmackingly wrong and you don't know it?"

Which is why I find Tesla's "AI first" development model for autonomy frightening, beyond just the fact that they are training it based on how Tesla drivers drive...

@ncweaver @simon I think #3 is usual but not universal. Sometimes, for instance, generative AI will bring out patterns that you can observe but that weren't obvious before: "Oh, I see the model associates X with Y."
@ncweaver @simon I think there's a lot of gaming and entertainment applications specifically for experiences that are hard to script. Imagine an offline game where you can bargain or reason with NPCs in natural language, and sometimes they say something dead stupid, but that's part of the charm.
@ncweaver @simon Is the list of "when delegating to another human is called for" different?
@hans @simon
Humans will tell you what they are matching on, and the hilariously wrong failure modes are often different.
@ncweaver @simon Essentially, the best use case is: Writing comedy.