One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick.

All they do is predict the next word based on the previous context. It turns out that when you scale a model above a certain size it can give a convincing impression of "intelligence", but that impression is a total fraud.
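That "predict the next word, append it, repeat" loop can be sketched with a toy example. Everything here is made up for illustration — a hand-written bigram table standing in for the billions of learned parameters in a real model — but the generation loop itself is the same shape:

```python
# Toy "language model": a hand-written table of next-token probabilities.
# A real LLM learns these scores from data; the generation loop is the same:
# score candidate next tokens given the context, pick one, append, repeat.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt: str, max_tokens: int = 5) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        options = BIGRAMS.get(tokens[-1])
        if not options:
            break  # no known continuation for this token
        # Greedy decoding: always take the single most probable next token.
        tokens.append(max(options, key=options.get))
    return " ".join(tokens)

print(generate("the"))
```

There is no plan and no understanding anywhere in that loop — just repeated next-token selection, which is exactly why the results can feel like smoke and mirrors.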

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns.

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that, on the surface, they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves.

Figuring out what they're genuinely good for is a very interesting challenge.

@simon

A missing 🧩 here is that training large language models has a huge, destructive climate impact, and we should quit doing it until we've got the climate sitch under control.

Using them is about the same as using any other app, so end users and API users, don't feel guilt-tripped. Making them, on the other hand, wrecks the world 💔

@Sandra I've not found that argument very convincing yet.

Sure, there's a HUGE energy cost in training a model... but that model can then be put to use for many years into the future.

text-davinci-003 was trained once, at great expense... but it has since run inference millions (probably billions) of times for millions of people.
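The amortization argument can be made concrete with some back-of-the-envelope arithmetic. Every number below is an illustrative placeholder — not a measurement of any real model — but the structure shows why a one-off training cost shrinks per request as usage grows:

```python
# Back-of-the-envelope amortization of training energy over inference.
# All figures are made-up placeholders, not measurements of any real model.
training_energy_kwh = 1_000_000      # one-off cost to train the model
inference_energy_kwh = 0.003         # energy per single inference request
total_inferences = 1_000_000_000     # requests served over the model's lifetime

# Spread the one-off training cost across every request ever served.
amortized_training_per_request = training_energy_kwh / total_inferences
total_per_request = amortized_training_per_request + inference_energy_kwh

print(f"Training share per request: {amortized_training_per_request:.6f} kWh")
print(f"Total energy per request:   {total_per_request:.6f} kWh")
```

With these placeholder figures, the training share per request is a fraction of the per-inference cost itself, and it keeps shrinking the more the model is used.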

This looks even better for openly released models like Stable Diffusion: trained once, then distributed to anyone who wants to use it.

@Sandra I remember being amused by one model, trained by a university in France, that boasted that 90% of the power used to train it came from a nearby nuclear reactor!

Wish I could find that reference now.