One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick.

All they do is predict the next word based on previous context. It turns out that when you scale the model above a certain size it can give the false impression of "intelligence", but that's a total fraud.

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that - on the surface - they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves

Figuring out what they're genuinely good for is a very interesting challenge

@simon

A missing 🧩 here is that making large language models has a huge, destructive climate impact, and we should quit it until we've got the climate sitch under control.

Using them is the same as any other app, so end-users and API-users, don't feel guilt-tripped. Making them, on the other hand, wrecks the world 💔

@Sandra I've not found that argument very convincing yet

Sure, there's a HUGE energy cost in training a model... but that model can then be put to use for many years into the future

text-davinci-003 was trained once, at great expense... but has since run inference millions (probably billions) of times for millions of people

This looks even better for openly released models like Stable Diffusion: trained once, then distributed to anyone who wants to use it

@simon @Sandra
Many of these models will need to be retrained continuously because the users will expect them to be up-to-date with news events, celebs etc. So it is not train-once. And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them.

@wim_v12e @Sandra I've been exploring alternatives to re-training the entire model to bake in new facts through mixing in results from other systems directly into the prompt - it's a really promising avenue: https://simonwillison.net/2023/Jan/13/semantic-search-answers/
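The idea in that post can be sketched in a few lines: embed your documents and the user's question, retrieve the most similar documents, and paste them into the prompt as context. This is a toy illustration, not the linked implementation; a simple bag-of-words vector stands in for real embeddings (which would come from an embedding API or model), and all function names here are made up for the example.

```python
# Toy sketch of retrieval-augmented prompting: rather than retraining the
# model to learn new facts, fetch the most relevant text and paste it into
# the prompt. Real systems use learned embeddings and a vector store; a
# bag-of-words term-frequency vector stands in for both here.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_prompt(question: str, documents: list[str], top_n: int = 2) -> str:
    """Rank documents by similarity to the question, prepend the best ones."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_n])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "Datasette 1.0a2 was released in December 2022.",
    "Stable Diffusion generates images from text prompts.",
    "Embeddings map text to vectors so similar text lands close together.",
]
prompt = build_prompt("When was Datasette 1.0a2 released?", docs, top_n=1)
print(prompt)
```

The completed prompt would then be sent to the language model, which answers from the supplied context rather than from whatever was in its training data.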

The cost of using them is definitely enormous - I've seen reports of ChatGPT costing $3,000,000/day or more - but again, that's spread across many users

At least it's not Bitcoin mining!


@wim_v12e @Sandra I remain hopeful that some day in the future it will become possible to run a large language model on a personal device - for both energy and privacy reasons

I can run Stable Diffusion on my iPhone already, but that's a MUCH smaller model than the various LLMs

@simon

Talking about making the models, not running them
@Sandra I was replying to @wim_v12e who said "And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them."
@simon @Sandra
Even if they can be scaled down, that will lead to more instances of them, and very likely disproportionately more. I fear this technology might lead to quite a dramatic increase in emissions from computing.