One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick

All they do is predict the next word based on previous context. It turns out that when you scale the model above a certain size, it can give the false impression of "intelligence" - but that's a total fraud

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns
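The "predict the next word based on previous context" mechanic can be illustrated with a toy bigram model - nothing like GPT-3's transformer internals, but the same interface: context in, most likely next word out. Everything below (the corpus, the names) is invented for illustration:

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then predict the most frequent follower. Real LLMs do this over tokens,
# with a neural network and a context window of thousands of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen twice after "the")
```

Scaling this idea up by many orders of magnitude - and replacing the frequency table with a neural network - is the whole trick.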

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that - on the surface - they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves

Figuring out what they're genuinely good for is a very interesting challenge

@simon

A missing 🧩 here is that making large language models has a huge, destructive climate impact, and we should quit it until we've got the climate sitch under control.

Using them is the same as using any other app, so end-users and API-users, don't feel guilt-tripped. Making them, on the other hand, wrecks the world 💔

@Sandra I've not found that argument very convincing yet

Sure, there's a HUGE energy cost in training a model... but that model can then be put to use for many years into the future

text-davinci-003 was trained once, at great expense... but has since run inference millions (probably billions) of times for millions of people

This looks even better for openly released models like Stable Diffusion: trained once, then distributed to anyone who wants to use it

@Sandra I remember being amused by one model, trained by a university in France, which boasted that 90% of the power used to train it came from a nearby nuclear reactor!

Wish I could find that reference now

@simon

Seems to me like new models are popping up all the time these days 💔

@Sandra @simon

I'm not sure "years into the future" even matters.

My thinking is this: it doesn't matter if my shovel lasts "years into the future" or not. What matters is "how much use do I get out of the shovel before it stops being a useful tool?" That same amount of use may be spread out over years or months.

I think the question is mostly just about amortization.

Well, that + moral effects / consequences of its use.
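The amortization framing can be made concrete with a back-of-the-envelope sketch. All numbers below are hypothetical placeholders, not measurements of any real model:

```python
# Hypothetical, illustrative numbers only - not real measurements.
training_energy_kwh = 1_000_000   # one-off cost to train the model
inference_energy_kwh = 0.003      # marginal cost per request to run it
total_requests = 500_000_000      # lifetime use before the model is retired

# Amortized energy per request: the one-off training cost spread across
# every request, plus the marginal cost of serving that request.
amortized_kwh = training_energy_kwh / total_requests + inference_energy_kwh
print(f"{amortized_kwh:.4f} kWh per request")  # prints: 0.0050 kWh per request
```

On this framing the per-request share of training shrinks as use grows - which is exactly the "cost effectiveness" Sandra pushes back on below, since the total energy spent still only goes up.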

@masukomi

That solution doesn't apply to my complaint, since the externalities (in this case, that the planet will be wrecked) aren't fully factored into the cost of making the shovel - which means I want to optimize for as few new shovels as possible, regardless of how much they're used.

The fact that language model use is going up, making new models more "cost effective" in this leaky abstraction, is more a part of the problem than part of the solution.

The value the atmosphere needs us to optimize down isn't "fossils burnt divided by utility", it's "total fossils burnt". So increasing the number of use cases harms more than it helps.

@simon
@Sandra @masukomi @simon I fully agree with your arguments and think that the only justification for a model on this basis would be that it brings down energy use. I'm not a big fan of smart devices, but I could imagine smart meters regulating heating - in that direction there would be more benefit than invested energy.
@simon @Sandra
Many of these models will need to be retrained continuously because the users will expect them to be up-to-date with news events, celebs etc. So it is not train-once. And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them.

@wim_v12e @Sandra I've been exploring alternatives to re-training the entire model: baking in new facts by mixing results from other systems directly into the prompt - it's a really promising avenue: https://simonwillison.net/2023/Jan/13/semantic-search-answers/

The cost of using them is definitely enormous - I've seen reports of ChatGPT costing $3,000,000/day or more - but again, that's spread across many users

At least it's not Bitcoin mining!

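The "mix results from other systems directly into the prompt" approach can be sketched as follows. The retrieval step here is a crude word-overlap score standing in for the embedding-based semantic search the linked post describes, and the documents are invented for illustration:

```python
import re

def tokens(text):
    """Lowercase word set - a real system would use embeddings instead."""
    return set(re.findall(r"\w+", text.lower()))

def build_prompt(question, documents, top_n=2):
    """Rank passages by word overlap with the question, then bake the
    best ones into the prompt, so the model can answer from current
    text it was never trained on."""
    ranked = sorted(
        documents,
        key=lambda d: len(tokens(question) & tokens(d)),
        reverse=True,
    )
    context = "\n".join(ranked[:top_n])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Toy document collection - contents are made up for this example.
docs = [
    "Datasette 1.0 was released with a new plugin API.",
    "The weather in Paris is mild in spring.",
]
prompt = build_prompt("What's new in Datasette?", docs, top_n=1)
# `prompt` is what you would then send to the GPT-3 completions API.
```

The model never needs retraining: keeping answers current is reduced to keeping the document collection current.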

@wim_v12e @Sandra I remain hopeful that some day in the future it will become possible to run a large language model on a personal device - for both energy and privacy reasons

I can run Stable Diffusion on my iPhone already, but that's a MUCH smaller model than the various LLMs

@simon

I'm talking about making the models, not running them
@Sandra I was replying to @wim_v12e who said "And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them."
@simon @Sandra
Even if they can be scaled down, that will lead to more instances of them - and very likely disproportionately more. I fear this technology might lead to quite a dramatic increase in emissions from computing.