One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick

All they do is predict the next word based on previous context. It turns out that when you scale the model above a certain size, it can give the false impression of "intelligence" - but that's a total fraud

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns
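The "predict the next word based on previous context" mechanic can be illustrated with a toy bigram model - nothing like GPT-3's transformer internals, but the same interface: context in, most likely next word out. Everything below (the corpus, the names) is invented for illustration:

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then predict the most frequent follower. Real LLMs do this over tokens,
# with a neural network and a context window of thousands of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen twice after "the")
```

Scaling this idea up by many orders of magnitude - and replacing the frequency table with a neural network - is the whole trick.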

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that - on the surface - they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves

Figuring out what they're genuinely good for is a very interesting challenge

@simon

A missing 🧩 here is that making large language models has a huge, destructive climate impact, and we should quit it until we've got the climate sitch under control.

Using them is the same as using any other app, so end-users and API-users, don't feel guilt-tripped. Making them, on the other hand, wrecks the world 💔

@Sandra I've not found that argument very convincing yet

Sure, there's a HUGE energy cost in training a model... but that model can then be put to use for many years into the future

text-davinci-003 was trained once, at great expense... but has since run inference millions (probably billions) of times for millions of people

This looks even better for openly released models like Stable Diffusion: trained once, then distributed to anyone who wants to use it

@Sandra I remember being amused by one model, trained by a university in France, which boasted that 90% of the power used to train it came from a nearby nuclear reactor!

Wish I could find that reference now

@simon

Seems to me like new models are popping up all the time these days 💔

@Sandra @simon

I'm not sure "years into the future" even matters.

My thinking is this: it doesn't matter if my shovel lasts "years into the future" or not. What matters is "how much use do I get out of the shovel before it stops being a useful tool?" That same amount of use may be spread out over years or months.

I think the question is mostly just about amortization.

Well, that + moral effects / consequences of its use.
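The amortization framing can be made concrete with a back-of-the-envelope sketch. All numbers below are hypothetical placeholders, not measurements of any real model:

```python
# Hypothetical, illustrative numbers only - not real measurements.
training_energy_kwh = 1_000_000   # one-off cost to train the model
inference_energy_kwh = 0.003      # marginal cost per request to run it
total_requests = 500_000_000      # lifetime use before the model is retired

# Amortized energy per request: the one-off training cost spread across
# every request, plus the marginal cost of serving that request.
amortized_kwh = training_energy_kwh / total_requests + inference_energy_kwh
print(f"{amortized_kwh:.4f} kWh per request")  # prints: 0.0050 kWh per request
```

On this framing the per-request share of training shrinks as use grows - which is exactly the "cost effectiveness" Sandra pushes back on below, since the total energy spent still only goes up.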

@masukomi

That solution doesn't apply to my complaint, since the externalities (in this case, that the planet will be wrecked) aren't fully factored into the cost of making the shovel - which means I want to optimize for as few new shovels as possible, regardless of how much they're used.

The fact that language model use is going up, making new models more "cost effective" in this leaky abstraction, is more a part of the problem than part of the solution.

The value the atmosphere needs us to optimize down isn't "fossils burnt divided by utility", it's "total fossils burnt". So increasing the number of use cases harms more than it helps.

@simon
@Sandra @masukomi @simon I fully agree with your arguments and think that the only justification for a model on this basis would be that it brings down energy use. I'm not a big fan of smart devices, but I could imagine smart meters regulating heating - in that direction there would be more benefit than invested energy.
@simon @Sandra
Many of these models will need to be retrained continuously because the users will expect them to be up-to-date with news events, celebs etc. So it is not train-once. And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them.

@wim_v12e @Sandra I've been exploring alternatives to re-training the entire model: baking in new facts by mixing results from other systems directly into the prompt - it's a really promising avenue: https://simonwillison.net/2023/Jan/13/semantic-search-answers/

The cost of using them is definitely enormous - I've seen reports of ChatGPT costing $3,000,000/day or more - but again, that's spread across many users

At least it's not Bitcoin mining!

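The "mix results from other systems directly into the prompt" approach can be sketched as follows. The retrieval step here is a crude word-overlap score standing in for the embedding-based semantic search the linked post describes, and the documents are invented for illustration:

```python
import re

def tokens(text):
    """Lowercase word set - a real system would use embeddings instead."""
    return set(re.findall(r"\w+", text.lower()))

def build_prompt(question, documents, top_n=2):
    """Rank passages by word overlap with the question, then bake the
    best ones into the prompt, so the model can answer from current
    text it was never trained on."""
    ranked = sorted(
        documents,
        key=lambda d: len(tokens(question) & tokens(d)),
        reverse=True,
    )
    context = "\n".join(ranked[:top_n])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Toy document collection - contents are made up for this example.
docs = [
    "Datasette 1.0 was released with a new plugin API.",
    "The weather in Paris is mild in spring.",
]
prompt = build_prompt("What's new in Datasette?", docs, top_n=1)
# `prompt` is what you would then send to the GPT-3 completions API.
```

The model never needs retraining: keeping answers current is reduced to keeping the document collection current.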

@wim_v12e @Sandra I remain hopeful that some day in the future it will become possible to run a large language model on a personal device - for both energy and privacy reasons

I can run Stable Diffusion on my iPhone already, but that's a MUCH smaller model than the various LLMs

@simon

I'm talking about making the models, not running them
@Sandra I was replying to @wim_v12e who said "And also, this totally ignores the cost of using these things at scale, which dwarfs the cost of training them."
@simon @Sandra
Even if they can be scaled down, that will lead to more instances of them - and very likely disproportionately more. I fear this technology might lead to quite a dramatic increase in emissions from computing.