One of the problems with these large generative language models is that they need to be kept up to date. People will expect the chatbot to know who this week's PM is, or what the current hit song is, which means that training will become an ongoing process, and so the carbon footprint will become many times larger than it already is. It will also be very expensive, but I don't think OpenAI has much of a choice: customers naturally want the bot to be up to date.

Furthermore, using the model also has a considerable footprint. An estimate is given here:
https://medium.com/@chrispointon/the-carbon-footprint-of-chatgpt-e1bc14e4cc2a

3.82 tCO₂e per day, assuming a million users with 10 queries per day. That means that after about half a year, the cumulative footprint from use exceeds the training footprint.
Of course, 1M users with 10 queries each is a very small number, and usage is likely to grow steeply.
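A quick sanity check of that crossover claim, as a sketch. It assumes Pointon's 3.82 tCO₂e/day figure and the ~550 tCO₂e single-training-run figure mentioned further down the thread:

```python
# Assumptions (both from estimates quoted in this thread):
DAILY_USE_FOOTPRINT_T = 3.82   # tCO2e/day for 1M users x 10 queries/day
TRAINING_FOOTPRINT_T = 550.0   # tCO2e for one GPT-3 training run

# Days of use until the cumulative use footprint matches training
crossover_days = TRAINING_FOOTPRINT_T / DAILY_USE_FOOTPRINT_T
print(f"use exceeds training after ~{crossover_days:.0f} days")
# -> use exceeds training after ~144 days
```

So roughly 144 days, i.e. a bit under half a year, consistent with the claim above.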

#ClimateEmergency


Just to be clear, in the big picture (total global ICT emissions of 5 GtCO₂e) this kind of footprint is still very small. But with a billion users (in practice, these will be other pieces of software, not people) each making 100 queries per day (a very low estimate, I think), we're at 15 MtCO₂e per year, and if there were 10 such models for different purposes, we can see that it starts to add up quickly.

#ClimateEmergency
#LowCarbonComputing

If that 15 MtCO₂e sounds improbable, consider this: currently, Bing and Google each process 10 billion queries per day.

If each of these queries required a call to an LLM of the same complexity as GPT-3, as seems to be their intention, then that alone is already 3 MtCO₂e per year, purely from Bing and Google searches, without any growth or any other applications.
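Both of these scaled-up figures follow from the same per-query estimate. A sketch of the arithmetic, assuming the footprint scales linearly with query volume:

```python
# Baseline from Pointon's estimate (assumption: linear scaling)
BASE_QUERIES_PER_DAY = 1_000_000 * 10   # 1M users x 10 queries/day
BASE_T_PER_DAY = 3.82                   # tCO2e/day at that volume
t_per_query = BASE_T_PER_DAY / BASE_QUERIES_PER_DAY

# Scenario 1: a billion users making 100 queries/day each
s1 = 1_000_000_000 * 100 * t_per_query * 365 / 1e6  # MtCO2e/year
# Scenario 2: Bing + Google, 10 billion queries/day each
s2 = 2 * 10_000_000_000 * t_per_query * 365 / 1e6   # MtCO2e/year
print(f"{s1:.0f} Mt/yr, {s2:.1f} Mt/yr")
# -> 14 Mt/yr, 2.8 Mt/yr
```

Which rounds to the 15 Mt and 3 Mt figures above.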

Just to keep this within the thread: training GPT-3 once generates about 550 tonnes of CO₂e. Compared to the emissions resulting from the queries, that is quite small; but as argued above, training will have to be done almost continuously to keep the model up to date.

@wim_v12e @csepp

Not to forget that developing these things means training them many times: train, rejig the network, train, fix a bug, train, many times over. This is not usually reported. Most likely it is a couple of orders of magnitude more than 0.5 kt to produce a static trained model.

Mostly they don't seem to be continuously feeding these things. Not yet, anyway.

@chainik

Quite so, that was the start of my thread ^_^

@csepp