For those who are skeptical that AI is a bubble, let's look at the possible paths from the current growth:

Scenario 1: Neither training nor inference costs go down significantly.

Current GenAI offerings are heavily subsidised by burning investor money; when that runs out, prices will go up. Only 8% of adults in the US would pay anything for AI in products, and the percentage who would pay the unsubsidised cost is lower. As the costs go up, the number of people willing to pay goes down, and the economies of scale start to erode.

End result: Complete crash.

Scenario 2: Inference costs remain high, training costs drop.

This one is largely dependent on AI companies successfully lobbying to make plagiarism legal as long as it's 'for AI'. They've been quite successful at that so far, so there's a reasonable chance of this.

In this scenario, none of the big AI companies has a moat. If training costs go down, the number of people who can afford to build foundation models goes up. This might be good for NVIDIA (you sell fewer chips per customer, to more customers, and hopefully it balances out). OpenAI and Anthropic have nothing of value; they end up playing in a highly competitive market.

This scenario is why DeepSeek spooked the market. If you can train something like ChatGPT for $30M, there are hundreds of companies that can do it. If you can do it for $3M, there are hundreds of companies for which this would be a rounding error in their IT budgets.

Inference is still not at the break-even point, so costs go up, but for use cases where a 2x cost is worthwhile there's still profit.

End result: This is a moderately good case. There will be some economic turmoil, because a few hundred billion dollars have been invested in producing foundation models on the assumption that the models, and the ability to create them, constitute a moat. But companies like Amazon, Microsoft and Google will still be able to sell inference services at a profit. None will have lock-in to a model, so prices will drop to close to cost, though still higher than they are today. With everyone actually paying, there won't be such a rush to put AI in everything. The datacenter investment is not destroyed, because there's still a market for inference. The growth will likely stall, though, so I expect a lot of the speculative building to be wiped out. I'd expect this to push the USA into recession, but that is more the stock market catching up with economic realities.

Scenario 3: Inference costs drop a lot, training costs remain high.

This is the one that a lot of folks are hoping for because it means on-device inference will replace cloud services. Unfortunately, most training is done by companies that expect to recoup that investment by selling inference. This is roughly the same problem as COTS software: you pay for the expensive thing up front (writing software / training) and then hope to make it back by charging for the thing that costs almost nothing (copying software / inference).

We've seen that this is a precarious situation. It's easy for China to devote a load of state money to training a model and then give it away for the sole purpose of undermining the business model of a load of US companies (and this would be a good strategy for them).

Without a path to recouping their investment, the only people who can afford to train models have no incentive to do so.

End result: All of the equity sunk into building datacentres to sell inference is wasted. Probably close to a trillion dollars wiped off the stock market in the first instance. In the short term, a load of AI startups who are just wrapping OpenAI / Anthropic APIs suddenly become profitable, which may offset the losses.

But training new models becomes economically infeasible. Models become increasingly stale: in programming, they insist on using deprecated or removed language features and APIs instead of their replacements; in translation, they miss modern idioms and slang; in summarisation, they fail on documents written in newer formats; in search, they know nothing about recent events; and so on. After a few years, people start noticing that AI products are terrible, but none of the vendors can afford to make them good. RAG can slow this decline a bit, but at the expense of increasingly large contexts, which push up inference compute costs. This is probably a slow-deflate scenario.
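To make the context-cost point concrete, here's a back-of-the-envelope sketch. It assumes a standard transformer, where self-attention compute grows quadratically with context length; the model dimensions are made-up illustrative numbers, not any particular model's.

```python
# Rough sketch: why stuffing retrieved documents into the context (RAG)
# pushes up inference compute. Assumes vanilla transformer self-attention,
# whose cost grows with the square of the context length.

def attention_flops(context_len, d_model=4096, n_layers=32):
    """Approximate FLOPs for self-attention over a full context.

    QK^T scores plus attention-weighted values cost roughly
    2 * context_len^2 * d_model per layer; constants are hand-waved.
    """
    return 2 * n_layers * context_len**2 * d_model

base = attention_flops(4_000)    # the prompt alone
rag = attention_flops(32_000)    # the prompt plus retrieved documents

print(f"compute multiplier from 8x longer context: {rag / base:.0f}x")
# prints: compute multiplier from 8x longer context: 64x
```

An 8x longer context costs 64x the attention compute, which is why "just keep the model fresh with RAG" isn't free.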

Scenario 4: Inference and training costs both drop a lot.

This one is quite interesting because it destroys the moat of the existing players and also wipes out the datacenter investments, but makes it easy for new players to arise.

If it's cheap to train a new model and to do the inference, then a load of SaaS things will train bespoke models and do their own inference. Open-source / cooperative groups will train their own models and be able to embed them in things.

End result: Wipe out a couple of trillion from the stock market and most likely cause a depression, but end up with a proliferation of foundation models in scenarios where they're actually useful (and, if the costs are low enough, in a lot of places where they aren't). The most interesting thing about this scenario is that it's the worst for the economy, but the best outcome for the proliferation of the technology.

Variations:

Costs may come down a bit, but not much. This is quite similar to the no-change scenario.

Inference costs may come down but only on expensive hardware. For example, a $100,000 chip that can run inference for 10,000 users simultaneously, but which can't scale down to a $10 chip that can run the same workloads. This is interesting because it favours cloud vendors, but is otherwise somewhere between cheap and expensive inference costs.

Overall conclusion: There are some scenarios where the outcome for the technology is good, but the outcomes for the economy and the major players are almost always bad. And the cases that are best for widespread adoption of the technology are the ones that are worst for the economy. That's pretty much the definition of a bubble: a lot of money invested in ways that will result in losing the money.

Only 8% of Americans would pay extra for AI, according to ZDNET-Aberdeen research

Tech vendors are racing to integrate AI into everything and selling it as a transformational moment. New data reveals a big enthusiasm gap from users.


@david_chisnall Your scenario 4 is intriguing. For it to be true, I'm thinking you need the following assumptions:

- Something resembling Moore's Law continues to be true (ergo, it gets cheaper over time to do both training and inference). Alternatively or additionally, the algorithms will get more efficient over time, letting you go faster with the same hardware.

- The size of these models, and the complexity of training and inference, stays about the same. If there's no benefit from going bigger, or simply no more data to train on, then that says today's workloads are it.

If both of those hold, then you eventually get a proliferation of cheap models, tuned to specific use cases, that can run anywhere.

A related question follows: what happens to these enormous gigawatt datacenters after a hypothetical AI crash? If you can buy them for pennies on the dollar, that starts looking like a cheap way to compete for general purpose cloud computing cycles. Of course, the way you build a general purpose datacenter and the way you build an AI datacenter are not the same, but for plenty of workloads, I'll bet they can do a fine job.

@dwallach

  • Something resembling Moore's Law continues to be true (ergo, it gets cheaper over time to do both training and inference). Alternatively or additionally, the algorithms will get more efficient over time, letting you go faster with the same hardware.

Yes. We saw some big wins from things that slightly overlapped these. Moving to lower-precision floating-point formats, and to formats with a shared exponent across an entire vector, gave you more FLOPS/Watt (and FLOPS/$) at the expense of generality. I think all of the low-hanging fruit is gone here, and a lot of the recent improvements seem to have been better memory topologies for handling sparse matrices.
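For readers who haven't met the shared-exponent trick: the idea is that a whole block of values stores one exponent plus small integer mantissas, so the multiply-accumulate hardware only needs integer datapaths. This is a deliberately simplified sketch, not any specific hardware format:

```python
# Simplified shared-exponent ("block floating point") quantisation: one
# exponent for a whole vector, small integer mantissas per element. Real
# hardware formats differ in detail; this just illustrates the trade-off.
import math

def bfp_quantize(vec, mantissa_bits=8):
    """Return (shared_exponent, integer_mantissas) for a vector."""
    max_abs = max(abs(x) for x in vec)
    # Pick the exponent from the largest element, so nothing overflows.
    exp = math.frexp(max_abs)[1] if max_abs else 0
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return exp, [round(x / scale) for x in vec]

def bfp_dequantize(exp, mants, mantissa_bits=8):
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return [m * scale for m in mants]

# Values far smaller than the block's maximum lose precision: that's the
# generality you give up in exchange for cheap integer arithmetic.
exp, mants = bfp_quantize([0.5, -0.25, 0.125, 1.5])
print(exp, mants)
print(bfp_dequantize(exp, mants))
```

The cost is visible when a block mixes very large and very small magnitudes: the small ones get crushed toward zero, which is fine for many ML workloads and hopeless for general-purpose numerics.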

  • The size of these models, and the complexity of training and inference, stays about the same. If there's no benefit from going bigger, or simply no more data to train on, then that says today's workloads are it.

Or the models get smaller. This seems more plausible, especially with more specialised models.

The translation models that Firefox uses (and the offline ones for Google Translate) are pretty impressive now, but they're very specialised. The Firefox ones are about two orders of magnitude larger than a dictionary would be. There may be space for improvement there, but they already run nicely on a relatively cheap phone. It may be that there are similar opportunities for other specialised tasks.

A related question follows: what happens to these enormous gigawatt datacenters after a hypothetical AI crash?

That's a good question. Part of it is that some of them exist only on paper, so nothing happens: they just evaporate. But there's some fun there: some of them are being built by real-estate companies with loans secured by the expected revenue from the datacenter leases. And if the companies that were supposed to be leasing them break the leases? That will cause a load of loan defaults. This is the contagion case that worries me the most, because I expect at least one bank to end up holding a lot of bad debt, which will cause a liquidity crisis (at the very least) and require coordination from central banks to avoid.

If you can buy them for pennies on the dollar, that starts looking like a cheap way to compete for general purpose cloud computing cycles.

But what would you be buying? The buildings? They're expensive to build, sure. And they have power and cooling infrastructure built in, though far denser than most workloads need. And some of them have special agreements with the grid for power that are tied to the current owner, so even turning them on would require some expensive contract negotiation.

The GPUs? They're run at such a high burn rate that their reported lifetimes are 1-3 years. Some will work. And, based on some of the things that came from NVIDIA's disclosures, it turns out that a lot of them don't actually have the GPUs in them yet because the companies building them don't have the cash. So it's not clear what you'd actually get.

Of course, the way you build a general purpose datacenter and the way you build an AI datacenter are not the same, but for plenty of workloads, I'll bet they can do a fine job.

You can probably do something. The question is whether it's cheaper to start from an empty space or from something that was built optimised for the wrong thing. I don't know either way, but I do know that part of the motivation for building these was that converting a normal cloud datacenter into an 'AI' one was more expensive than building a new one. Whether that holds in reverse is not clear.

@dwallach

Oh, one more thing: We still mostly have Moore's Law (the number of transistors you get on a chip for a fixed cost doubles roughly every two years). The thing we lost was Dennard Scaling (the power consumption of a square mm of transistors stayed roughly constant, so as processes shrank you got more transistors per Watt).
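The distinction is easy to see with toy numbers. Assume each process generation doubles the transistors at fixed cost; under Dennard Scaling, power per transistor halved to match, and post-Dennard (simplifying to the extreme) it doesn't fall at all:

```python
# Toy arithmetic for Moore-with-and-without-Dennard. All numbers are
# normalised and illustrative; post-Dennard per-transistor power actually
# falls slowly rather than not at all, but the trend is the point.

def chip_after(generations, dennard=True):
    """Return (relative transistor count, relative total chip power)."""
    transistors, power_per_t = 1.0, 1.0
    for _ in range(generations):
        transistors *= 2          # Moore's Law: transistors keep doubling
        if dennard:
            power_per_t /= 2      # Dennard: constant power density
    return transistors, transistors * power_per_t

print(chip_after(3, dennard=True))   # (8.0, 1.0): 8x transistors, same power
print(chip_after(3, dennard=False))  # (8.0, 8.0): 8x transistors, 8x power
```

With Dennard Scaling you could light up every new transistor for free; without it, a few generations in, you can't afford to power the whole chip at once, which is exactly what pushes designs toward specialised blocks that spend most of their time off.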

This is a really important distinction. It's been the thing that has pushed accelerators, because there's a big power saving from having 10 specialised processors where 2-3 are active at any time and the remaining 7-8 are in a low-power state. You can save a lot of power by having specialised things that are either doing a phase of computation efficiently or are turned off.
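The 10-processors-with-2-3-active arithmetic looks like this, with made-up power figures purely for illustration:

```python
# Toy dark-silicon arithmetic: 10 specialised blocks where only a few are
# powered at once, vs one general-purpose block that is always on.
# Power figures are invented for illustration only.

GENERAL_BLOCK_W = 10.0   # one big general-purpose block, always active
SPECIAL_ACTIVE_W = 2.0   # a specialised block doing its one job efficiently
SPECIAL_IDLE_W = 0.05    # a power-gated block, nearly off

def heterogeneous_power(n_blocks=10, n_active=3):
    """Total power with n_active blocks running and the rest power-gated."""
    idle = n_blocks - n_active
    return n_active * SPECIAL_ACTIVE_W + idle * SPECIAL_IDLE_W

print(heterogeneous_power())   # 6.35 W, vs 10 W for the always-on block
```

The win depends entirely on the idle blocks being nearly off and the active blocks being much more efficient than a general-purpose core at their one job, which is why a workload that keeps one homogeneous accelerator saturated all the time undoes the trick.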

Before Dennard Scaling ended (around 2007), accelerators had a tendency to die off because the doubling that Moore's Law gave the CPU gradually made it fast enough to do the same thing the accelerator did, fast enough that it didn't matter. Since then, heterogeneous compute has been the way you do power saving. And ML accelerators are homogeneous blobs of matrix multiplication circuitry. Making every problem look like a matrix-multiplication problem is the exact opposite of what you want as a target for power-efficient chips.

This is less true for 'edge AI'. Apple's SoCs, for example, have a bunch of different accelerators, including an AI accelerator. If AI is an intermittent thing that you do sometimes, having an accelerator for it saves power relative to doing it on the CPU or GPU. If it's the thing that you're doing all of the time, that's annoying for designing power-efficient chips.