Given what it's gonna do to the world economy when the bubble bursts, I think it's better if we drop all references to intelligence and just start calling it subprime computing
@gabrielesvelto Marketing being scammy is nothing new.
The US will probably not be in a position to help the companies very much, as the US can't borrow easily now that foreign governments have little interest in US bonds. Since AI companies aren't relevant for the wider economy (it's not housing or energy or food), this will mostly just correct people's retirement funds after their explosion over the last 4 years. A market correction.
@gabrielesvelto The companies are currently operating at a loss, but that's not a necessity. Thanks to the exponential scaling laws, distillation and quantization, we can run 100x smaller models at a 100x lower price with almost the same performance. As investor money runs out, companies will simply have to run models at a size that covers the costs. Google will likely win that race, since they can keep operating at a loss for longer due to their remaining revenue streams and in-house tensor chips.
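To make the quantization point concrete, here is a minimal sketch of symmetric int8 post-training quantization, one of the tricks that shrinks a model's memory and compute footprint. This is a toy illustration in pure Python, not any framework's actual implementation; real systems quantize per-channel and calibrate activation ranges too.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero case
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

weights = [0.31, -1.27, 0.04, 0.88]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# 8 bits per weight instead of 32, and each restored weight stays
# within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

The trade-off the thread describes is exactly this: a small, bounded accuracy loss in exchange for a large, fixed reduction in serving cost.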
@Ntropic assuming that happens it still doesn't solve the issue with training. Training is a constant, unavoidable cost, and the presence of LLM outputs in the inputs means that filtering is becoming more and more expensive, in turn making training more and more expensive. The alternative is accepting at least a measure of model collapse. It also assumes a favorable legal environment and - at least in the EU - that's not gonna last for long.

@gabrielesvelto I can keep using existing models, I don't see why one would "need" to train new models over and over.

The training for scaling to larger models will not be maintainable at the current rate, but it is also not necessary.

For model knowledge to not be outdated, new information can be introduced with much more efficient approaches, from old-school LoRA to SotA techniques like adding Engrams. We mainly train to implement new tricks, such as more efficient architectures, and to add new concepts.
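The LoRA idea above can be sketched in a few lines: instead of retraining a full d x k weight matrix W, you learn two small factors B (d x r) and A (r x k) with r much smaller than d and k, and apply W' = W + B @ A. Pure Python, no training loop; all the names and toy values here are illustrative, not from any real model.

```python
def matmul(X, Y):
    """Naive matrix product for the illustration."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_update(W, B, A, alpha=1.0):
    """Return W + alpha * (B @ A), leaving the frozen base W untouched."""
    BA = matmul(B, A)
    return [[W[i][j] + alpha * BA[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

W = [[1.0, 0.0, 0.0],      # frozen base weights (d = k = 3)
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [0.0]]  # learned rank-1 factors (r = 1, toy values)
A = [[0.0, 0.5, 0.0]]
W2 = lora_update(W, B, A)
# Only d*r + r*k = 6 parameters were "trained" instead of d*k = 9,
# and the saving grows with matrix size.
assert W2[0][1] == 0.5 and W2[2][2] == 1.0
```

That parameter count is why adapter-style updates are so much cheaper than a fresh training run: the base model stays frozen and only the low-rank factors are optimized.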

@gabrielesvelto in the mid to long term RL post-training will likely keep increasing, as this has shown the biggest benefits and its scalability has been much improved. Model decay would come from pre-training data, and that is already available and has only slightly increased between model versions - probably because they want to avoid model decay.