the precise timeline of how OpenAI fucked over the RAM market

> October 2025: Sam Altman flies to Seoul and signs simultaneous deals with Samsung and SK Hynix for 900,000 DRAM wafers per month. That's 40% of global supply. Neither company knew the other was signing a near-identical commitment at the same time.

https://xcancel.com/aakashgupta/status/2038813799856374135

edit: this guy is a seriously bot-pilled pumper, but this seems to be a good summary of the known facts. I doubt the AI memory-use trick he mentions is load-bearing, tho.

Aakash Gupta (@aakashgupta)

The timeline on this is genuinely insane.

October 2025: Sam Altman flies to Seoul and signs simultaneous deals with Samsung and SK Hynix for 900,000 DRAM wafers per month. That's 40% of global supply. Neither company knew the other was signing a near-identical commitment at the same time.

Those deals were letters of intent. Non-binding. No RAM actually changed hands. But the market treated them as gospel. Contract DRAM prices jumped 171%. A 64GB DDR5 kit went from $190 to $700 in three months.

December 2025: Micron kills Crucial, its 29-year-old consumer memory brand, to reallocate every wafer to AI and enterprise customers. The company explicitly said it was exiting consumer memory to "improve supply and support for our larger, strategic customers in faster-growing segments." Translation: the AI demand signal was so loud that selling RAM to PC builders stopped making financial sense.

March 2026: Google publishes TurboQuant, a compression algorithm that reduces AI memory requirements by 6x with zero accuracy loss. Cloudflare's CEO called it "Google's DeepSeek." The entire thesis that AI would consume infinite memory forever just got a six-month expiration date on it.

Same month: OpenAI and Oracle cancel the Abilene Stargate expansion. The $500 billion data center vision that justified the RAM deals couldn't survive its own financing terms. Bloomberg attributed the collapse partly to OpenAI's "often-changing demand forecasting."

MU is now down ~33% from its post-earnings high. Revenue up 196% year over year, EPS up 682%, and the stock is in freefall because the company restructured its entire business around a demand signal that came from non-binding letters and is now being compressed out of existence by a research paper.

Micron bet the consumer division on Sam Altman's signature. The signature was worth exactly what the paper said: nothing binding.

@davidgerard thanks sam!

@ariadne @davidgerard “Google publishes TurboQuant, a compression algorithm that reduces AI memory requirements by 6x with zero accuracy loss.”

This algorithm is somehow only applicable to AI??

@BillSaysThis @davidgerard yes, it is possible to create domain-specific compression algorithms that are better than general ones.
@ariadne @BillSaysThis @davidgerard Really? I’ve been using pngcrush for audio files.
@Vorsos @ariadne @BillSaysThis @davidgerard Reminds me of when I took a bunch of manga PNGs, converted them to BMP, and compressed them with 7z; the resulting file was smaller than compressing the original PNGs with 7z.

@Vorsos

I can't tell if you're serious, but Ariadne is right. Simple example: FLAC will losslessly compress audio better than zip or gzip will. That's why it was invented. 😄

@ariadne @BillSaysThis @davidgerard
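The FLAC-vs-gzip point can be seen with a toy experiment in pure Python. This is not real FLAC, just a sketch of the principle: a general-purpose compressor (zlib here, standing in for zip/gzip) does much better on audio-like data once you apply even the crudest domain-specific prediction step, storing sample-to-sample differences instead of raw samples, which is a baby version of FLAC's linear predictors.

```python
import math, struct, zlib

# A 1-second chirp (100 Hz -> 300 Hz) at 8 kHz, 16-bit signed PCM.
# A chirp avoids an exactly periodic byte pattern that zlib's LZ77
# stage would otherwise exploit, which would muddy the comparison.
samples = [int(20000 * math.sin(2 * math.pi * (100 * t + 0.0125 * t * t) / 8000))
           for t in range(8000)]
raw = struct.pack("<8000h", *samples)

# Order-1 prediction, the simplest cousin of FLAC's linear predictors:
# keep the first sample, then store only sample-to-sample differences.
# Neighboring audio samples are highly correlated, so the residuals are
# small numbers with a skewed byte distribution -- easy for zlib to code.
deltas = [samples[0]] + [samples[i] - samples[i - 1] for i in range(1, 8000)]
residuals = struct.pack("<8000h", *deltas)

print("raw PCM bytes:     ", len(raw))
print("zlib on raw PCM:   ", len(zlib.compress(raw, 9)))
print("zlib on residuals: ", len(zlib.compress(residuals, 9)))
```

The generic compressor is identical in both runs; only the domain-specific modeling step changes, and that step is where the win comes from.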

@CppGuy @Vorsos @ariadne @BillSaysThis @davidgerard

Interestingly enough, Chinchilla 70B was trained mostly on text and beat domain-specific compressors PNG and FLAC in one experiment.

https://arxiv.org/abs/2309.10668

Not saying you are wrong. I assume that newer, domain-specific algorithms would still outperform the general Chinchilla algorithm, and there can be practical downsides if they involve large memory requirements, even if they result in more efficient compression.

Language Modeling Is Compression

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.

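The paper's core equivalence can be shown in miniature: an arithmetic coder driven by a predictive model spends -log2(p) bits on a symbol it predicted with probability p, so the total compressed size equals the model's cumulative log-loss, and a better predictor is automatically a better compressor. A pure-Python sketch of the ideal bit count (no actual arithmetic coder, and obviously no LLM, just the accounting):

```python
import math
from collections import Counter

text = "the quick brown fox jumps over the lazy dog " * 50

def ideal_bits(text, predict):
    """Total bits an ideal arithmetic coder would use with this predictor."""
    bits, seen = 0.0, []
    for ch in text:
        bits += -math.log2(predict(seen, ch))  # cost of coding ch
        seen.append(ch)
    return bits

# Model A: uniform over 256 byte values (predicts nothing).
uniform = lambda seen, ch: 1 / 256

# Model B: adaptive order-0 model -- probability from the counts seen so
# far, with +1 (Laplace) smoothing so unseen symbols stay codable.
def adaptive(seen, ch):
    counts = Counter(seen)
    return (counts[ch] + 1) / (len(seen) + 256)

print("uniform model :", ideal_bits(text, uniform) / 8, "bytes")
print("adaptive model:", ideal_bits(text, adaptive) / 8, "bytes")
```

The uniform model lands at exactly 1 byte per character; the adaptive one does far better. Chinchilla in the paper is just a vastly better `predict` function plugged into the same accounting.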
@CppGuy @ariadne @BillSaysThis @davidgerard Good to know. I’m always serious about jokes.
@BillSaysThis @ariadne @davidgerard if so, it's because they were doing something stupid and this fixes that IMO.
@demofox @BillSaysThis @ariadne yeah I'd be slightly interested in the details, but only slightly, because (a) if it were applicable anywhere else we'd all know about it, and (b) we're far enough along the S-curve that I can see 6x the memory giving only a slight improvement. Maybe plain ML can benefit a lot, I dunno.
@davidgerard @demofox @BillSaysThis @ariadne my question is just whether this will make RAM less expensive. I’m guessing “no”, because that would be a good thing, and it seems increasingly likely that we can’t have those.
@[email protected] @[email protected] @[email protected] @[email protected] A couple points, bearing in mind that this is the first time I'm encountering TurboQuant and might be misspeaking:

  • This is perhaps neither here nor there, but the X account making the originally quoted post is https://www.aibyaakash.com, "AI by Aakash" (this is linked later in the same thread). The person seems fully AI-pilled and has several AI-themed Substacks
  • TurboQuant, or at least the QJL bit, sounds suspiciously like Locality-Sensitive Hashing. That's a well-known technique, and it can definitely do impressive things. When I tried my hand at startups I made heavy use of it (see https://bucci.onl/notes/Legit-tech ). In my use case I could get something like a 1,000-fold compression with acceptable accuracy loss. Basically LSH can be used to turn a long vector of floats into a comparatively short bitstring without losing too much of the geometrical information in the float vectors. Even one bit packs a ton of information
  • The general problem of vector search that this method aims to address is an old one, and rotating or compressing the vectors is nothing new. In old school linear algebra things like diagonalization or SVD do this, for instance. I don't know if that's what they're doing but it's a general class of technique and a straightforward thing to try
  • Vector quantization is, of course, also quite old. You experience it every time you listen to an MP3.
So, it's possible this is a characteristic Google move of taking existing science, ramming it through their engineering machine, and suggesting novelty with a clever title, headline, and/or new name. Which is not to suggest it's a bad piece of engineering. I couldn't say. However, it's possible this is a Google rebrand, and the questions raised in this thread, like "wouldn't we already know about this? wouldn't it be applied outside of AI?" are answered by: yes, we did already know about this and yes, it has already been applied outside of AI. Oh and yes, it'd be quite silly if nobody thought to try these old school techniques in the latest incarnation of LLM-based AI before 2026.
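The float-vector-to-bitstring trick described above is easy to demo with classic random-hyperplane LSH (SimHash), one well-known variant of the technique; to be clear, this is not TurboQuant itself, just the old-school idea. Each output bit is the sign of a dot product with a random direction, so the Hamming distance between bitstrings tracks the angle between the original float vectors:

```python
import math, random

random.seed(0)
DIM, BITS = 256, 64  # 256 float32s -> 64 bits: a 128x size reduction

# One random hyperplane (Gaussian normal vector) per output bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def simhash(v):
    # Each bit: which side of the hyperplane does v fall on?
    return [sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes]

def angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(dot / (math.hypot(*u) * math.hypot(*v)))

def hamming_angle(h1, h2):
    # For random hyperplanes, E[fraction of differing bits] = angle / pi,
    # so the bitstrings alone give an estimate of the original angle.
    return math.pi * sum(a != b for a, b in zip(h1, h2)) / BITS

u = [random.gauss(0, 1) for _ in range(DIM)]
v = [a + random.gauss(0, 0.3) for a in u]        # a nearby vector
w = [random.gauss(0, 1) for _ in range(DIM)]     # an unrelated vector

print("u,v true angle:", angle(u, v), "estimated:", hamming_angle(simhash(u), simhash(v)))
print("u,w true angle:", angle(u, w), "estimated:", hamming_angle(simhash(u), simhash(w)))
```

Even with only 64 bits, the near pair and the unrelated pair are cleanly separated, which is the "one bit packs a ton of information" point: the geometry survives the compression well enough for nearest-neighbor search.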

@davidgerard @demofox @BillSaysThis @ariadne UFD Tech discussed it the other day and it only applies to a very specific aspect of AI resulting in a tiny overall shrink in memory consumption that's being used to load slightly larger models. And it started being used middle of last year, meaning it's already baked in.