Mastodawn

> Two Chinese firms are ramping up production of consumer RAM/SSDs because they see a market opening

Yes, but these Chinese firms account for a tiny share of the overall RAM/SSD market, and they'll have the same problems with expanding production as everyone else. So it doesn't actually help all that much.

zozbot234 1d ago

> Your battery is going to suffer because of the extra ram as well.

No, it won't. The power drain of merely refreshing DRAM is negligible; it's no higher than the drain you'd see in S3 standby over the same time period.

zozbot234 1d ago

> other than AI stuff, where does a non powerful computer limit you?

Running Electron apps and browsing React-based websites, of course.

zozbot234 4d ago

> for a 1T model youd need to stream something like 2TB of weights per forward pass

Isn't this missing the point of MoE models completely? MoE inference is sparse: you only read a small fraction of the weights per layer. You still have the problem that each individual expert-layer is quite small (a few MiB each, give or take), but those reads are large enough for NVMe.
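To put rough numbers on this, here is a minimal back-of-the-envelope sketch. All model figures are hypothetical and chosen only for illustration: 2 bytes per parameter (which is what the quoted "2TB for a 1T model" implies), 32B active parameters per token, and an expert with three projection matrices (gate, up, down) at hidden size 4096 and expert FFN width 1024, quantized to 4 bits.

```python
def bytes_read_per_token(total_params, active_params, bytes_per_param=2):
    """Return (dense_bytes, sparse_bytes) read for one forward pass.

    dense_bytes  assumes every weight is streamed (the quoted estimate).
    sparse_bytes assumes only the routed (active) weights are read.
    """
    dense_bytes = total_params * bytes_per_param
    sparse_bytes = active_params * bytes_per_param
    return dense_bytes, sparse_bytes


def expert_layer_bytes(hidden, expert_ffn, bytes_per_param):
    """Size of one expert in one layer, assuming three projection
    matrices (gate, up, down), each hidden x expert_ffn. This layout
    is an assumption, not taken from any specific model."""
    return 3 * hidden * expert_ffn * bytes_per_param


# Hypothetical model: 1T total parameters, 32B active per token.
dense, sparse = bytes_read_per_token(10**12, 32 * 10**9)
print(f"dense:  {dense / 1e12:.2f} TB per token")
print(f"sparse: {sparse / 1e9:.0f} GB per token "
      f"({sparse / dense:.1%} of the dense figure)")

# Hypothetical expert shape at 4-bit quantization (0.5 bytes/param).
size = expert_layer_bytes(hidden=4096, expert_ffn=1024, bytes_per_param=0.5)
print(f"one expert-layer: {size / 2**20:.0f} MiB")
```

Under these assumed figures the sparse read is a few percent of the dense one, and each individual read is on the order of a few MiB, which is the regime described above.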

zozbot234 4d ago

It's not about being faster (except for small reads where latency dominates, which is actually relevant when reading a handful of expert-layers immediately after routing). It's the wearout resistance, which opens up the possibility of storing the KV-cache (including the "linear" KV-cache of recent Qwen models, which is not append-only as it was with pure-attention models) and maybe even per-layer activations, though the latter are the least useful to store given how ephemeral they are.

zozbot234 4d ago

It will be interesting to compare this to https://news.ycombinator.com/item?id=47476422 and https://news.ycombinator.com/item?id=47490070 . Very similar design except that this is apparently using mmap, which according to the earlier experiment incurs significant overhead.

Flash-MoE: Running a 397B Parameter Model on a Laptop | Hacker News
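For readers unfamiliar with the distinction, here is a minimal sketch of the two access patterns being compared: reading a slice of a weights file through `mmap` (pages are faulted in on access) versus an explicit positioned read with `os.pread` (one syscall per requested range, available on Unix-like systems). The function names, file layout, and offsets are hypothetical, and nothing here is taken from either project's code; it only illustrates the mechanism, not the measured overhead.

```python
import mmap
import os


def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` via a read-only memory map.
    The kernel loads the touched pages lazily through page faults."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])


def read_slice_pread(path, offset, length):
    """Read `length` bytes at `offset` with explicit positioned reads.
    Loops because a single pread may return fewer bytes than requested."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = os.pread(fd, remaining, offset)
            if not chunk:  # reached end of file
                break
            chunks.append(chunk)
            offset += len(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Both functions return the same bytes; what differs is how the I/O is issued, which is where any overhead difference would come from.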

zozbot234 5d ago

A similar approach was recently featured here: https://news.ycombinator.com/item?id=47476422 . Note, though, that the iPhone Pro has very limited RAM (12GB total), which you still need for the active part of the model. (Unless you want to use wearout-resistant Intel Optane storage, but that was power-hungry and thus unsuitable for a mobile device.)

zozbot234 Mar 20

The worthwhile question, AIUI, is whether AI weights are even protected by human copyright. Note that firms whose "core" value is their proprietary AI weights don't even need this (at least AIUI), since they can always fall back on "they are clearly protected against misappropriation, like a trade secret". It becomes more interesting wrt. openly available AI models.

zozbot234 Mar 20

Yes, this is pretty clear-cut. There's even a great alternative, namely GLM-5, that does not have such a clause (and other alternatives besides) so it feels a bit problematic that they would use Kimi 2.5 and then disregard that advertisement clause.

zozbot234 Mar 19

I'm fine with taxing both long-term vacant properties and Airbnb at fairly high rates, since both have negative effects on the surrounding neighborhood - the latter to a markedly lesser extent than the former, of course.
