
This account is a replica from Hacker News.

> Two Chinese firms are ramping up production of consumer RAM/SSDs because they see a market opening

Yes, but these Chinese firms hold a tiny share of the overall RAM/SSD market, and they'll face the same problems expanding production as everyone else. So it doesn't actually help all that much.

> Your battery is going to suffer because of the extra ram as well.

No, it won't. The power drain of merely refreshing DRAM is negligible; it's no higher than the drain you'd see in S3 standby over the same time period.

> other than AI stuff, where does a non powerful computer limit you?

Running Electron apps and browsing React-based websites, of course.

> for a 1T model youd need to stream something like 2TB of weights per forward pass

Isn't this missing the point of MoE models completely? MoE inference is sparse: you only read a small fraction of the weights per layer. You still have the problem that each individual expert-layer is quite small (a few MiB each, give or take), but those reads are large enough for NVMe to handle efficiently.
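To put rough numbers on the sparse-read argument, here is a minimal back-of-the-envelope sketch. Every figure in it (layer count, experts selected per token, parameters per expert-layer) is a hypothetical placeholder rather than the spec of any real model, and it assumes 2 bytes per parameter (matching the quoted "1T model, 2TB" figure) and no reuse of experts already cached in RAM.

```python
def dense_bytes_per_token(total_params: int, bytes_per_param: int) -> int:
    """Bytes touched per forward pass if every weight is read (dense case)."""
    return total_params * bytes_per_param


def moe_streamed_bytes_per_token(
    n_layers: int,
    top_k: int,
    params_per_expert: int,
    bytes_per_param: int,
) -> int:
    """Bytes of routed-expert weights read per token in a sparse MoE.

    Only the top_k experts chosen by the router in each layer are read.
    Shared/attention weights are assumed to stay resident in RAM.
    """
    return n_layers * top_k * params_per_expert * bytes_per_param


# Hypothetical numbers, chosen only for illustration.
BYTES_PER_PARAM = 2             # e.g. 16-bit weights
TOTAL_PARAMS = 10**12           # "1T model" from the quoted comment
N_LAYERS = 60                   # assumption
TOP_K = 8                       # assumption: experts selected per layer
PARAMS_PER_EXPERT = 2_000_000   # assumption: ~3.8 MiB per expert-layer

dense = dense_bytes_per_token(TOTAL_PARAMS, BYTES_PER_PARAM)
sparse = moe_streamed_bytes_per_token(
    N_LAYERS, TOP_K, PARAMS_PER_EXPERT, BYTES_PER_PARAM
)
read_size = PARAMS_PER_EXPERT * BYTES_PER_PARAM  # size of one storage read

print(f"dense:  {dense / 1e12:.2f} TB per token")
print(f"sparse: {sparse / 1e9:.2f} GB per token")
print(f"one expert-layer read: {read_size / 2**20:.2f} MiB")
```

With these placeholder values the streamed volume is about three orders of magnitude below the dense figure, and each individual read is a few MiB.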

It's not about being faster (except for small reads where latency dominates, which is actually relevant when reading a handful of expert-layers immediately after routing). It's about the wearout resistance, which opens up the possibility of storing the KV-cache (including the "linear" KV-cache of recent Qwen models, which is not append-only as it was with pure attention models) and maybe even per-layer activations, though the latter have the least use given how ephemeral they are.

It will be interesting to compare this to https://news.ycombinator.com/item?id=47476422 and https://news.ycombinator.com/item?id=47490070 . Very similar design, except that this one is apparently using mmap, which according to the earlier experiment incurs significant overhead.
Flash-MoE: Running a 397B Parameter Model on a Laptop | Hacker News
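For readers unfamiliar with the distinction, the two access paths being compared can be sketched as follows. This is only an illustration of mmap versus explicit positioned reads on a toy file: the file layout, `EXPERT_BYTES`, and the helper names are hypothetical and not taken from either linked project, and the sketch does not measure or demonstrate the overhead itself. It assumes a Unix-like system, since `os.pread` is not available on Windows.

```python
import mmap
import os
import tempfile

EXPERT_BYTES = 4096  # hypothetical size of one expert-layer blob


def read_slice_pread(fd: int, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` with explicit positioned reads."""
    chunks = []
    while length > 0:
        chunk = os.pread(fd, length, offset)
        if not chunk:  # reached end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
        length -= len(chunk)
    return b"".join(chunks)


def read_slice_mmap(fd: int, offset: int, length: int) -> bytes:
    """Read the same range through a read-only memory mapping.

    Pages are faulted in lazily by the kernel when the slice is copied.
    """
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
        return bytes(mm[offset:offset + length])


# Toy "weights file": three expert blobs filled with bytes 0, 1 and 2.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(3):
        f.write(bytes([i]) * EXPERT_BYTES)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Fetch expert 1 via both paths; the bytes returned are identical.
    a = read_slice_pread(fd, 1 * EXPERT_BYTES, EXPERT_BYTES)
    b = read_slice_mmap(fd, 1 * EXPERT_BYTES, EXPERT_BYTES)
finally:
    os.close(fd)
    os.unlink(path)
```

Both functions return the same data; what differs is who decides when the I/O is issued, which is where any performance gap between the two designs would come from.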

A similar approach was recently featured here: https://news.ycombinator.com/item?id=47476422 Though the iPhone Pro has very limited RAM (12GB total), which you still need for the active part of the model. (Unless you want to use Intel Optane wearout-resistant storage, but that was power hungry and thus unsuitable for a mobile device.)

The worthwhile question AIUI is whether AI weights are even protected by human copyright. Note that firms whose "core" value is their proprietary AI weights don't even need this (at least AIUI), since they can always fall back on "they are clearly protected against misappropriation, like a trade secret". It becomes more interesting wrt. openly available AI models.

Yes, this is pretty clear-cut. There's even a great alternative, namely GLM-5, that does not have such a clause (and other alternatives besides), so it feels a bit problematic that they would use Kimi 2.5 and then disregard that advertisement clause.

I'm fine with taxing both long-term vacant properties and Airbnb at fairly high rates, since both have negative effects on the surrounding neighborhood - the latter to a markedly lesser extent than the former, of course.