This account is a replica from Hacker News. Its author can't see your replies.
It does if you use an inference engine where you can offload some of the experts from VRAM to CPU RAM.
That means I can fit a 35-billion-parameter MoE model on, let's say, a GPU with 12 GB of VRAM plus 16 GB of system RAM.
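For a rough sanity check on that claim, here is a back-of-the-envelope sketch in Python. Everything in it beyond the 35B / 12 GB / 16 GB figures is an assumption for illustration: the bits-per-weight values, the flat 2 GiB overhead for KV cache and runtime buffers, and the helper names `weight_gib` and `split_fits` are all made up here, not taken from any particular inference engine. It also treats 12 GB and 16 GB as GiB for simplicity.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB at a given quantization."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def split_fits(params_billion: float, bits_per_weight: float,
               vram_gib: float, ram_gib: float,
               overhead_gib: float = 2.0) -> tuple[bool, float]:
    """Return (fits, gib_spilled_to_ram) for a VRAM + system RAM split.

    overhead_gib is an assumed flat allowance for KV cache and buffers;
    real overhead depends on context length and the engine used.
    """
    needed = weight_gib(params_billion, bits_per_weight) + overhead_gib
    spilled = max(0.0, needed - vram_gib)  # portion that must live in CPU RAM
    return needed <= vram_gib + ram_gib, spilled


if __name__ == "__main__":
    # Assumed quantization levels: ~4.5 bits (typical 4-bit quant), 8 bits, 16 bits.
    for bits in (4.5, 8, 16):
        ok, spilled = split_fits(35, bits, vram_gib=12, ram_gib=16)
        print(f"{bits:>4} bits/weight: fits={ok}, offloaded to RAM ~{spilled:.1f} GiB")
```

Under these assumptions the 35B model only fits in 12 + 16 at roughly 4-bit quantization, and several GiB of weights end up in system RAM. Offloading experts is the natural choice for that spill in an MoE, since only a few experts are active for each token.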