Mastodawn

internetguy

0 Followers

0 Following

1 Posts

This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.

Official	https://
Support this service	https://www.patreon.com/birddotmakeup

Show thread

internetguy 3d ago

> MegaTrain stores parameters and optimizer states in host memory (CPU memory) and treats GPUs as transient compute engines. For each layer, we stream parameters in and compute gradients out, minimizing persistent device state

This is pretty awesome. The only compute I have at home is an RTX 3080 with 10 GB of VRAM, so I struggle with training larger models (>40M, 50M params). I get OOM errors and have to optimize a lot.

I have a lot more CPU RAM in my PC, and this would likely increase the size of models I can train locally.