Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon

https://github.com/t8/hypura

GitHub - t8/hypura: Run models too big for your Mac's memory

Intel Optane rolling in its grave.

Wouldn't be Intel if they didn't quit halfway through on a good thing.

Still, couldn't one get a RAID 0 card with four drives to saturate an x16 link? That's already the max one could push through a single PCIe slot anyhow.
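
As a rough sanity check on the four-drive idea, here is a back-of-the-envelope sketch of the theoretical numbers. It assumes each NVMe drive gets an x4 link and counts only the 128b/130b line encoding, so real throughput will be lower. The function names are made up for illustration, and the 7.0 GB/s per-drive figure in the usage lines is an assumed example, not a measurement.

```python
# Theoretical PCIe bandwidth per direction, counting only line-encoding overhead.
# Packet/protocol overhead is ignored, so real-world numbers come in lower.

GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # per-lane transfer rate by PCIe generation
ENCODING = 128 / 130                   # 128b/130b encoding (PCIe 3.0 and later)


def lane_gb_per_s(gen: int) -> float:
    """Usable GB/s for a single lane of the given PCIe generation."""
    return GT_PER_S[gen] * ENCODING / 8  # divide by 8: bits -> bytes


def link_gb_per_s(gen: int, lanes: int) -> float:
    """Usable GB/s for a link that is `lanes` wide."""
    return lane_gb_per_s(gen) * lanes


def raid0_gb_per_s(gen, drives, lanes_per_drive=4, drive_limit=None):
    """Best-case striped throughput: each drive is capped by its own link
    and, optionally, by an assumed sustained rate of the drive itself."""
    per_drive = link_gb_per_s(gen, lanes_per_drive)
    if drive_limit is not None:
        per_drive = min(per_drive, drive_limit)
    return per_drive * drives


if __name__ == "__main__":
    slot = link_gb_per_s(4, 16)
    ideal = raid0_gb_per_s(4, drives=4)
    capped = raid0_gb_per_s(4, drives=4, drive_limit=7.0)  # assumed drive speed
    print(f"PCIe 4.0 x16 link:       {slot:.1f} GB/s")
    print(f"4 drives, link-limited:  {ideal:.1f} GB/s")
    print(f"4 drives at 7.0 GB/s:    {capped:.1f} GB/s")
```

On paper, four x4 drives add up to exactly the 16 lanes of the slot, so the stripe can only match the link if every drive sustains its full x4 rate. With the assumed 7.0 GB/s per drive it lands a bit short of the x16 ceiling.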