Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon

https://github.com/t8/hypura

Run models too big for your Mac's memory.
Where does the "1T parameter model" claim come from? I can only see models of 70B parameters or less mentioned in the repo.
Yeah, the title's claim doesn't appear anywhere in the link. No doubt it's possible, but what actually matters is speed, and we learn nothing about that here...