Hypura – A storage-tier-aware LLM inference scheduler for Apple Silicon

https://github.com/t8/hypura

Run models too big for your Mac's memory.
Where does the "1T parameter model" claim come from? I can only see models of 70B parameters or less mentioned in the repo.
Yeah, the title's claim doesn't appear anywhere in the link. No doubt it's possible, but what actually matters is speed, and we learn nothing about that here...