This account is a replica from Hacker News.
How does it compare to some of the newer MLX inference engines, like mlx-optiq, that support TurboQuant quantization? https://mlx-optiq.pages.dev/
mlx-optiq — Mixed-Precision Quantization for Apple Silicon

Per-layer sensitivity analysis and TurboQuant KV cache for MLX on Apple Silicon.