After Apple shared initial local-inference benchmarks (M4 vs. M5, moderate RAM) on its Machine Learning Research blog, real-world reports are now appearing of M5 Macs with 64–128 GB of RAM running much larger models.

Hopefully, local inference is moving from “barely works” to “actually usable”.

https://machinelearning.apple.com/research/exploring-llms-mlx-m5
https://www.reddit.com/r/LocalLLaMA/comments/1s0czc4/round_2_followup_m5_max_128g_performance_tests_i/

Exploring LLMs with MLX and the Neural Accelerators in the M5 GPU (Apple Machine Learning Research)
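
For anyone who wants to try this on their own machine, here is a minimal sketch of local generation with mlx-lm, the library Apple's benchmarks build on. The model name is only an illustrative pick from the mlx-community collection, not necessarily the one used in the linked tests.

```python
# Minimal local-inference sketch using mlx-lm (pip install mlx-lm).
# The model below is an example; any MLX-converted model from the
# mlx-community Hugging Face organization can be loaded the same way.
from mlx_lm import load, generate

# Downloads the weights on first run and loads them onto the Apple silicon GPU.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompt = "Explain in two sentences why unified memory matters for local LLM inference."

# verbose=True prints prompt/generation tokens-per-second,
# which is handy for M4-vs-M5 comparisons like the ones above.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```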
@weichsel Extrapolating this to an M5 Ultra makes it sound like those Mac Studios will sell like warm Kaisersemmeln.
@kommen I’m really curious whether local inference will ever become truly viable, or if hosted models will always retain an advantage. A maxed-out M5 Ultra might turn out to be a pretty expensive Kaisersemmel when competing with hosted models at 20–100 €/month.