After Apple published initial local-inference benchmarks on its Machine Learning Research blog (M4 vs. M5, on machines with moderate RAM), real-world reports are now appearing of M5 Macs with 64–128 GB of unified memory running much larger models.
Hopefully, local inference is moving from “barely works” to “actually usable”.
https://machinelearning.apple.com/research/exploring-llms-mlx-m5
https://www.reddit.com/r/LocalLLaMA/comments/1s0czc4/round_2_followup_m5_max_128g_performance_tests_i/
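For anyone who wants to try this themselves, here is a minimal sketch of local inference with the mlx-lm Python package, following the pattern from its README. The model repo name is just an example from the mlx-community Hugging Face org; pick a quantization that fits your machine's RAM.

```python
# Minimal local-inference sketch with mlx-lm (pip install mlx-lm).
# Assumption: the model repo below is an illustrative 4-bit quant;
# substitute any mlx-community model sized for your unified memory.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Instruct models expect their chat template applied to the prompt.
messages = [{"role": "user", "content": "Explain unified memory on Apple Silicon in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints generation stats (tokens/sec), handy for
# reproducing the kind of throughput numbers in the links above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```

The same package also ships a CLI (`mlx_lm.generate --model <repo> --prompt "..."`) if you just want a quick throughput check without writing any Python.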
