Apple introduces MacBook Pro with all‑new M5 Pro and M5 Max, delivering breakthrough pro performance and next-level on-device AI

Apple announced the latest 14- and 16-inch MacBook Pro with the all-new M5 Pro and M5 Max.


I chased down what the "4x faster at AI tasks" claim was actually measuring:

> Testing conducted by Apple in January 2026 using preproduction 13-inch and 15-inch MacBook Air systems with Apple M5, 10-core CPU, 10-core GPU, 32GB of unified memory, and 4TB SSD, and production 13-inch and 15-inch MacBook Air systems with Apple M4, 10-core CPU, 10-core GPU, 32GB of unified memory, and 2TB SSD. Time to first token measured with an 8K-token prompt using a 14-billion parameter model with 4-bit quantization, and LM Studio 0.4.1 (Build 1). Performance tests are conducted using specific computer systems and reflect the approximate performance of MacBook Air.

So it's not measuring output tokens per second, just how long the model takes to start generating tokens. It seems we'll have to wait for independent benchmarks to get useful numbers.
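If you want both numbers yourself, here's a minimal sketch. It assumes LM Studio's OpenAI-compatible server is running on its default port (1234); the model name and the prompt construction are placeholders, not anything from Apple's test setup:

```python
# Sketch: measure time-to-first-token (prefill) and decode throughput
# separately, against LM Studio's OpenAI-compatible local server.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

prompt = "word " * 8000  # crude stand-in for an ~8K-token prompt

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="qwen2.5-14b-instruct",  # placeholder: whatever model is loaded
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # stream chunks roughly correspond to tokens

end = time.perf_counter()
if first_token_at is not None:
    print(f"TTFT:   {first_token_at - start:.2f}s (prefill)")
    print(f"decode: {chunks / (end - first_token_at):.1f} tok/s")
```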
> Token/s is entirely determined by memory bandwidth. TTFT is compute bound.

This is broadly correct for currently favoured software, but optimization problems in computer science usually let you trade compute for memory and vice versa.
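To see where the two bounds come from, here's a back-of-envelope sketch for the 14-billion-parameter, 4-bit model in Apple's footnote. The bandwidth and FLOPs figures are illustrative placeholders, not published M5 specs:

```python
# Back-of-envelope: why decode is bandwidth-bound and prefill compute-bound.
# Hardware numbers are illustrative placeholders, NOT published M5 specs;
# KV-cache traffic and attention FLOPs are ignored for simplicity.
params = 14e9
weight_bytes = params * 0.5   # 4-bit quantization ~ 0.5 bytes per parameter

mem_bw = 150e9                # bytes/s -- placeholder memory bandwidth
flops = 15e12                 # FLOP/s  -- placeholder sustained compute

# Decode: each generated token streams all the weights through memory once,
# so throughput is capped by bandwidth long before compute runs out.
print(f"decode ceiling: {mem_bw / weight_bytes:.0f} tok/s")

# Prefill: ~2 FLOPs per parameter per prompt token, with all prompt tokens
# processed in parallel -- weights are read once, but the matmuls dominate.
prompt_tokens = 8192
print(f"prefill time:   {2 * params * prompt_tokens / flops:.1f} s")
```

With these made-up figures, decode tops out around 21 tok/s while an 8K prompt needs roughly 15 seconds of pure matmul, which is why the two metrics stress different parts of the chip.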

For example, from the front page just now: https://news.ycombinator.com/item?id=47242637 "Speculative Speculative Decoding" (the vanilla technique it builds on is sketched below).

Or this: https://openreview.net/forum?id=960Ny6IjEr "Low-Rank Compression of Language Models Via Differentiable Rank Selection"
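For the first of these, here's a toy sketch of vanilla speculative decoding (not the SSD variant linked above). The accept rule min(1, p_target/p_draft) is the standard one, but both "models" are fixed toy distributions rather than real networks:

```python
# Toy vanilla speculative decoding: a cheap draft model proposes k tokens and
# the expensive target verifies them with accept rule min(1, p_t/p_d), so the
# output is distributed exactly as if the target had sampled alone. In
# practice the target scores all k drafts in ONE batched forward pass,
# spending extra compute to save weight-streaming passes (bandwidth). The
# bonus token sampled on full acceptance is omitted for brevity.
import random

VOCAB = "abcd"
draft_p  = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # cheap draft model
target_p = {"a": 0.5, "b": 0.2, "c": 0.2, "d": 0.1}   # expensive target model

def sample(p):
    return random.choices(list(p), weights=list(p.values()))[0]

def speculative_step(k=4):
    out = []
    for _ in range(k):
        tok = sample(draft_p)  # draft proposes
        if random.random() < min(1.0, target_p[tok] / draft_p[tok]):
            out.append(tok)    # target accepts the draft token
        else:
            # rejected: resample from the normalized residual max(0, p_t - p_d)
            resid = {t: max(0.0, target_p[t] - draft_p[t]) for t in VOCAB}
            z = sum(resid.values())
            out.append(sample({t: v / z for t, v in resid.items()}))
            break              # stop at the first rejection
    return out

print(speculative_step())
```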


None of these really change the fundamental shape of the problem.