🚀👏 Behold, the ultimate manifesto on "speculative decoding" – because why wouldn't you want to spend 19 minutes decoding nonsense about tokens, attention, and roofline maths? 🤯 It's like that one friend who insists on explaining their #crypto investments, but with even more #jargon and fewer results. 📉🔍
https://fergusfinn.com/blog/economics-of-speculative-decoding/ #speculativedecoding #attentiontokens #rooflinemaths #techhumor #HackerNews #ngated
The economics of speculative decoding

Two underexplored axes: what mixture-of-experts (MoE) routing does to the decode roofline, and how compressed attention removes the slack that used to make speculated tokens free.