Someone discovered that slapping the word "cutlass" on a #kernel magically boosts #performance by 100 TFLOPS! ⚡🔪 Meanwhile, #GitHub is busy throwing #AI #buzzwords around like confetti, because who needs actual substance when you have Sparkly New Features™? 🙄🎉
https://github.com/triton-lang/triton/pull/7298 #cutlass #boost #tech #news #optimization #HackerNews #ngated
[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton

Rewrite the attention kernel to be persistent. This gives better performance at low-contexts. However, fp16 at large context has suffered a bit due to a ptxas instruction scheduling issue in the so...
