🤔 Ah, the classic tale of a tech enthusiast playing "will-it-blend?" with TPUs and Flash Attention! 🤪 Our hero Archer FAFO (Finds A Free Option) decides to port algorithms like he's playing a game of Tetris—except it's on a free-tier #TPU in #Colab, which is basically like using a Ferrari to deliver pizza for free. 🍕🚗
https://archerzhang.me/forcing-flash-attention-onto-a-tpu #techenthusiast #FlashAttention #freeoptions #algorithmshack #HackerNews #ngated
Forcing Flash Attention onto a TPU and Learning the Hard Way · Archer Zhang

This is the fifth post in a series on LLM internals. Part 1 covered attention, Part 2 covered generation, Part 3 covered the Flash Attention algorithm, Part ...