Mastodawn

Hublai (charismatic megafauna) · 5d ago

#subquadraticsparseattention tackles the #quadratic cost of attention differently from #1.58bit. 1.58-bit reduces values to #ternary, so multiplication is replaced by simple logic. Sparse attention instead explores the heaviest connections first, so the rest can be mostly ignored. These approaches are complementary. #Attention youtube.com/shorts/IupOu...
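
A rough sketch of the two ideas, in plain Python. The function names `ternary_dot` and `topk_attention` are made up for illustration and are not taken from the linked video or from any library. One caveat: this toy version still scores every key before keeping the top `k`, so it is not itself subquadratic. A real subquadratic method would need a way to find the heavy keys without scoring them all.

```python
import math

def ternary_dot(weights, xs):
    """Dot product with weights in {-1, 0, +1}: only adds, subtracts, or skips."""
    total = 0.0
    for w, x in zip(weights, xs):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0: the input is skipped entirely
    return total

def topk_attention(query, keys, values, k):
    """Attend only to the k highest-scoring keys for one query (illustrative)."""
    scale = math.sqrt(len(query))
    # Toy simplification: score all keys, then keep the k heaviest.
    scores = [sum(q * c for q, c in zip(query, key)) / scale for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores only; everything else gets zero weight.
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    norm = sum(exps.values())
    dim = len(values[0])
    return [sum(exps[i] / norm * values[i][d] for i in top) for d in range(dim)]
```

The two pieces are independent, which is the sense in which the approaches are complementary: the first makes each multiply cheaper, the second reduces how many connections are considered at all.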

SubQ: The Attention Matrix Disappears...

YouTube