This is a fascinating read on how the attention function of an AI works (kind of like its attention span while learning), and the off-by-one bug that may be present in the maths/implementation that underpins it: https://www.evanmiller.org/attention-is-off-by-one.html
Attention Is Off By One

Let’s fix these pesky Transformer outliers using Softmax One and QuietAttention.

@zandikar This paper is beyond me, but is this the thing at the core of Bayesian statistics?
@dreadpir8robots Precisely. Bayesian inference and Bayes' theorem are used very heavily in the AI field. It's largely that, plus set/number/category theory, probability, and gradients all the way down on the maths theory side. The softmax function in the attention formula can be seen as a form of Bayesian-style inference: it turns raw scores (over vectors) into a probability distribution with respect to its inputs, so the model can weigh what to focus on, so to speak.
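For anyone curious what the "off by one" actually is: the article proposes adding 1 to the softmax denominator (equivalent to a phantom logit fixed at 0), so attention weights can sum to less than 1 and a head can "abstain" instead of being forced to spread probability over poor matches. A minimal NumPy sketch comparing the two (function names are mine, the formula is from the article):

```python
import numpy as np

def softmax(x):
    # standard softmax: weights always sum to exactly 1,
    # so an attention head must attend to *something*
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return e / e.sum()

def softmax_one(x):
    # the article's proposed "softmax1": a 1 in the denominator acts
    # like an extra logit fixed at 0, so weights can sum to < 1
    m = np.maximum(np.max(x), 0.0)  # include the implicit 0 logit in the shift
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-4.0, -5.0, -3.0])  # every key is a poor match
print(softmax(scores).sum())      # exactly 1: must attend anyway
print(softmax_one(scores).sum())  # well under 1: the head can stay quiet
```

The claim in the article is that this small change tames the huge outlier activations that make Transformers hard to quantize.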