Writing an LLM from scratch, part 13 -- the 'why' of attention, or: attention heads are dumb

A pause to take stock: realising that attention heads are simpler than I thought explained why we do the calculations we do.

Giles' Blog
Writing an LLM from scratch, part 10 -- dropout

Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question.
