Mastodawn

Hacker News May 11, 2025

Writing an LLM from scratch, part 13 – attention heads are dumb

https://www.gilesthomas.com/2025/05/llm-from-scratch-13-taking-stock-part-1-attention-heads-are-dumb

#HackerNews #WritingLLM #AttentionHeads #AIResearch #MachineLearning #TechBlog

Writing an LLM from scratch, part 13 -- the 'why' of attention, or: attention heads are dumb

A pause to take stock: realising that attention heads are simpler than I thought explained why we do the calculations we do.

Giles' Blog

Hacker News Mar 20, 2025

Writing an LLM from scratch, part 10 – dropout

https://www.gilesthomas.com/2025/03/llm-from-scratch-10-dropout

#HackerNews #WritingLLM #Dropout #MachineLearning #AIDevelopment #TechBlog

Writing an LLM from scratch, part 10 -- dropout

Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question.

Giles' Blog