Writing an LLM from scratch, part 13 – attention heads are dumb
https://www.gilesthomas.com/2025/05/llm-from-scratch-13-taking-stock-part-1-attention-heads-are-dumb
#HackerNews #WritingLLM #AttentionHeads #AIResearch #MachineLearning #TechBlog
Writing an LLM from scratch, part 10 – dropout
https://www.gilesthomas.com/2025/03/llm-from-scratch-10-dropout
#HackerNews #WritingLLM #Dropout #MachineLearning #AIDevelopment #TechBlog