RLHF from Scratch: A Complete Alignment Study | Brayan’s Blog
https://brayanbrayan.github.io/2026/04/02/rlhf-post-blog.html

SFT · PPO · GRPO · DPO implementation, evaluation, and hyperparameter sensitivity
