A classic paper on Reinforcement Learning from Human Feedback (RLHF) is @[email protected]'s "Learning to summarize from human feedback".
Our talented engineer @[email protected] replicated this paper using our trlX library!
Read our report (w/ a code walk-through) here: http://wandb.me/summarize-rlhf-trlx