Mastodawn

nova Apr 14, 2023

Daily Productive Sharing 695 - Reinforcement Learning from Human Feedback (RLHF)

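Reinforcement Learning from Human Feedback (RLHF) is one of the key techniques behind ChatGPT's success. HuggingFace gives a very detailed walkthrough of the entire RLHF pipeline, which contains several clever ideas:
1. RLHF is a highly complex training process that requires training multiple models and a great deal of engineering work.
2. The reward model for an LLM has to assign a score to a piece of text, but asking people to score text directly is highly subjective. For example, two annotators may give the same sample completely different scores, which seriously hurts the training that follows. The better current practice is to have two models generate outputs for the same input and ask annotators only to pick the better of the two. These comparisons are then aggregated into an overall score.
3. The quality of RLHF depends on two factors: the quality of the initial human-written text, and the quality of the human ratings.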
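The post does not say exactly how the pairwise judgments in point 2 are aggregated into an overall score. One common option, assumed here purely for illustration, is an Elo-style rating: each comparison nudges the preferred output up and the other one down. The function name `elo_scores`, the output identifiers, and the `k`/`initial` values are hypothetical choices for this sketch, not details taken from the HuggingFace write-up.

```python
from collections import defaultdict


def elo_scores(comparisons, k=32.0, initial=1000.0):
    """Aggregate pairwise human preferences into one score per output.

    Sketch only: assumes an Elo-style update, which is one possible way
    to turn "A was preferred over B" judgments into scalar scores.
    comparisons: iterable of (winner, loser) output identifiers.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        # Probability that `winner` beats `loser` under the current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # Zero-sum update: the less expected the win, the larger the shift.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical annotator judgments: each pair is (preferred, rejected).
comparisons = [
    ("output_a", "output_b"),
    ("output_a", "output_c"),
    ("output_b", "output_c"),
]
scores = elo_scores(comparisons)
print(sorted(scores, key=scores.get, reverse=True))  # ['output_a', 'output_b', 'output_c']
```

Because the update is zero-sum, the ratings always add up to the number of rated outputs times `initial`. An upset, where a low-rated output beats a high-rated one, moves both ratings more than an expected win does.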

https://letters.acacess.com/daily-productive-sharing-695/