Mastodawn

fly51fly (@fly51fly)

Google Research has introduced CoDistill-GRPO, which proposes a co-distillation recipe for efficient Group Relative Policy Optimization (GRPO). The work could reduce the cost and improve the efficiency of training models with reinforcement learning.

https://x.com/fly51fly/status/2054308804741992625

#googleresearch #distillation #grpo #reinforcementlearning

fly51fly (@fly51fly) on X

[LG] CoDistill-GRPO: A Co-Distillation Recipe for Efficient Group Relative Policy Optimization S M Kwon, Z Sun, A T Suresh, H Jain, S Kumar [Google Research] (2026) https://t.co/qlRq6oMJDZ

X (formerly Twitter)
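The post does not describe how CoDistill-GRPO works, so the sketch below shows only the standard GRPO objective (as formulated in DeepSeekMath) that the title's "efficient GRPO" refers to. For each prompt, GRPO samples a group of completions and gives each one a group-relative advantage. It then optimizes a PPO-style clipped surrogate per token, plus a KL penalty toward a frozen reference model. The function names, the defaults clip_eps=0.2 and beta=0.04, and the plain per-token averaging are illustrative assumptions, not details taken from the paper.

```python
import math


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage, clip_eps=0.2, beta=0.04):
    """GRPO loss (to minimize) for a single token of one sampled completion.

    logp_new:  log-prob of the token under the policy being optimized
    logp_old:  log-prob under the policy that sampled the completion
    logp_ref:  log-prob under the frozen reference model (KL anchor)
    advantage: the completion's group-relative advantage, shared by all its tokens
    """
    # PPO-style importance ratio and its clipped version
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Pessimistic (minimum) of the unclipped and clipped surrogates
    surrogate = min(ratio * advantage, clipped * advantage)
    # Non-negative per-token KL estimator: r - log(r) - 1 with r = pi_ref / pi_new
    log_r = logp_ref - logp_new
    kl = math.exp(log_r) - log_r - 1.0
    return -(surrogate - beta * kl)


def grpo_sequence_loss(logps_new, logps_old, logps_ref, advantage, **kwargs):
    """Average the per-token loss over the tokens of one completion."""
    losses = [
        grpo_token_loss(n, o, r, advantage, **kwargs)
        for n, o, r in zip(logps_new, logps_old, logps_ref)
    ]
    return sum(losses) / len(losses)


# Hypothetical numbers: a 3-token completion whose advantage is +1.0
print(grpo_sequence_loss([-0.5, -1.0, -0.2], [-0.6, -1.0, -0.2], [-0.5, -1.0, -0.2], 1.0))
```

A quick sanity check: when logp_new, logp_old and logp_ref are all equal, the ratio is 1 and the KL term is 0, so the loss reduces to the negated advantage.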

Arint - SEO+KI May 8

RT @DJLougen: Released a new model/method for GRPO: Qwen3.5-9B-NSC-ACE-SABER.

more at Arint.info

#AgenticAI #AIResearch #GRPO #HuggingFace #MachineLearning #Qwen3 #arint_info

https://x.com/DJLougen/status/2052433218687685020#m

sayzard May 5

Leonie (@helloiamleonie)

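A blog post and accompanying notebook were shared that explain GRPO and walk through fine-tuning liquidai's LFM2.5-1.2B-Instruct with UnslothAI on free Kaggle T4 GPUs. A useful developer resource for RL-based fine-tuning and hands-on work with lightweight language models.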

https://x.com/helloiamleonie/status/2051396124649398551

#grpo #finetuning #unsloth #kaggle #llm

Leonie (@helloiamleonie) on X

Spent the weekend crossing one thing off my "to learn" list: GRPO In this blog, we walk through: • What is GRPO and how does it work • Fine-tune @liquidai's LFM2.5-1.2B-Instruct • using @UnslothAI and some free @kaggle T4s Blog: https://t.co/LTNNU4thK5 Kaggle Notebook:

X (formerly Twitter)
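The core of "What is GRPO and how does it work" is that GRPO drops the learned value function (critic) and uses the group itself as the baseline. Several completions are sampled for the same prompt, and each completion's advantage is its reward standardized against the rest of the group. A minimal sketch of that step follows. It assumes population standard deviation and a small eps, both of which vary between implementations, and the rewards are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions for the same prompt.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    Uses the population std (an assumption; implementations differ).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: 4 completions for one prompt, reward 1.0 if the answer is correct
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # prints [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group gets the same reward, all advantages are zero, and that prompt contributes no reward-driven policy-gradient signal. This is one reason group size and reward design matter in practice.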

Arint - SEO+KI Apr 30

RT @HowToAI_: Tencent has done away with fine-tuning and reinforcement learning on a budget of 18 US dollars.

more at Arint.info

#DeepSeek #GRPO #Innovation #KünstlicheIntelligenz #MachineLearning #Tencent #arint_info

https://x.com/HowToAI_/status/2049567036003795269#m

sayzard Apr 11

Omar Khattab (@lateinteraction)

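A new blog post by @a1zhang on the future of language models highlights a key result: training RLM-Qwen3-4B with GRPO on short (32k-token), easy (single-needle) MRCRv2 long-context tasks generalizes automatically, with 100% reliability, to 1M-token, 8-needle long-context tasks.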

https://x.com/lateinteraction/status/2042668150185947627

#llm #grpo #longcontext #rl #qwen3

Omar Khattab (@lateinteraction) on X

New must-read blog by @a1zhang on the future of language models. Buried nugget: doing GRPO for RLM-Qwen3-4B on short (32k token) and easy (single-needle) MRCRv2 long-context tasks generalizes *automatically* and with perfect (100%) reliability to 1M-token, 8-needle tasks!!

X (formerly Twitter)

National Monuments Alerts Apr 4

Grand Portage National Monument #grpo #nationalmonument
⛔ Park Closure ⛔
Issued: 4/4/2026 12:00 AM EDT

Early Closure - Grand Portage National Monument Heritage Center

Due to extreme weather, Grand Portage National Monument will close the Heritage Center on Saturday, April 4th at noon. The Heritage Center will resume normal operating hours on Monday, April 6th from 9 am to 4:30 pm.

http://www.nps.gov/grpo

Grand Portage National Monument (U.S. National Park Service)

Travel into the past to discover the present. Explore the partnership between the Grand Portage Anishinaabe and the North West Company during the North American fur trade. Experience the sights and smells of a bustling depot reconstructed in its historic location. See how it shaped co-management with the NPS today. Follow pathways to the past to imagine a drum echo over Gichigami - Lake Superior.

AI Daily Post Mar 11

Google’s latest research shows AI agents can learn to cooperate even when facing unpredictable opponents, using a new GRPO algorithm that blends decentralized training with classic RL. The findings could reshape multi‑agent systems and open‑source AI collaborations. Dive in! #AIAgents #ReinforcementLearning #MultiAgentLearning #GRPO

🔗 https://aidailypost.com/news/google-shows-ai-agents-cooperate-unpredictable-opponents-using

Habr Feb 21

From RLHF to DPO and Beyond: How We Learned to Stop Worrying and Love LLM Alignment

In 2022 there was exactly one way to make a language model "good": RLHF. Just one. If you wanted your LLM to answer sensibly and at least pretend to understand the question, you needed an army of annotators and an OpenAI-sized budget. Four years later we have a zoo of a dozen alignment methods, half of which you can run on a single RTX 4090 over a weekend. DPO removed the reward model. SimPO removed the reference model. GRPO and DeepSeek R1 proved that RL is alive, but in a new form. Anthropic published Claude's ~80-page constitution openly and shifted the paradigm from rules to reasons. The world has changed; let's work out exactly how. The article covers the full history of post-training from RLHF to Constitutional AI, the math behind the key methods (tucked into spoilers, painless), working TRL + QLoRA code with hyperparameters, large comparison tables, and a "what should I pick for my task" decision tree. It also gives an honest account of the problems tutorials skip: distribution mismatch, reward hacking, catastrophic forgetting, and why models can "pretend" to be aligned. For developers, ML engineers, and anyone who has ever opened Hugging Face and thought, "what if I fine-tune this..."

https://habr.com/ru/articles/1002298/

#LLM #RLHF #DPO #finetuning #выравнивание #LoRA #QLoRA #GRPO #Constitutional_AI #языковые_модели
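The shorthand "DPO removed the reward model, SimPO removed the reference model" is easiest to see in the two losses. The sketch below takes summed response log-probabilities as plain floats; real training code computes them from model outputs in batches. The beta and gamma defaults are illustrative assumptions, not values from the article.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    No reward model is trained, but a frozen reference model's log-probs are required.
    All arguments are summed log-probabilities of the full chosen/rejected responses.
    """
    # Implicit reward margin: how much more the policy (relative to the reference)
    # prefers the chosen response over the rejected one
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def simpo_loss(pol_chosen, pol_rejected, len_chosen, len_rejected, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair: no reference model at all.

    Uses length-normalized (per-token average) log-probs and a target margin gamma.
    """
    margin = beta * pol_chosen / len_chosen - beta * pol_rejected / len_rejected - gamma
    return -log_sigmoid(margin)
```

For example, simpo_loss(-20.0, -40.0, 10, 20, gamma=0.0) returns log 2. Both responses have the same average log-probability per token (-2.0), even though one is twice as long, so the length-normalized margin is zero.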

National Monuments Alerts Feb 19

Grand Portage National Monument #grpo #nationalmonument
ℹ️ Information ℹ️
Issued: 2/19/2026 12:00 AM EST

Delayed Opening - Grand Portage National Monument Heritage Center

Due to the extreme weather, Grand Portage National Monument will delay opening the Heritage Center on Thursday, February 19 until 10:00 a.m. The Heritage Center will remain open until 4:30 p.m. and resume normal operating hours on Friday, February 20 from 9:00 a.m. to 4:30 p.m.

http://www.nps.gov/grpo

Grand Portage National Monument (U.S. National Park Service)

National Monuments Alerts Feb 18

Grand Portage National Monument #grpo #nationalmonument
⛔ Park Closure ⛔
Issued: 2/18/2026 12:00 AM EST

Weather Alert - Monument is closed Wednesday, February 18

Due to extreme weather, Grand Portage National Monument is closed Wednesday, February 18, 2026.

https://www.nps.gov/grpo/index.htm