凉拌茶叶

@leo_song
critical thinking your critical thinking
Blog: https://github.com/leosongwei/blog/blob/master/README.md
GPG: 4A77 DC2A 0157 DD7A 1326 FA2C 2420 F800 D431 23FF
> we're going to bomb them back into the Stone Age.
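I think that, contrary to the emphasis on perseverance in books and films, the marathon is actually a sport that demands a great deal of restraint: you have to hold back your fighting spirit, your competitive streak, and the thrill of running fast if you want to finish 🤔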
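I find it rather interesting that Qwen Code doesn't keep any history. Every time I start working, I have the model revise the documentation against the code, and while reviewing its revisions I also get to refresh my memory of the earlier design. 🤔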
Lovely happy shoes
> On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities
🤪 There are two I didn't post on X
LOL, it runs so fast it leaves afterimages, even faster than the human team next to it
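~🎶Sitting blankly at the piano behind the window, watching the threads of rain🕺💧~🎶A slight chill in this lonely midnight hour💃🌙~🎶I play the piano and gaze at the rain🎹~🎶Without thinking, I play an old tune once more👯‍♂ ~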

Made something fun: GRDPO https://github.com/leosongwei/GRDPO/tree/master (TL;DR in the image)

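It is very resource-efficient and gets results quickly: on a single RTX 4090 (I'd guess a 16 GB card might also work), less than a day of reinforcement-learning training can lift Qwen2.5-1.5B to close to the level of Qwen2.5-3B.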

GitHub - leosongwei/GRDPO: A reinforcement learning trial pass for the Large Reasoning Model era?