> we're going to bomb them back into the Stone Age.
| Blog | https://github.com/leosongwei/blog/blob/master/README.md |
| GPG | 4A77 DC2A 0157 DD7A 1326 FA2C 2420 F800 D431 23FF |
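Built something fun: GRDPO https://github.com/leosongwei/GRDPO/tree/master (TL;DR in the image)
It is very resource-efficient and gets results fast. With a single 4090 (I suspect a 16 GB card would also work), less than a day of reinforcement-learning training can bring Qwen2.5-1.5B close to the level of Qwen2.5-3B.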