https://en.wikipedia.org/wiki/Cordouan_Lighthouse #digitalscavengerhunt #useragent #robotpolicy #404adventure #techhumor #webnavigation #HackerNews #ngated
Ilir Aliu (@IlirAliu_)
Introduces the 'RL token' idea: rather than retraining the entire robot policy, compress its internal state into a small feature vector and train only a small RL layer on top of it. The post stresses that this can cut robot-policy fine-tuning time from days to minutes, an approach that could substantially improve learning efficiency in robotics.
https://x.com/IlirAliu_/status/2036366477075366246
#robotics #reinforcementlearning #finetuning #robotpolicy #ai
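
A minimal sketch of the idea in the post above, under stated assumptions: keep a large policy frozen, compress its internal state into a small feature vector, and train only a tiny layer on top with RL. Everything concrete here is hypothetical and not taken from the post: the random "backbone" standing in for a pretrained policy, the fixed linear projection used as the compression step, the dimensions, the toy reward, and the use of plain REINFORCE with a running-average baseline as the RL update.

```python
import numpy as np

OBS_DIM, STATE_DIM, TOKEN_DIM, ACT_DIM = 8, 256, 16, 1  # illustrative sizes

_init = np.random.default_rng(0)
# Hypothetical stand-in for a large pretrained policy; never updated below.
W_BACKBONE = _init.normal(size=(STATE_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)
# Assumed compression step: a fixed linear projection of the internal state.
W_COMPRESS = _init.normal(size=(TOKEN_DIM, STATE_DIM)) / np.sqrt(STATE_DIM)


def rl_token(obs):
    """Compress the frozen policy's internal state into a small feature vector."""
    internal_state = np.tanh(W_BACKBONE @ obs)
    return W_COMPRESS @ internal_state


class TinyActor:
    """Linear Gaussian actor on top of the token; the only trainable part."""

    def __init__(self, sigma=0.2):
        self.W = np.zeros((ACT_DIM, TOKEN_DIM))
        self.sigma = sigma

    def mean(self, token):
        return self.W @ token

    def sample(self, token, rng):
        return self.mean(token) + self.sigma * rng.normal(size=ACT_DIM)

    def reinforce_step(self, token, action, advantage, lr):
        # Gradient of log N(action | W @ token, sigma^2 I) with respect to W.
        grad_logp = np.outer((action - self.mean(token)) / self.sigma**2, token)
        self.W += lr * advantage * grad_logp


def toy_reward(obs, action):
    """Made-up task: the action should match the first observation entry."""
    return -float(np.sum((action - obs[:ACT_DIM]) ** 2))


def train(steps=2000, lr=5e-4, seed=1):
    rng = np.random.default_rng(seed)
    actor = TinyActor()
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        obs = rng.normal(size=OBS_DIM)
        token = rl_token(obs)
        action = actor.sample(token, rng)
        reward = toy_reward(obs, action)
        actor.reinforce_step(token, action, reward - baseline, lr)
        baseline += 0.05 * (reward - baseline)
    return actor
```

Calling `train()` returns an actor with 16 trainable weights, while the 2048 backbone weights are left untouched; that difference in size is what would make the fine-tuning step cheap in this toy setting.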

Robots building robots. RL token is a simple but powerful idea: Fine-tuning robot policies usually takes days. This takes minutes. Instead of retraining the full model, compress its internal state into a small feature vector and train a tiny RL layer on top. • small actor +