https://en.wikipedia.org/wiki/Cordouan_Lighthouse #digitalscavengerhunt #useragent #robotpolicy #404adventure #techhumor #webnavigation #HackerNews #ngated
Ilir Aliu (@IlirAliu_)
Introduces the 'RL token' idea: rather than retraining the entire robot policy, compress its internal state into a small feature vector and train only a small RL layer on top of it. The post stresses that this can cut robot policy fine-tuning time from days to minutes, an approach that could substantially improve learning efficiency in robotics.
https://x.com/IlirAliu_/status/2036366477075366246
#robotics #reinforcementlearning #finetuning #robotpolicy #ai

Robots building robots. RL token is a simple but powerful idea: Fine-tuning robot policies usually takes days. This takes minutes. Instead of retraining the full model, compress its internal state into a small feature vector and train a tiny RL layer on top. • small actor +
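
A minimal sketch of the idea as described above, not the method from the linked post: a frozen stand-in "policy" exposes its internal state, a fixed projection compresses that state into a small feature vector, and only a tiny actor head is trained. Everything here is an assumption made for illustration: the names (`rl_token`, `train_actor`, `toy_reward`), the dimensions, the random-projection compression, the toy reward, and the use of plain REINFORCE with a running baseline. The post does not say which compression or RL algorithm is used.

```python
import numpy as np

OBS_DIM, HIDDEN_DIM, TOKEN_DIM, ACT_DIM = 8, 256, 16, 2

# Hypothetical stand-in for a large pretrained policy. These weights are
# frozen: nothing below ever updates them.
_init = np.random.default_rng(0)
W_BACKBONE = _init.normal(size=(OBS_DIM, HIDDEN_DIM)) / np.sqrt(OBS_DIM)
# Assumed compression step: a fixed random projection (the post does not
# say how the internal state is compressed).
W_COMPRESS = _init.normal(size=(HIDDEN_DIM, TOKEN_DIM)) / np.sqrt(HIDDEN_DIM)


def rl_token(obs):
    """Compress the frozen model's internal state into a small feature vector."""
    hidden = np.tanh(obs @ W_BACKBONE)    # internal state of the frozen model
    return np.tanh(hidden @ W_COMPRESS)   # small feature vector ("RL token")


def toy_reward(obs, action):
    """Made-up task: the action should match the first ACT_DIM observations."""
    return -float(np.sum((action - obs[:ACT_DIM]) ** 2))


def train_actor(steps=2000, lr=0.002, sigma=0.5, seed=1):
    """Train only a linear Gaussian actor head on top of the token (REINFORCE)."""
    rng = np.random.default_rng(seed)
    w_actor = np.zeros((TOKEN_DIM, ACT_DIM))  # the only trainable parameters
    baseline = 0.0                            # running average of the reward
    for _ in range(steps):
        obs = rng.normal(size=OBS_DIM)
        token = rl_token(obs)
        mean = token @ w_actor
        action = mean + sigma * rng.normal(size=ACT_DIM)
        reward = toy_reward(obs, action)
        # Gradient of log N(action | mean, sigma^2 I) with respect to w_actor.
        grad_log_pi = np.outer(token, (action - mean) / sigma**2)
        w_actor += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return w_actor


w_actor = train_actor()
frozen = W_BACKBONE.size + W_COMPRESS.size
print(f"trainable: {w_actor.size} parameters, frozen: {frozen}")
```

With these toy sizes the trainable head has 32 parameters against 6144 frozen ones, which is the point of the approach: each update touches only the small layer, so a training step stays cheap no matter how large the frozen model is.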