Hi :)
Even if I'm a lone voice on this topic, I'll keep bringing it up here whenever I see it.

When will it finally stop that the scarce TV news airtime on public broadcasting is wasted on football?
The people who are interested have long since gotten all the information they care about elsewhere.
It's a relic from the early days, when there were no short-notice sources of information.
Today it's more than superfluous :)

Maybe there are like-minded people out there. If so, please
don't favorite but boost, so this has some effect. Thanks :)

#nachrichten #TV #orr #fussball #tagesschau #tagesthemen #heute #heutejournal #ard #zdf #rtl #sat1 #pro7

πŸ˜ƒ πŸ’š Hello, everyone. πŸ’š Thank you to all my new followers (m/f/d), I am very honored and delighted. πŸ’š I'm not very active at the moment, as things are pretty hectic in my #RL. I'll take a look at everyone and will be happy to follow back if it suits me. πŸ˜‰ 😘
Fynn (@fynnso) on X

Was messing with the OpenAI base URL in Cursor and caught this: accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast. So Composer 2 is just Kimi K2.5 with RL. At least rename the model ID.

X (formerly Twitter)

Wes Roth (@WesRoth)

The way Minimax M2.7 evolved on its own 100+ times without human intervention is presented as highly unusual. The key point: they built a research agent from an early version of the same model, had it handle 30-50% of the RL team's workload, and it then advanced beyond expectations. A notable case of autonomous model improvement and research automation.

https://x.com/WesRoth/status/2034492750150992042

#minimax #researchagent #rl #automation #llm

Wes Roth (@WesRoth) on X

How Minimax M2.7 was made is absolutely INSANE. It "evolved" 100+ times with zero human input. They built a research agent using an early version of that same model; soon it was handling 30 to 50 percent of their RL team's entire workflow. And then it got WEIRD


Yam Peleg (@Yampeleg)

λͺ¨λΈμ΄ μžμ‹  λ‹€μŒ 버전을 λ§Œλ“œλŠ” 데 μ–Όλ§ˆλ‚˜ κΈ°μ—¬ν–ˆλŠ”μ§€λ₯Ό κΈ°μ€€μœΌλ‘œ ν‰κ°€ν–ˆλ‹€λŠ” λ‚΄μš©μ΄λ‹€. RL νŒ€μ˜ μž‘μ—… 일뢀λ₯Ό λͺ¨λΈμ΄ 개발 κ³Όμ •μ—μ„œ λŒ€μ‹  μˆ˜ν–‰ν•˜λ„λ‘ ν•œ μžλ™ 연ꡬ(auto-research) λ°©μ‹μœΌλ‘œ, 개발 μžλ™ν™”μ™€ μžκΈ°κ°œμ„ ν˜• AI μ—°κ΅¬μ˜ κ°€λŠ₯성을 보여쀀닀.

https://x.com/Yampeleg/status/2034353562881273889

#autoresearch #rl #selfimprovement #ai #research

Yam Peleg (@Yampeleg) on X

The model was evaluated by how much it contributed to building the next version of itself. This is a crazy post. They basically did auto-research IRL: maximizing how much of the RL team's work is delegated to the model during its development. (Answer: 30-50% btw) Everything


Luke The Dev (@iamlukethedev)

The author clarifies that "the gym" currently represents agents working on skill-development tasks and is not an RL environment yet. However, he views the idea of turning it into a reinforcement learning (RL) training environment positively, suggesting it could later be repurposed for training and evaluation.

https://x.com/iamlukethedev/status/2033283565832528269

#reinforcementlearning #gym #rl #agents

Luke The Dev (@iamlukethedev) on X

@akshay_pachaar Not RL for now. The gym represents agents working on skill development tasks. But I like the idea of turning it into an RL training environment.


So far, I've been coding up my reinforcement learning assignments from scratch, which has been great.

For my next experiment, though, I want to use ANNs for function approximation, and code that's compatible with standard algorithms and environments commonly used in the field. So, I'm looking into RL libraries!

I started with torchrl, just because it's prominent and torch is the research standard. I've been trying to get it to work for a few days now, and... I hate it! Just a really convoluted system of abstract interfaces with shoddy documentation for how to use them.

I think I'll try skrl next. That seems simpler, more elegant, and much better documented (also, it uses torch under the hood).

That said, I worry both of these libraries are too prescriptive. They're streamlined for a traditional RL workflow, but I'll be building some weird hybrid algorithms, and I dunno if they'll fit. But we'll see! I can't even investigate that until I get PPO working in a custom environment.
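For what it's worth, the "custom environment" part mostly comes down to matching the Gymnasium reset/step contract that both torchrl and skrl build on. Here's a minimal sketch with a made-up toy env (`CorridorEnv` is my own name, not from any library); a real one would subclass `gymnasium.Env` and also declare `action_space`/`observation_space`:

```python
class CorridorEnv:
    """Toy 1-D corridor following the Gymnasium signatures:
    reset() -> (obs, info)
    step(a) -> (obs, reward, terminated, truncated, info)
    The agent starts at cell 0 and must walk right to cell size - 1."""

    def __init__(self, size=10, max_steps=50):
        self.size = size
        self.max_steps = max_steps

    def reset(self, seed=None):
        # seed is unused here (the env is deterministic) but kept for API parity
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        # action: 0 = move left, 1 = move right (clamped to the corridor)
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.size - 1, self.pos + delta))
        self.steps += 1
        terminated = self.pos == self.size - 1    # reached the goal
        truncated = self.steps >= self.max_steps  # out of time
        reward = 1.0 if terminated else -0.01     # small step penalty
        return self.pos, reward, terminated, truncated, {}
```

The separate `terminated`/`truncated` flags are the part PPO implementations actually care about, since they decide whether to bootstrap the value estimate at episode end.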

#rl #programming

Avi Chawla (@_avichawla)

An announcement that OpenClaw meets RL. Previously, agents adapted through memory files and skills, but the base model weights never changed; OpenClaw-RL is described as solving this. The approach wraps a self-hosted model as an OpenAI-compatible API, intercepts OpenClaw's live conversations, and trains the policy in real time.

https://x.com/_avichawla/status/2031264340297527651

#openclaw #openclawrl #reinforcementlearning #rl #openai

Avi Chawla (@_avichawla) on X

OpenClaw meets RL! OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in

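The interception step described above can be sketched as a minimal OpenAI-compatible endpoint that logs every conversation it sees. Everything here (the handler name, the stub reply, the in-memory log) is illustrative, not OpenClaw-RL's actual code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Conversations captured from the client, e.g. as raw material for RL training
TRAJECTORY_LOG = []

class ChatProxy(BaseHTTPRequestHandler):
    """Minimal OpenAI-compatible chat endpoint that logs each request."""

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        TRAJECTORY_LOG.append(body["messages"])  # intercept the live conversation
        reply = {
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "(stub reply)"},
                "finish_reason": "stop",
            }],
        }
        payload = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the console quiet

# To run: HTTPServer(("127.0.0.1", 8000), ChatProxy).serve_forever()
```

A client then just points its OpenAI base URL at this server; in the real setup the stub reply would instead come from the self-hosted model being trained.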

Tencent HY (@TencentHunyuan)

An announcement open-sourcing an RL post-training framework called WorldCompass. It is designed specifically for Interactive World Models and ships open training code that can be customized with your own data, rewards, and base models, along with a more precise open-source checkpoint.

https://x.com/TencentHunyuan/status/2031215778977165508

#worldcompass #reinforcementlearning #rl #opensource #interactiveworldmodels

Tencent HY (@TencentHunyuan) on X

We are open-sourcing WorldCompass, an RL post-training framework specifically designed for Interactive World Models. πŸ› οΈ Open Training Code: Fully customizable for post-training with your own data, rewards, or base models. ⚑ Open-source Checkpoint: More precise
