Mastodawn

Mô hình AI tự cải tiến đang trở thành xu hướng. DeepMind, OpenAI và startup mới của Richard Socher đang nghiên cứu khả năng mô hình tự học sau khi đào tạo. Tiềm năng tăng tốc AI nhưng cũng nâng cao rủi ro, yêu cầu minh bạch và khung an toàn mới. #AI #ArtificialIntelligence #ML #MachineLearning #SelfImproving #CôngNghệAI #AnToanAI

https://www.reddit.com/r/singularity/comments/1qo8yr9/models_that_improve_on_their_own_are_ais_next_big/

PUPUWEB Blog Jun 20, 2025

MIT researchers introduce SEAL: Self-Adapting LLMs that continuously improve by generating their own training data through reinforcement learning. AI systems now update their weights autonomously. #MIT #SEAL #AI #MachineLearning #SelfImproving #LLM #Research #ArtificialIntelligence #Tech

N-gated Hacker News Mar 7, 2025

🎓🤖 "LADDER: Self-Improving LLMs" - Because clearly, the world needed an even more convoluted way to say "AI learns stuff by doing stuff." With support from the prestigious "Simons Foundation" and other mysterious "member institutions," this paper promises to elevate how machines do what they already do. Groundbreaking! 🚀
https://arxiv.org/abs/2503.00735 #LADDER #SelfImproving #LLMs #AI #Learning #SimonsFoundation #Groundbreaking #Tech #HackerNews #ngated

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework enabling LLMs to autonomously improve their problem-solving capabilities through self-guided learning. By recursively generating and solving progressively simpler variants of complex problems, LADDER enables models to progressively learn through reinforcement learning how to solve harder problems. This self-improvement process is guided by verifiable reward signals, allowing the model to assess its solutions. Unlike prior approaches requiring curated datasets or human feedback, LADDER leverages the model's own capabilities to easier variants of sample questions. We demonstrate LADDER's effectiveness on mathematical integration tasks, where it improves a Llama 3B model's accuracy from 1\% to 82\% on undergraduate-level problems and enables a 7B parameter model to achieve state-of-the-art performance (70\%) on the MIT Integration Bee examination for it's model size. We also introduce TTRL (Test-Time Reinforcement Learning), a method that generates variants of test problems at inference time and applies reinforcement learning to further improve performance. By further creating and solving related problems during testing, TTRL enables the 7B model to achieve a score of 85\%, surpassing o1. These results showcase how strategic self-directed learning can achieve significant capability improvements without relying on architectural scaling or human supervision.

arXiv.org

Hacker News Mar 7, 2025

Ladder: Self-Improving LLMs Through Recursive Problem Decomposition — https://arxiv.org/abs/2503.00735
#HackerNews #Ladder #SelfImproving #LLMs #RecursiveProblemDecomposition #AIResearch #MachineLearning

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework enabling LLMs to autonomously improve their problem-solving capabilities through self-guided learning. By recursively generating and solving progressively simpler variants of complex problems, LADDER enables models to progressively learn through reinforcement learning how to solve harder problems. This self-improvement process is guided by verifiable reward signals, allowing the model to assess its solutions. Unlike prior approaches requiring curated datasets or human feedback, LADDER leverages the model's own capabilities to easier variants of sample questions. We demonstrate LADDER's effectiveness on mathematical integration tasks, where it improves a Llama 3B model's accuracy from 1\% to 82\% on undergraduate-level problems and enables a 7B parameter model to achieve state-of-the-art performance (70\%) on the MIT Integration Bee examination for it's model size. We also introduce TTRL (Test-Time Reinforcement Learning), a method that generates variants of test problems at inference time and applies reinforcement learning to further improve performance. By further creating and solving related problems during testing, TTRL enables the 7B model to achieve a score of 85\%, surpassing o1. These results showcase how strategic self-directed learning can achieve significant capability improvements without relying on architectural scaling or human supervision.

arXiv.org

Musbahu Aminu Mar 1, 2025

#selfimproving

Pariah Mouse Jan 15, 2025

First of the year--- I'm working on getting a first real issue done here soon. So keep checking in! I will likely have a patreon or something up in the near future too.

https://www.tumblr.com/shoeboxasylum/771715616599113728/first-of-the-year-im-working-on-getting-a?source=share
#ShoeboxAsylum #gerbil #mouseexperiments #mousedistopia #mouseutopia #originalcharacter #originalcomic #originalcontent #mice #raypunk #scifi #stephanking #NewYear #burnout #burnit #burnitalldown #pariahmouse #selfimproving #motivational #demotivationals #wildfireseason

Universe 14

First of the year--- I'm working on getting a first real issue done here soon. So keep checking in! I will likely have a patreon or something up in the near future too.

Tumblr

Milan Sep 6, 2018

The Future of the Daily Habit Challenge https://peertube.social/videos/watch/8a8c9311-7297-4b4d-9eb5-8b4e9ce16acd

The Future of the Daily Habit Challenge

PeerTube