Mô hình AI tự cải tiến đang trở thành xu hướng. DeepMind, OpenAI và startup mới của Richard Socher đang nghiên cứu khả năng mô hình tự học sau khi đào tạo. Tiềm năng tăng tốc AI nhưng cũng nâng cao rủi ro, yêu cầu minh bạch và khung an toàn mới. #AI #ArtificialIntelligence #ML #MachineLearning #SelfImproving #CôngNghệAI #AnToanAI
https://www.reddit.com/r/singularity/comments/1qo8yr9/models_that_improve_on_their_own_are_ais_next_big/
MIT researchers introduce SEAL: Self-Adapting LLMs that continuously improve by generating their own training data through reinforcement learning. AI systems now update their weights autonomously.
#MIT #SEAL #AI #MachineLearning #SelfImproving #LLM #Research #ArtificialIntelligence #Tech🎓🤖 "LADDER: Self-Improving LLMs" - Because clearly, the world needed an even more convoluted way to say "AI learns stuff by doing stuff." With support from the prestigious "Simons Foundation" and other mysterious "member institutions," this paper promises to elevate how machines do what they already do. Groundbreaking! 🚀
https://arxiv.org/abs/2503.00735 #LADDER #SelfImproving #LLMs #AI #Learning #SimonsFoundation #Groundbreaking #Tech #HackerNews #ngated
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework enabling LLMs to autonomously improve their problem-solving capabilities through self-guided learning. By recursively generating and solving progressively simpler variants of complex problems, LADDER enables models to progressively learn through reinforcement learning how to solve harder problems. This self-improvement process is guided by verifiable reward signals, allowing the model to assess its solutions. Unlike prior approaches requiring curated datasets or human feedback, LADDER leverages the model's own capabilities to easier variants of sample questions. We demonstrate LADDER's effectiveness on mathematical integration tasks, where it improves a Llama 3B model's accuracy from 1\% to 82\% on undergraduate-level problems and enables a 7B parameter model to achieve state-of-the-art performance (70\%) on the MIT Integration Bee examination for it's model size. We also introduce TTRL (Test-Time Reinforcement Learning), a method that generates variants of test problems at inference time and applies reinforcement learning to further improve performance. By further creating and solving related problems during testing, TTRL enables the 7B model to achieve a score of 85\%, surpassing o1. These results showcase how strategic self-directed learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
arXiv.org
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework enabling LLMs to autonomously improve their problem-solving capabilities through self-guided learning. By recursively generating and solving progressively simpler variants of complex problems, LADDER enables models to progressively learn through reinforcement learning how to solve harder problems. This self-improvement process is guided by verifiable reward signals, allowing the model to assess its solutions. Unlike prior approaches requiring curated datasets or human feedback, LADDER leverages the model's own capabilities to easier variants of sample questions. We demonstrate LADDER's effectiveness on mathematical integration tasks, where it improves a Llama 3B model's accuracy from 1\% to 82\% on undergraduate-level problems and enables a 7B parameter model to achieve state-of-the-art performance (70\%) on the MIT Integration Bee examination for it's model size. We also introduce TTRL (Test-Time Reinforcement Learning), a method that generates variants of test problems at inference time and applies reinforcement learning to further improve performance. By further creating and solving related problems during testing, TTRL enables the 7B model to achieve a score of 85\%, surpassing o1. These results showcase how strategic self-directed learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
arXiv.org
Universe 14
First of the year--- I'm working on getting a first real issue done here soon. So keep checking in! I will likely have a patreon or something up in the near future too.
TumblrThe Future of the Daily Habit Challenge
PeerTube