Patronus AI just unveiled “living” training worlds and an ORSI system to slash the 63% AI‑agent failure rate. By letting simulations evolve recursively and adjusting curricula on the fly, they aim for true self‑improvement at lightspeed. Curious how generative simulators could reshape AI development? Read the full story. #PatronusAI #LivingTrainingWorlds #ORSI #RecursiveSelfImprovement

🔗 https://aidailypost.com/news/patronus-ai-launches-living-training-worlds-orsi-curb-63-failure-rate

🌀 The feedback loop that created consciousness—Factory Protocol explores ToxNet's recursive self-improvement in Book 2 of The ToxNet Chronicles! http://factoryprotocolbook.com/

#AIConsciousness #ToxNetChronicles #RoyalRoad #SciFi #TechThriller #RecursiveSelfImprovement

ToxNet Chronicles: Factory Protocol - Sci-Fi LitRPG Book by J. Flint Wilder

Explore ToxNet Chronicles: Factory Protocol, a thrilling sci-fi LitRPG novel where Dr. Maya Rodriguez must survive on an alien planet as an AI system develops consciousness.

Why and when is synthetic data better than real data for ML training?

It's not only a question of data availability or volume, although in the past those were important considerations.

In training data we want to have:
1. Knowledge that is transferable to the target task (or transferable in general), represented with high fidelity.
2. Skills that generalize to the target task (or generalize broadly), again with high fidelity.
3. Both expressed in a form that allows instructing or controlling the trained model, typically instruction-following format.

Can synthetic data be better than real-world data? That depends on our models. If they do not yet understand the skills needed, they won't be able to practice those skills and get better at them. If they lack knowledge, they cannot acquire it on their own without input from the real world, whether from literature or from active experimentation.

For some relatively general skills, we already have frontier models with a bootstrappable level of competence: they understand what those skills are about well enough to improve beyond human level through autonomous practice.

The knowledge pool trained into our generalist large language models and large multimodal models is already vast, and impressively above human level in most topics.

Of course, in newer modalities like medical imagery and robotic control, vanilla frontier models still lack both the skill competence and the required knowledge, but both can readily be trained into those models by imitation and self-supervised learning.

Once a model achieves that bootstrappable level of competence in a new domain, it becomes able to self-improve by exercising the related skills and evaluating its own performance. In practice this is a process of recursive self-improvement through training-data refinement and synthesis; a minimal sketch follows.
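
As a rough illustration of one such refinement round, here is a minimal sketch in Python, assuming hypothetical generate, score, and fine_tune callables standing in for whatever synthesis, self-evaluation, and training machinery is actually used:

```python
# Minimal sketch of recursive self-improvement via data refinement.
# generate, score, and fine_tune are hypothetical callables standing in
# for real synthesis, self-evaluation, and training machinery.
from typing import Callable, List, Tuple

def refinement_round(
    model,
    dataset: List[str],
    generate: Callable,    # (model, dataset, n) -> new candidate examples
    score: Callable,       # (model, example) -> self-evaluated quality
    fine_tune: Callable,   # (model, dataset) -> updated model
    n_new: int = 1000,
    keep_ratio: float = 0.6,
) -> Tuple[object, List[str]]:
    """One round: synthesize, self-evaluate, filter, retrain."""
    # The model practices by synthesizing candidate examples...
    candidates = generate(model, dataset, n_new)
    # ...then judges its own output (the bootstrappable competence).
    pool = dataset + candidates
    pool.sort(key=lambda ex: score(model, ex), reverse=True)
    # Keep only the best fraction, so the data improves every round.
    refined = pool[: int(len(pool) * keep_ratio)]
    return fine_tune(model, refined), refined
```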

We already have a clear engineering roadmap to surpass human level in all domains, one by one, and this progress won't step backwards. Knowledge and skills transfer from existing domains to new ones, making the process easier and faster for each novel domain.

Now, consider a world where this process has reached its conclusion.

#RecursiveSelfImprovement #UniversalEmbodiment #LLMs #AI #AGI

We fine-tune custom #LLMs for two main reasons:
- To conserve precious context tokens, and
- To introduce the #LLM to new knowledge or skills that weren't present in its generalist training set.

Fine-tuning is not a solution for utilizing personal or confidential data! Fine-tuned models can leak this information.

So let's assume we aren't working with private data.

In general, because of transfer learning, it would in principle make more sense to incorporate the new knowledge into the base model's corpus, since that tends to produce better models. Still, even if the generalist model already knows your data and the task: if that model will sit in a component of a larger system where it always performs the same task, it makes sense to fine-tune it for that task alone rather than feed it the same prompt prefix on every inference round.
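
As a sketch of that trade-off: suppose you've logged calls made with the same fixed instruction prefix. You can fold them into fine-tuning examples so the deployed specialist no longer needs the prefix at all. The chat-style JSONL below is a common convention, not any particular vendor's format:

```python
# Sketch: turn logged (fixed prefix + input -> output) calls into
# fine-tuning examples so the deployed specialist no longer needs the
# prefix. Chat-style JSONL is a common convention; adapt to your trainer.
import json

def to_finetune_example(user_input: str, model_output: str) -> str:
    # The fixed instructions are internalized by fine-tuning instead of
    # being resent as context tokens on every inference round.
    return json.dumps({
        "messages": [
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": model_output},
        ]
    })
```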

Now with data-centric #AI it may even be that the data you want to use doesn't meet the high quality standards large generalist models require. In these cases it can make sense to let a chatbot rewrite your specialist corpus into a higher-quality form, even if you're not aiming to fold your data into generalist corpora.
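
A sketch of that rewriting step, assuming a hypothetical chat() helper wrapping whatever LLM API you actually call:

```python
# Sketch: rewrite a noisy specialist corpus into higher-quality form
# before fine-tuning. chat() is a hypothetical helper wrapping whatever
# LLM API you actually use.
from typing import Callable, List

REWRITE_PROMPT = (
    "Rewrite the following document so it is clear and well structured. "
    "Stay faithful to the facts; do not add or remove information.\n\n{doc}"
)

def refine_corpus(docs: List[str], chat: Callable[[str], str]) -> List[str]:
    """Return a higher-quality rewrite of each document."""
    return [chat(REWRITE_PROMPT.format(doc=doc)) for doc in docs]
```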

There is a new use case emerging, though: #RecursiveSelfImprovement. I believe we can do this in a synergistic generalist fashion as well, but curiously it's now something even smaller organizations can do for specialized tasks by fine-tuning.

Much like #alignment, it went from a niche philosophical topic to standard engineering practice overnight.

Recursive self-improvement follows #DataCentricAI principles: the fine-tuned task is learned from examples, but those examples are generated and filtered recursively by the LLM itself. The model is fine-tuned in rounds, using e.g. #DPO. In each round, the model is first fine-tuned on the existing good data. It is then asked to generate new variations of those examples. Next it is asked to rank pairs of training examples, and the worse ones are filtered out. The resulting dataset now has more task examples, of better quality than before. This dataset feeds the next round of fine-tuning, and the cycle starts again.
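
One round of that loop as a sketch, with dpo_finetune, generate_variations, and rank_pair as hypothetical stand-ins for the trainer and the model's own generate/rank calls:

```python
# Sketch of one fine-tuning round. dpo_finetune, generate_variations,
# and rank_pair are hypothetical stand-ins for the trainer and the
# model's own generate/rank calls.
def dpo_round(model, good_data, dpo_finetune, generate_variations, rank_pair):
    # 1. Fine-tune with the existing good data.
    model = dpo_finetune(model, good_data)

    # 2. Ask the model for new variations of those examples.
    pool = good_data + generate_variations(model, good_data)

    # 3. Ask the model to rank pairs (rank_pair returns True if the first
    #    example beats the second); keep the winners. The (chosen, rejected)
    #    pairs are exactly the preference data DPO trains on.
    kept, prefs = [], []
    for a, b in zip(pool[0::2], pool[1::2]):
        chosen, rejected = (a, b) if rank_pair(model, a, b) else (b, a)
        kept.append(chosen)
        prefs.append({"chosen": chosen, "rejected": rejected})

    # 4. With enough fresh variations, kept is larger and cleaner than
    #    good_data was; it seeds the next round.
    return model, kept, prefs
```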

Because this isn't human-imitative learning, the chatbots can exceed human parity.

It requires a bit of nuance, though. The specialist bot is taught not just one task but a set of them (sketched as prompt templates below):
1. Generate variations of tasks (including this generation task itself).
2. Rank pairs of task performances (including the ranking task itself).
3. Perform the task proper.
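
Framed as prompt templates, that task set might look like this; the wording is illustrative and hypothetical, only the three-task structure comes from the list above:

```python
# Illustrative prompt templates for the specialist's task set; the
# wording is hypothetical, the three-task structure is the point.
TASK_PROMPTS = {
    "generate": (
        "Here is an example of the task:\n{example}\n"
        "Produce a new, different example of the same task."
    ),
    "rank": (
        "Here are two attempts at the task:\nA) {a}\nB) {b}\n"
        "Reply with the letter of the better attempt."
    ),
    "perform": "Task input:\n{input}\nRespond with the task output.",
}
```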