Vibe coding, but for the physical world: how can agentic systems be enabled to work on long-duration tasks involving prototyping, experiments, and slow, expensive physical activities?

Where robotic agentic tasks aren't "fold this laundry and put it away", but "build a factory for making drones".

We need tooling which is qualitatively different from classical software engineering, where we typically have an abundance of ways to iterate rapidly, fast automatic tests, and everything working at machine speed within the relatively well-defined, repeatable world of deterministic, fully knowable bits.

We need principled approaches to make intelligent entities not only embodied, but able to transform their environment with purpose, within complex constraints, social networks, and collaborations.

Instead of pure software engineering approaches, we need agents that first mitigate the aspects which make the physical world hard:
- Acquire the relevant experience and knowledge about the topic, so as not to repeat mistakes someone has already made.
- Build fast simulations of the basics for testing prototypes, and somewhat heavier, targeted simulations to weed out subtler issues before moving into the more costly physical world.
- Build or demonstrate minimal prototypes in the real world to prove the feasibility of the riskiest aspects before investing in them at full scale. This includes things like inspecting samples of components and materials where appropriate.
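The staged de-risking above could be sketched as a cheapest-first gating loop. A minimal sketch, where the stage names, costs, and pass criteria are all made up for illustration:

```python
# Illustrative sketch (all stage names, costs, and checks hypothetical):
# run cheap validation stages first, and only escalate a design that survives.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    cost: float                     # relative cost of running this stage
    check: Callable[[dict], bool]   # returns True if the design passes

def derisk(design: dict, stages: list[Stage]) -> tuple[bool, float]:
    """Run stages cheapest-first; stop at the first failure."""
    spent = 0.0
    for stage in sorted(stages, key=lambda s: s.cost):
        spent += stage.cost
        if not stage.check(design):
            return False, spent     # back to the drawing board, cheaply
    return True, spent              # design has earned a full-scale build

stages = [
    Stage("fast simulation", cost=1, check=lambda d: d["mass_kg"] < 5),
    Stage("targeted simulation", cost=10, check=lambda d: d["thrust_margin"] > 1.3),
    Stage("minimal physical prototype", cost=1000, check=lambda d: d["prototype_flew"]),
]

ok, spent = derisk(
    {"mass_kg": 4.2, "thrust_margin": 1.5, "prototype_flew": True}, stages
)
print(ok, spent)  # True 1011.0
```

The point of the ordering is that most bad designs die in the cheap stages, so the expensive physical-world budget is spent only on the few candidates that deserve it.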

Then the system is left with a minimal volume of real-world problems which must be worked through in physical matter:
- Logistics: getting the correct things to the correct place at the correct time. Where to source the materials, whether new tools are needed, and how to arrange the money, materials, energy, collaborations, and supply chains.
- Making measurements, observations, and tests in the real world, such as what the soil is like on the site.
- Actual trial and error in physical space: handling unexpected situations and surprises, and going back to the drawing board if necessary.

All in all, it's a very complex problem space: making generalist, agentic AI work in the physical world in a purposeful and effective manner, and not much of it is solved yet. Making robots move and manipulate is only the most immediate surface of the problem space. Embodiment is only the first step. And it's not all solvable in the space of digital documents; the AIs need to be material-world native to function well in the real world, not just in a world of abstractions.

#PhysicalFoundationModels #robotics #AI #AGI

One thing to understand about physical foundation models, or robotic foundation models, is the role of in-context learning.

You should aim to frame the problem and the data so that the model can learn to control the embodiment in-context, rather than training it without any possibility to calibrate and discover, at the start of a session, where and what it is.

Otherwise you won't get truly universal models, but models which constantly hedge their bets and are forced to make their control signals not only generalist, but generalist across all training worlds and embodiments *at the same time*.

This means you'll be stuck in a frame where you need a separately trained control adapter layer per embodiment, because the foundation model is incapable of discovering in-context what it inhabits, so its outputs are by necessity the kind that should work somewhat OK across all possible worlds.

The model also becomes unable to learn embodiment-specific control policies without hacks.
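A toy numerical sketch of why this matters. Assume (purely for illustration) that each embodiment applies a hidden gain to the commanded action; a context-free policy must pick one compromise command for the whole population, while an in-context policy can probe, infer the gain for the embodiment it actually inhabits, and then control it accurately:

```python
# Toy illustration (not a real model): each embodiment scales the
# commanded action by a hidden gain g. A context-free policy commits to
# one command tuned for the average embodiment; an in-context policy
# first issues a probe, estimates g from the response, then commands.
import numpy as np

rng = np.random.default_rng(0)
target = 1.0
gains = rng.uniform(0.5, 2.0, size=100)   # population of embodiments

# Context-free: one fixed command, a compromise across all gains.
fixed_cmd = target / gains.mean()
err_free = float(np.abs(gains * fixed_cmd - target).mean())

# In-context: probe, observe the response, calibrate, then act.
probe = 0.1
errs = []
for g in gains:
    g_hat = (g * probe) / probe           # gain estimated from one probe step
    errs.append(abs(g * (target / g_hat) - target))
err_icl = float(np.mean(errs))

print(err_free > err_icl)  # True: calibration beats the compromise policy
```

With noise, partial observability, and real dynamics the calibration is of course much harder than one probe step, but the asymmetry is the same: a model that can identify its embodiment in-context doesn't have to average its policy over every body it was trained on.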

I believe the fact that people don't realize these foundation models need in-context learning for embodiment calibration is a root of many practical problems down the line.

#PhysicalFoundationModels #UniversalEmbodiment #robots #FoundationModels