Chris Paxton

298 Followers
103 Following
44 Posts

Robotics research scientist at Meta AI/FAIR, formerly NVIDIA. Making robots that can solve hard problems with people. All views my own.

he/him

Website: https://cpaxton.github.io/
Scholar: https://scholar.google.com/citations?user=I1mOQpAAAAAJ&hl=en
Twitter: https://twitter.com/chris_j_paxton
People I respect are speculating that OpenAI is pushing back their open-source release due to Kimi K2. It does strike me that this is what Llama 4 was supposed to be: a massive, impressive open-source MoE model that can form the basis for a new generation of agentic AI applications.
Grok at 4 on aider polyglot. It's resoundingly clear there's no "best model" any more, just a best model for you and your use case
So all it took was about 56 data points per parameter to train grandmaster-level chess without search. It shows what a force-multiplier search is when it comes to data efficiency, but also that, given unlimited data, you can solve basically anything with our current architectures.
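Assuming the post refers to DeepMind's grandmaster-level chess transformer (roughly a 270M-parameter model trained on about 15B annotated positions; these figures are my assumption, not stated in the post), the back-of-envelope ratio works out:

```python
# Assumed figures for the no-search chess transformer (not from the post):
# ~270M parameters trained on ~15B annotated board positions.
params = 270e6
data_points = 15e9

ratio = data_points / params
print(round(ratio))  # -> 56 data points per parameter
```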
Instead we use a diffusion model to generate potential object transformations, refine them, and finally train a "discriminator" model to classify which ones look like correct scenes.
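The generate, refine, and discriminate loop described above can be sketched roughly like this (all model interfaces here are hypothetical placeholders, not the real StructDiffusion code):

```python
import numpy as np

# Hypothetical stand-ins for the learned models (not the StructDiffusion API).
def diffusion_sample(num_candidates, num_objects, rng):
    """Sample candidate object transformations, e.g. (x, y, theta) per object."""
    return rng.normal(size=(num_candidates, num_objects, 3))

def refine(poses):
    """Placeholder refinement step (e.g. a few extra denoising iterations)."""
    return poses * 0.95  # nudge candidates toward the data manifold

def discriminator_score(poses):
    """Placeholder scorer: higher = more plausible, collision-free scene."""
    return -np.abs(poses).sum(axis=(1, 2))

rng = np.random.default_rng(0)
candidates = refine(diffusion_sample(num_candidates=64, num_objects=4, rng=rng))
best = candidates[np.argmax(discriminator_score(candidates))]
print(best.shape)  # (4, 3): one refined pose per object
```

The key design point is that the diffusion model only has to propose *plausible* arrangements; the discriminator handles the final "is this subtly off?" judgment that pure regression tends to get wrong.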

We see that just training a transformer model on the data runs into some serious issues - in particular, its outputs are often just slightly off.

Take a look at this example of a generated place setting from point clouds - it's very subtly off, which makes it invalid. Humans can handle this sort of precision, even with plates we haven't seen, but our existing methods based on learning will often struggle.

Combine multimodal transformers with diffusion models to build complex physically realistic structures in the real world! Excited to share our newest work, StructDiffusion:
- paper: https://arxiv.org/abs/2211.04604
- website: http://weiyuliu.com/StructDiffusion/

We train on simulated data + templated language to create structures from 4 broad classes: lines, circles, table settings, and towers. The challenge is making sure that we can actually place these without collisions, even when the objects are ones we've never seen.

StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures out of a single RGB-D image based on high-level language goals, such as "set the table." Our method shows how diffusion models can be used for complex multi-step 3D planning tasks. StructDiffusion improves success rate on assembling physically-valid structures out of unseen objects by on average 16% over an existing multi-modal transformer model, while allowing us to use one multi-task model to produce a wider range of different structures. We show experiments on held-out objects in both simulation and on real-world rearrangement tasks. For videos and additional results, check out our website: http://weiyuliu.com/StructDiffusion/.


Use multi-modal transformers for robot task and motion planning: https://arxiv.org/pdf/2211.01576.pdf

cool work from NVIDIA

I love seeing cool papers like this one from Shuo Cheng and Danfei Xu, which uses ideas from classical, symbolic task and motion planning to handle really hard #robotics problems with #deeplearning, using it to guide reinforcement learning of skills #RL

paper: https://arxiv.org/pdf/2210.12631.pdf

New micro-blogging social media site, long-overdue new profile cover.

Instead of a grainy picture of Singapore I figured I'd generate an optimistic solarpunk city of the future with #StableDiffusion

Really neat work by Microsoft research - PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pretraining.

Build robot "foundation models" that do localization and mapping. I strongly believe this is what the future of large models for robots needs to look like.

More at the MSR project site: https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/perception-action-causal-transformer-for-autoregressive-robotics-pretraining/


Our method uses a common robotics representation, trained using self-supervised learning, that can be fine-tuned to multiple downstream tasks.
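The autoregressive pretraining idea behind PACT can be sketched as follows (my paraphrase of the concept, not Microsoft's implementation): interleave perception and action tokens into one sequence, then train a causal transformer to predict each next token.

```python
# Sketch of PACT-style sequence construction (hypothetical, simplified).
def interleave(perception_tokens, action_tokens):
    """Build the interleaved sequence [s_0, a_0, s_1, a_1, ...]."""
    seq = []
    for s, a in zip(perception_tokens, action_tokens):
        seq.extend([("state", s), ("action", a)])
    return seq

def next_token_targets(seq):
    """Autoregressive objective: token at step t predicts token t+1."""
    return list(zip(seq[:-1], seq[1:]))

seq = interleave(["s0", "s1", "s2"], ["a0", "a1", "a2"])
pairs = next_token_targets(seq)
print(seq[0], "->", seq[1])  # ('state', 's0') -> ('action', 'a0')
```

Because the same pretrained backbone predicts both future states and future actions, heads for downstream tasks like localization or mapping can be fine-tuned on top of it, which is the "foundation model" angle.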
