How can we train data-efficient robots that can respond to open-ended queries like “warm up my lunch” or “find a blue book”?
Introducing CLIP-Field: a semantic neural field trained with no human labels, using only web-data-pretrained detectors, VLMs, and LLMs. http://mahis.life/clip-fields
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Teaching robots in the real world to respond to natural language queries with zero human labels, using pretrained large language models (LLMs), visual language models (VLMs), and neural fields.

At the core of CLIP-Field lies a neural field that maps real-world coordinates to the semantic representation spaces underlying pretrained models like CLIP and Sentence-BERT. This mapping enables our model to respond to open-ended semantic or visual queries.
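
To make this mapping concrete, here is a minimal sketch of what such a field can look like in PyTorch. This is not our exact architecture: a small MLP with Fourier features stands in for the hash-grid encoder, and the head widths (512 for CLIP ViT-B/32, 768 for Sentence-BERT) are illustrative assumptions.

```python
# Minimal sketch of a CLIP-Field-style semantic field. A small MLP with
# Fourier features stands in for the hash-grid encoder; the head widths
# (512 for CLIP ViT-B/32, 768 for Sentence-BERT) are assumptions.
import torch
import torch.nn as nn


class FourierFeatures(nn.Module):
    """Lift xyz to sin/cos features so the MLP can fit high-frequency detail."""

    def __init__(self, num_freqs: int = 8):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs))

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        scaled = xyz[..., None] * self.freqs          # (N, 3) -> (N, 3, F)
        return torch.cat([scaled.sin(), scaled.cos()], dim=-1).flatten(-2)


class SemanticField(nn.Module):
    """Map world coordinates to visual (CLIP) and language (SBERT) embeddings."""

    def __init__(self, clip_dim: int = 512, sbert_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.encode = FourierFeatures()
        self.trunk = nn.Sequential(
            nn.Linear(3 * 2 * 8, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.clip_head = nn.Linear(hidden, clip_dim)    # matched to CLIP image features
        self.sbert_head = nn.Linear(hidden, sbert_dim)  # matched to label-text features

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(self.encode(xyz))
        return self.clip_head(h), self.sbert_head(h)


field = SemanticField()
clip_emb, sbert_emb = field(torch.rand(1024, 3))  # 1024 sampled world points
```
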
For real-world experiments, we collect RGB-D data using an iPhone 13 Pro and pre-process it using open-label detection/segmentation models like Detic and LSeg. We then convert the data to world coordinates, and to semantic/visual representations by running Sentence-BERT and CLIP on the labels and bounding boxes.
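
A hedged sketch of the geometric half of this preprocessing: back-projecting each depth pixel into world coordinates with a pinhole camera model. The intrinsics and pose below are placeholders, not real calibration; in the actual pipeline the pose comes from the phone's odometry.

```python
# Hedged sketch of the geometric half of the preprocessing: back-project
# each depth pixel into world coordinates with a pinhole camera model.
# The intrinsics and pose below are placeholders, not real calibration.
import numpy as np


def backproject(depth: np.ndarray, K: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """depth: (H, W) metric depth; K: 3x3 intrinsics; cam_to_world: 4x4 pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))    # pixel coordinates
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]                   # unproject with fx, cx
    y = (v - K[1, 2]) * z / K[1, 1]                   # unproject with fy, cy
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    return (pts @ cam_to_world.T)[:, :3]              # camera frame -> world frame


K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])   # placeholder intrinsics
pose = np.eye(4)                  # placeholder pose from the phone's odometry
world_xyz = backproject(np.full((480, 640), 2.0), K, pose)  # (480*640, 3)
```

Each resulting point then gets paired with the CLIP embedding of its bounding-box crop and the Sentence-BERT embedding of its detected label.
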
We can train a CLIP-Field from scratch in under an hour, including automated labeling, thanks to advances from the NeRF literature such as Instant-NGP. The trained model can then be used on a robot to find a "blue book with a house on the cover" or a place to "throw out my trash".
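
Continuing the SemanticField sketch from earlier, one training step could look like the following. The loss is a generic InfoNCE-style contrastive stand-in, and random tensors take the place of the CLIP and Sentence-BERT targets that the labeling pipeline would supply.

```python
# One hedged training step, reusing `field` from the SemanticField sketch.
# InfoNCE-style contrastive loss; random tensors stand in for the real
# CLIP / Sentence-BERT targets produced by the labeling pipeline.
import torch
import torch.nn.functional as F


def info_nce(pred: torch.Tensor, target: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    pred, target = F.normalize(pred, dim=-1), F.normalize(target, dim=-1)
    logits = pred @ target.T / temp           # (N, N) pairwise similarities
    labels = torch.arange(len(pred))          # point i should match target i
    return F.cross_entropy(logits, labels)


opt = torch.optim.Adam(field.parameters(), lr=1e-3)
xyz = torch.rand(256, 3)                                   # labeled world points
tgt_clip, tgt_sbert = torch.randn(256, 512), torch.randn(256, 768)

pred_clip, pred_sbert = field(xyz)
loss = info_nce(pred_clip, tgt_clip) + info_nce(pred_sbert, tgt_sbert)
opt.zero_grad()
loss.backward()
opt.step()
```
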

Thanks to my advisors and collaborators @cpaxton @lerrel @soumith and Arthur Szlam, and finally Meta AI for an amazing internship!

Paper: http://arxiv.org/abs/2210.05663
More video/demos: http://mahis.life/clip-fields/
Code and interactive tutorials: http://github.com/notmahi/clip-fields

Abstract:

We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such as CLIP, Detic, and Sentence-BERT; and thus uses no direct human supervision. When compared to baselines like Mask-RCNN, our method outperforms on few-shot instance identification or semantic segmentation on the HM3D dataset with only a fraction of the examples. Finally, we show that using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstration videos are available here: https://mahis.life/clip-fields
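
As a rough illustration of "semantic search over space", the query path can be as simple as scoring a grid of candidate points against a text embedding. This reuses the SemanticField sketch from above, with a random vector standing in for a CLIP-encoded query like "blue book with a house on the cover".

```python
# Hedged query-time sketch, reusing `field` from the SemanticField sketch.
# A random vector stands in for the CLIP text embedding of the query.
import torch
import torch.nn.functional as F


@torch.no_grad()
def locate(field, text_emb: torch.Tensor, lo=-1.0, hi=1.0, res: int = 32) -> torch.Tensor:
    axis = torch.linspace(lo, hi, res)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
    grid = grid.reshape(-1, 3)                               # candidate world points
    pred_clip, _ = field(grid)
    scores = F.cosine_similarity(pred_clip, text_emb[None], dim=-1)
    return grid[scores.argmax()]                             # best-scoring coordinate


query_emb = torch.randn(512)   # stand-in for an encoded open-ended query
goal_xyz = locate(field, query_emb)
```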


#Introduction I'm Mahi, a third-year PhD student at NYU and a visiting researcher at FAIR, working at the intersection of #robotics and #machinelearning! Since CLIP-Fields recently received an outstanding paper award at the #CoRL22 LangRob workshop, it felt right to start by importing this post over from Twitter.

Also check out https://mahis.life for my other research projects!

(Coming from hashtags? Click on this post to see the full thread)
