A new approach replaces gaze annotations with language-driven attention masking, improving robot perception while reducing training overhead. https://hackernoon.com/beyond-reconvla-annotation-free-visual-grounding-via-language-attention-masked-reconstruction #transformermodels
