One of the more in-depth talks on LLMs' problems and world models (the JEPA architecture by Yann LeCun)

swyx (@swyx)
On the @yitayml pod, he laid out three kinds of World Model discussions: (1) single-player 3D video models like Genie 3, (2) latent predictive learning like JEPA, and (3) world models for adversarial reasoning and theory of mind. He announces that they have just published their first article on the third topic.

As I floated in our @yitayml pod, there are 3 kinds of World Model discussions: 1) single player 3D video models like Genie 3, 2) latent learning prediction like JEPA. just published our first article on the third: World models for adversarial reasoning and theory of mind, and
The Three Kingdoms of World Models: Three Futures Built by Fei-Fei Li, LeCun, and DeepMind
Fei-Fei Li, Yann LeCun, and DeepMind are using the same term, World Model, to build three entirely different AI futures. The article clearly explains the differences between 3D assets, prediction engines, and simulators.

LLM-JEPA, a new LLM training method that applies the JEPA architecture from computer vision, has arrived! It helps Large Language Models outperform standard training significantly and resist overfitting, as verified on Llama3, Gemma2... Note, however, the two additional hyperparameters and the roughly doubled compute cost.
#LLM #JEPA #AI #DeepLearning #MachineLearning #LargeLanguageModels #ArtificialIntelligence
https://www.reddit.com/r/LocalLLaMA/comments/1o4av71/llmjepa_large_language_models_meet_join
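The LLM-JEPA idea summarized above (a JEPA-style latent-prediction term added on top of standard next-token training, at the cost of an extra weight and a second encoder pass) can be sketched as a toy objective. This is a minimal numpy illustration of the shape of such a combined loss, not the paper's implementation; the function names, the cosine-distance choice, and the weight `lam` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(logits, target_ids):
    # Standard next-token loss: mean negative log-probability of the targets.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target_ids)), target_ids].mean()

def jepa_loss(pred_emb, target_emb):
    # JEPA-style term: predict the paired view's embedding in latent space.
    # Here measured as mean cosine distance (one common choice, assumed).
    pred = pred_emb / np.linalg.norm(pred_emb, axis=-1, keepdims=True)
    tgt = target_emb / np.linalg.norm(target_emb, axis=-1, keepdims=True)
    return 1.0 - (pred * tgt).sum(axis=-1).mean()

def llm_jepa_objective(logits, target_ids, pred_emb, target_emb, lam=1.0):
    # Hypothetical combined objective: L = CE + lam * JEPA.
    # `lam` is one of the extra hyperparameters the post mentions; the
    # second forward pass producing `target_emb` is what roughly doubles
    # the compute cost.
    return cross_entropy(logits, target_ids) + lam * jepa_loss(pred_emb, target_emb)
```

With `lam=0` the objective reduces to plain next-token training, which makes the extra term easy to ablate.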
Self-supervised learning, JEPA, world models, and the future of AI [video]
https://www.youtube.com/watch?v=yUmDRxV0krg
#HackerNews #SelfSupervisedLearning #JEPA #WorldModels #FutureOfAI #AIResearch

#ITByte: The #JEPA (Joint-Embedding Predictive Architecture) AI model is a significant development in self-supervised learning, spearheaded by Meta AI and heavily influenced by Yann LeCun, Meta's Chief AI Scientist.
https://knowledgezone.co.in/posts/Joint-Embedding-Predictive-Architecture-68597054a53194f0fea8f83d
Self-Supervised Learning from Images with JEPA
https://arxiv.org/abs/2301.08243
#HackerNews #SelfSupervisedLearning #JEPA #ImageProcessing #MachineLearning #AIResearch #ComputerVision
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
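The masking recipe in the abstract (large-scale target blocks, with a spatially distributed context that excludes them) can be sketched as index sampling on a ViT patch grid. This is an illustrative numpy sketch, not the paper's code; the block-scale and aspect-ratio ranges and the number of target blocks are assumptions chosen to mirror the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block(grid, scale_range=(0.15, 0.2), aspect=(0.75, 1.5)):
    # Sample one rectangular block of patch indices on a grid x grid layout.
    # Its area is a sizable fraction of the image, so the target is
    # "sufficiently large scale" (semantic) per design choice (a).
    area = rng.uniform(*scale_range) * grid * grid
    ar = rng.uniform(*aspect)
    h = max(1, min(grid, int(round(np.sqrt(area * ar)))))
    w = max(1, min(grid, int(round(np.sqrt(area / ar)))))
    top = rng.integers(0, grid - h + 1)
    left = rng.integers(0, grid - w + 1)
    rows, cols = np.meshgrid(np.arange(top, top + h),
                             np.arange(left, left + w), indexing="ij")
    return set((rows * grid + cols).ravel().tolist())

def ijepa_masks(grid=14, n_targets=4):
    # Sample several target blocks; the context is every remaining patch,
    # so it stays spatially distributed (design choice (b)) and never
    # leaks the targets' content to the predictor.
    targets = [sample_block(grid) for _ in range(n_targets)]
    covered = set().union(*targets)
    context = set(range(grid * grid)) - covered
    return context, targets
```

Training would then encode the context patches, and ask a predictor to regress the (separately encoded) representations of each target block, never the raw pixels.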