🧵 Understanding Word2Vec & Contrastive Training
How does a model learn meaning from text without any labels? Let’s break down how Word2Vec taught machines to “understand” language with contrastive learning.

👇 (1/10)

1. What is Word2Vec?
It’s an algorithm that learns embeddings — vector representations of words — so that similar words live close together in a vector space.

(2/10)

2. What’s the goal?
Train a model to predict whether two words commonly appear near each other in real-world text.

✅ Output 1 if yes
❌ Output 0 if no
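
That yes/no prediction is usually a sigmoid over the dot product of the two word vectors. A minimal sketch (toy 2-d vectors, purely illustrative):

```python
import numpy as np

def cooccurrence_score(v_center, v_context):
    """Probability that two words co-occur: sigmoid of the dot
    product of their embedding vectors."""
    return 1.0 / (1.0 + np.exp(-np.dot(v_center, v_context)))

# Toy vectors: aligned vectors -> score near 1, opposed -> near 0.
related = cooccurrence_score(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
unrelated = cooccurrence_score(np.array([1.0, 2.0]), np.array([-1.0, -2.0]))
```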

(3/10)

3. How are training examples built?
A sliding window scans sentences. The center word is paired with surrounding words.

E.g., in “make a machine”, “make” is center, “a” and “machine” are neighbors.
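
The pairing step can be sketched in a few lines (`make_pairs` is a name I made up; window size 2 here so both right-hand neighbors of “make” are captured):

```python
def make_pairs(tokens, window=2):
    """Slide a window over the token list and pair each center
    word with every neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = make_pairs("make a machine".split())
```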

(4/10)

4. The Skip-Gram Model
Given a center word, predict its context.
Training pairs like (“make”, “machine”) help the model learn contextual relevance.

(5/10)

5. The Problem: With only positive examples, the model can trivially learn to always output 1.

🧪 Negative Sampling
Pair the center word with random, unrelated words. Train it to predict 0 for those.

This forms the core of contrastive learning.
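
A sketch of building labeled examples with negative sampling. (Real Word2Vec draws negatives from a smoothed unigram distribution; this toy version samples uniformly for simplicity, and `training_examples` is an invented helper name.)

```python
import random

def training_examples(pairs, vocab, k=2, seed=0):
    """For each real (center, context) pair labeled 1, draw k random
    vocabulary words as negatives labeled 0."""
    rng = random.Random(seed)
    examples = []
    for center, context in pairs:
        examples.append((center, context, 1))                 # positive: seen together
        for _ in range(k):
            examples.append((center, rng.choice(vocab), 0))   # negative: random word
    return examples

vocab = ["make", "a", "machine", "banana", "quasar", "tuba"]
examples = training_examples([("make", "machine")], vocab, k=2)
```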

(6/10)

6. Embedding Matrix
Each word starts as a random vector. As training progresses, vectors are updated.

Similar words get closer in the vector space. Unrelated ones are pushed apart.
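
The pull/push dynamic is just gradient descent on a logistic loss. A minimal sketch (real Word2Vec keeps separate input and output embedding matrices; this toy uses one dict of vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
E = {w: rng.normal(size=dim) * 0.1 for w in ["make", "machine", "tuba"]}  # random init

def sgd_step(center, context, label, lr=0.5):
    """One logistic-regression update on (center, context, label).
    label=1 pulls the two vectors together; label=0 pushes them apart."""
    p = 1.0 / (1.0 + np.exp(-np.dot(E[center], E[context])))
    g = p - label  # gradient of the log loss w.r.t. the dot-product score
    E[center], E[context] = (E[center] - lr * g * E[context],
                             E[context] - lr * g * E[center])

for _ in range(50):
    sgd_step("make", "machine", 1)  # seen together -> pulled closer
    sgd_step("make", "tuba", 0)     # random pair  -> pushed apart
```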

(7/10)

7. What emerges?
Embeddings that reflect meaning.

“king” - “man” + “woman” ≈ “queen”
This isn’t magic — it’s geometry grounded in context.
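
The analogy is literally vector arithmetic plus a nearest-neighbor lookup. A sketch with hand-made 2-d vectors (one axis roughly “royalty”, one roughly “gender”; these are illustrative, not trained):

```python
import numpy as np

E = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.5]),
}

def nearest(vec, exclude):
    """Word whose embedding has the highest cosine similarity to vec."""
    cos = lambda a, b: np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in E if w not in exclude), key=lambda w: cos(E[w], vec))

result = nearest(E["king"] - E["man"] + E["woman"], exclude={"king", "man", "woman"})
```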

(8/10)

8. Why it matters today
This contrastive idea lives on:
• Sentence embeddings
• Retrieval-Augmented Generation (RAG)
• Multimodal models like CLIP (match image ↔ caption)

(9/10)

9. TL;DR
Word2Vec was more than just a way to embed words.
It showed us that contrastive learning works — and it’s now everywhere in LLMs and beyond.

(10/10)

#LLM #AI #Embeddings #NLP #MachineLearning #Word2Vec #ContrastiveLearning