AI Coffee Break with Letitia

37 Followers
14 Following
36 Posts
📺 ML Youtuber http://youtube.com/AICoffeeBreak
👩‍🎓 PhD student in Computational Linguistics @ Heidelberg University
YouTube: http://youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak

LLMs often memorize what they see — even a single phone number can stick in their weights. Google’s VaultGemma changes that: it’s the first open-weight LLM trained from scratch with differential privacy, so rare secrets leave no trace. 👉 In this video, we explain Differential Privacy through VaultGemma — how it works, why it matters, and what it means for trustworthy AI.

🎥 https://youtu.be/UwX5zzjwb_g
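Under the hood, DP training (the DP-SGD family) clips each example's gradient so no single record can dominate, then adds calibrated Gaussian noise before the update. A minimal NumPy sketch of that aggregation step (illustrative names and hyperparameters, not VaultGemma's actual training code):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD aggregation step (sketch): clip each example's gradient,
    sum, add Gaussian noise scaled to the clip norm, then average."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Clipping bounds any single example's influence; the noise then makes that bounded influence statistically deniable, which is why a rare phone number can't leave a recoverable trace.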

We explain diffusion models and flow-matching models side by side to highlight the key differences between them. Flow-Matching models are the new generation of AI image generators that are quickly replacing diffusion models. They take everything diffusion did well, but make it faster, smoother, and deterministic.

🎥 https://youtu.be/firXjwZ_6KI
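The flow-matching training objective is simple to sketch: take a point x_t on the straight path between noise x0 and data x1, and regress the model's predicted velocity onto the constant target x1 - x0. A minimal NumPy sketch, assuming the linear (rectified-flow) path:

```python
import numpy as np

def flow_matching_loss(x0, x1, t, velocity_model):
    """Conditional flow-matching objective (sketch): along the straight path
    x_t = (1 - t) * x0 + t * x1 the target velocity is x1 - x0; regress the
    model's predicted velocity onto it with an MSE loss."""
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0
    pred = velocity_model(xt, t)
    return np.mean((pred - target) ** 2)
```

Sampling is then deterministic: integrate the learned velocity field from noise to image with simple Euler steps, x ← x + v(x, t)·Δt.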

Ever wondered how Energy-Based Models (EBMs) work and how they differ from normal neural networks?
☕️ We go over EBMs and then dive into the Energy-Based Transformers paper, which builds LLMs that refine their guesses, self-verify, and can adapt compute to problem difficulty.
Works for image and video transformers too!
🎥 https://youtu.be/18Fn2m99X1k
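The core idea in one sketch: an EBM scores candidates with a scalar energy, and a prediction is refined by descending that energy. A toy example with a hand-picked quadratic energy (the real Energy-Based Transformer learns the energy function; everything here is illustrative):

```python
import numpy as np

def refine(y, energy_grad, steps=50, lr=0.1):
    """Iteratively refine a guess y by gradient descent on the energy."""
    for _ in range(steps):
        y = y - lr * energy_grad(y)
    return y

target = np.array([1.0, -2.0])
energy = lambda y: np.sum((y - target) ** 2)   # low energy = good answer
energy_grad = lambda y: 2 * (y - target)
y = refine(np.zeros(2), energy_grad)           # start from a rough guess
```

Running more refinement steps spends more compute on the answer, which is how such models could adapt compute to problem difficulty.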

ACL 2025, the world’s largest NLP conference with almost 2,000 papers presented, just took place in Vienna! 🎓✨ Here is a quick snapshot of the event via a short interview with one of the authors whose work caught my attention.

🎥 Watch: https://youtu.be/GBISWggsQOA

How do LLMs pick the next word? They don’t choose words directly: they only output word probabilities. 📊 Greedy decoding, top-k, top-p, and min-p are methods that turn these probabilities into actual text.

In this video, we break down each method and show how the same model can sound dull, brilliant, or unhinged – just by changing how it samples.
🎥 Watch here: https://youtu.be/o-_SZ_itxeA
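Here is a minimal NumPy sketch of the four decoders on a toy distribution (parameter defaults are illustrative):

```python
import numpy as np

def sample(probs, method="greedy", k=2, p=0.9, min_p=0.05, rng=None):
    """Turn next-token probabilities into a token id (sketch of common decoders)."""
    rng = rng or np.random.default_rng(0)
    probs = np.asarray(probs, dtype=float)
    if method == "greedy":                     # always the most likely token
        return int(np.argmax(probs))
    if method == "top_k":                      # keep only the k most likely
        keep = np.argsort(probs)[-k:]
    elif method == "top_p":                    # smallest set with mass >= p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cum, p) + 1)]
    elif method == "min_p":                    # drop tokens far below the best
        keep = np.where(probs >= min_p * probs.max())[0]
    else:
        raise ValueError(method)
    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    masked /= masked.sum()                     # renormalize, then sample
    return int(rng.choice(len(probs), p=masked))
```

Greedy is deterministic (and often dull); the other three trade determinism for diversity by sampling from a truncated, renormalized distribution.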

💡 AlphaEvolve is a new AI system that doesn’t just write code, it evolves it. It uses LLMs and evolutionary search to make scientific discoveries.
In this video we explain how AlphaEvolve works and the evolutionary strategies behind it (like MAP-Elites and island-based population methods).
📺 https://youtu.be/Z4uF6cVly8o
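MAP-Elites, one of the strategies covered, keeps the best solution found so far in each cell of a feature grid instead of a single global best, preserving diverse stepping stones. A toy sketch on scalar "solutions" (AlphaEvolve's real loop mutates programs with LLMs; names and parameters here are made up):

```python
import numpy as np

def map_elites(fitness, cell_of, mutate, seed, iters=500, rng=None):
    """Keep one elite per feature cell; mutate random elites to fill and improve cells."""
    rng = rng or np.random.default_rng(0)
    archive = {}                                    # cell id -> (fitness, solution)
    x = seed
    for _ in range(iters):
        c = cell_of(x)
        if c not in archive or fitness(x) > archive[c][0]:
            archive[c] = (fitness(x), x)            # new elite for this cell
        parent_cell = list(archive)[rng.integers(len(archive))]
        x = mutate(archive[parent_cell][1], rng)    # offspring of a random elite
    return archive

fitness = lambda x: -(x - 0.5) ** 2                 # toy objective, best at 0.5
cell_of = lambda x: min(4, int(x * 5))              # 5 feature cells over [0, 1]
mutate = lambda x, rng: float(np.clip(x + rng.normal(0.0, 0.2), 0.0, 1.0))
archive = map_elites(fitness, cell_of, mutate, seed=0.1)
```

The island-based methods mentioned in the video play a similar role: separate populations evolve independently and occasionally exchange solutions, which also keeps diversity alive.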

Long videos are a nightmare for language models: too many tokens, slow inference. ☠️
We explain STORM ⛈️, a new architecture that improves long-video LLMs using Mamba layers and token compression. It reaches better accuracy than GPT-4o on benchmarks with up to 8× higher efficiency.

📺 https://youtu.be/uMk3VN4S8TQ

Token-Efficient Long Video Understanding for Multimodal LLMs | Paper explained

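One ingredient is easy to sketch: temporal token compression, i.e. pooling the tokens of neighboring frames before they reach the LLM. A minimal NumPy sketch (the pooling factor and tensor layout are illustrative; STORM's actual design also interleaves Mamba-based temporal layers):

```python
import numpy as np

def temporal_pool(tokens, factor=4):
    """tokens: (frames, tokens_per_frame, dim). Average every `factor`
    consecutive frames to cut the token count by `factor`."""
    f, t, d = tokens.shape
    assert f % factor == 0, "frame count must be divisible by the pooling factor"
    return tokens.reshape(f // factor, factor, t, d).mean(axis=1)
```

Feeding the LLM 4× fewer tokens shortens its sequence length, which is where the efficiency gain comes from.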
We all know quantization works at inference time, but researchers successfully trained a 13B LLaMA 2 model using FP4 precision (only 16 values per weight!). 🤯
We break down how it works. If quantization and mixed-precision training sound mysterious, this’ll clear it up.
📺 https://youtu.be/Ue3AK4mCYYg
4-Bit Training for Billion-Parameter LLMs? Yes, Really.

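FP4 means 4 bits per weight, so every weight must land on one of 16 representable values. A sketch of fake-quantization against an E2M1-style grid (per-tensor scaling here is illustrative, not the paper's exact recipe):

```python
import numpy as np

# E2M1-style FP4 magnitudes, signed: 16 entries total (the grid contains
# both +0 and -0, which is fine for a sketch). Illustrative, not the
# paper's exact format.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_VALUES[::-1], FP4_VALUES])

def fake_quantize_fp4(w):
    """Round each weight to the nearest FP4 grid point after per-tensor scaling."""
    scale = np.abs(w).max() / FP4_GRID.max()        # map the largest |w| to 6
    idx = np.abs(w[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return FP4_GRID[idx] * scale
```

The quantize-dequantize round trip is what "training in FP4" simulates: forward passes see only these 16 levels, while higher-precision copies absorb the rounding error across steps.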
Just say “Wait…” – and your LLM gets smarter?!
We explain how just 1,000 training examples + a tiny trick at inference time = o1-preview level reasoning. No RL, no massive data needed.
🎥 Watch now → https://youtu.be/XuH2QTAC5yI
s1: Simple test-time scaling: Just “wait…” + 1,000 training examples? | PAPER EXPLAINED

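The inference-time trick (the paper calls it budget forcing) amounts to: when the model emits its end-of-thinking delimiter too early, suppress it and append "Wait" so reasoning continues. A toy sketch with a stubbed token generator (the hypothetical generate_step function and "</think>" delimiter stand in for a real decoding loop):

```python
def budget_force(generate_step, prompt, min_thinking_tokens=100):
    """generate_step(text) -> next token string; '</think>' ends the thinking
    phase. If the model stops too early, swap the stop token for 'Wait' so it
    keeps reasoning until the token budget is met."""
    text, n = prompt, 0
    while True:
        tok = generate_step(text)
        if tok == "</think>":
            if n >= min_thinking_tokens:
                return text + tok          # budget met: allow thinking to end
            tok = "Wait"                   # too early: force more thinking
        text += " " + tok
        n += 1
```

The same knob can cap thinking too: force the delimiter once a maximum budget is reached, so test-time compute scales up or down on demand.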

We explain 🥥COCONUT (Chain of Continuous Thought), a new paper using vectors for CoT instead of words. We break down:

- Why CoT with words might not be optimal.
- How to implement vectors for CoT instead of words and make CoT faster.
- What this means for interpretability.

📺 https://youtu.be/mhKC3Avqy2E

COCONUT: Training large language models to reason in a continuous latent space – Paper explained

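The mechanism in miniature: instead of decoding a word at each CoT step, the model's last hidden state is fed straight back as the next input embedding, so "thoughts" stay continuous vectors. A toy sketch with a stub recurrence standing in for the LM's forward pass:

```python
import numpy as np

def continuous_cot(step_fn, question_embedding, n_thoughts=4):
    """Chain latent thoughts: each step's output hidden state becomes the
    next step's input embedding, with no decoding to words in between."""
    h = question_embedding
    thoughts = []
    for _ in range(n_thoughts):
        h = step_fn(h)          # in COCONUT this is the LM's forward pass
        thoughts.append(h)
    return thoughts             # decode an answer only after the last thought
```

Skipping the decode-to-word step at each hop is what makes this faster, and it is also why interpretability gets harder: the intermediate thoughts are vectors, not readable text.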