AI Learns by Watching - Sholto & Trenton on Dwarkesh

#generalization #ai #reinforcementlearning

A VLA with Open-World Generalization

Our latest generalist policy, π0.5, extends π0 and enables open-world generalization. Our new model can control a mobile manipulator to clean up an entirely new kitchen or bedroom.

e509 — Maverick and Marbles

e509 with Michael and Michael - stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.

https://gamesatwork.biz/2025/04/14/e509-maverick-and-marbles/

Release 1.0.0 · nf-core/drugresponseeval

What's Changed
* Important! Template update for nf-core/tools v3.0.1 by @nf-core-bot in #10
* Merge branch 'dev' of github.com:nf-core/drugresponseeval into dev by @JudithBernett in #11
* Global checkpo...


People value us for the value (they believe) we (might) add to them.

Generalizing, of course, but it's all transactional. There's no (longer) valuing people just for who they are.

#society #people #life #generalization

Grokking at the Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)

* sudden generalization after prolonged overfitting
* a massively overtrained NN can acquire "emergent"/above-expected performance and unexpected abilities
* an unexpected/accidental finding
* the underlying mechanisms are starting to be unraveled

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
https://arxiv.org/abs/2405.15071
https://news.ycombinator.com/item?id=40495149

#LLM #ML #grokking #NN #emergence #generalization

Grokking at the Edge of Numerical Stability

Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and $\perp$Grad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods. Code for this paper is available at https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability.
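A minimal numerical sketch of the effect the abstract describes (my own illustration, not code from the paper; the logit values and scale factors are arbitrary): scaling the logits never changes the predicted class, yet it keeps lowering the cross-entropy loss, and in float32 the softmax eventually saturates so the loss and gradient read exactly zero and learning stalls.

import numpy as np

def softmax(z):
    z = z - z.max()  # standard max-subtraction for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy 3-class logits; class 0 is already the argmax.
logits = np.array([2.0, 0.5, -1.0], dtype=np.float32)
target = 0

for scale in [1, 5, 20, 100]:
    p = softmax(scale * logits)
    loss = -np.log(p[target])      # cross-entropy loss
    grad = p.copy()
    grad[target] -= 1.0            # dLoss/dlogits for softmax + cross-entropy
    print(f"scale={scale:>3}  argmax={p.argmax()}  loss={loss:.2e}  max|grad|={np.abs(grad).max():.2e}")

# The prediction never changes, but the loss and gradient shrink toward
# exactly 0.0 as the logits grow: once p[target] rounds to 1.0 in float32,
# there is no gradient signal left (the "Softmax Collapse" described above).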


A post from August 2024 by @grimalkina, boosted by someone on another instance, about why to report demographics in research even when you're not studying those groups. This seems like a great primer for people who have little background in basic #sampling and #generalization (for some reason I can't link/boost from here, so):

https://mastodon.social/@grimalkina/112966685297897685

My 2 cents (already at least partially covered by Dr. Hicks):

1. Your study is never just about your study. Good science is #open and reusable. For example, maybe your study on tech-enabled healthcare access isn't specifically about LGBTQ+ or Hispanic people, but what are you doing to help a researcher who comes along in 10 years? That information will change what they find and report.

2. Marginalized groups are often minorities, meaning representative probability samples (or --uncomfortable gesture-- convenience samples) for bread-and-butter research frequently have subpopulations too small for reasonable power in correlations, group differences, etc. That's just reality. It's also a big problem for our understanding of #marginalized + #minority groups. Oversampling or targeted studies of those groups are important. It's also important to have a large number of less-targeted studies with relevant information that can be synthesized later (see #1): one study with 1.3% trans participants doesn't tell us much about the trans population, but 20 studies, each of which has 1.3% trans participants, could tell us meaningful things. (See the rough numeric sketch after this list.)

3. Representation is important. My belief is that #marginalized+minoritized people need their identities and existence to be public and constant. In #science, both they and other people consuming the research will benefit from being reminded that they are there, almost always, in our #research.
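A rough back-of-the-envelope sketch of point 2 (the study count, sample size, and confidence-interval arithmetic are my own hypothetical illustration, not numbers from the post): pooling many small studies turns a subsample that is useless on its own into one you can actually estimate from.

# Hypothetical numbers for illustration: 20 studies of n=500 each,
# with 1.3% of participants in each study identifying as trans.
n_per_study = 500
share = 0.013
n_studies = 20

def ci_half_width(n, sd=1.0):
    # 95% CI half-width for a subgroup mean (normal approximation).
    return 1.96 * sd / n ** 0.5

single = n_per_study * share               # roughly 6-7 participants
pooled = n_studies * n_per_study * share   # roughly 130 participants

print(f"one study: ~{single:.1f} trans participants, CI half-width ~{ci_half_width(single):.2f} SD")
print(f"pooled:    ~{pooled:.0f} trans participants, CI half-width ~{ci_half_width(pooled):.2f} SD")

# Pooling shrinks the half-width from roughly ±0.77 SD (too wide to say much)
# to roughly ±0.17 SD, which is precise enough for meaningful synthesis.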

'Generalization on the Unseen, Logic Reasoning and Degree Curriculum', by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk.

http://jmlr.org/papers/v25/24-0220.html

#sparse #learns #generalization
