AI Learns by Watching - Sholto & Trenton on Dwarkesh
π0.5: A VLA with open-world generalization
#HackerNews #π0.5 #VLA #openworld #generalization #machinelearning #AI
e509 — Maverick and Marbles
e509 with Michael and Michael – stories and discussion all around #AI, #LLMs, #llamas, generated #Quake, #grokking, #generalization and much more.
Download: https://media.blubrry.com/gamesatwork/op3.dev/e,pg=6e00562f-0386-5985-9c2c-26822923720d/gamesatwork.biz/wp-content/uploads/2025/04/E509.mp3 (Duration: 32:10, 44.8 MB)
https://gamesatwork.biz/2025/04/14/e509-maverick-and-marbles/
Pipeline release! nf-core/drugresponseeval v1.0.0!
Please see the changelog: https://github.com/nf-core/drugresponseeval/releases/tag/1.0.0
#celllines #crossvalidation #deeplearning #drugresponse #drugresponseprediction #drugs #fairprinciples #generalization #hyperparametertuning #machinelearning #randomizationtests #robustnessassessment #training #nfcore #openscience #nextflow #bioinformatics
People value us for the value (they believe) we (might) add to them.
Generalizing of course, but it's all transactional. There's no (longer) valuing people for just who they are.
Grokking at the Edge of Numerical Stability
https://arxiv.org/abs/2501.04697
https://old.reddit.com/r/MachineLearning/comments/1i34keg/grokking_at_the_edge_of_numerical_stability
https://en.wikipedia.org/wiki/Grokking_(machine_learning)
* sudden generalization after prolonged overfitting
* a massively overtrained neural network can acquire "emergent", supra-normal performance and unexpected abilities
* an unexpected, accidental finding
* the underlying mechanisms are only starting to be unraveled (a minimal reproduction sketch follows after this list)
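As a rough, hedged illustration of that sketch: the snippet below reproduces the classic grokking setup (modular addition, a 50/50 train/validation split, strong weight decay, training far past the point where the train set is memorized). Everything here, from the MLP architecture to the hyperparameters, is an illustrative assumption rather than the setup of the papers linked in this post, and whether and when the validation jump appears depends on those choices.

import torch
import torch.nn as nn

P = 97                                    # modulus for the toy task (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                   # 50% train / 50% held out
train_idx, val_idx = perm[:split], perm[split:]

embed = nn.Embedding(P, 64)               # token embeddings for the two operands
mlp = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(mlp.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)   # strong weight decay
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = embed(pairs[idx]).flatten(1)  # concatenate both operand embeddings
        return (mlp(x).argmax(-1) == labels[idx]).float().mean().item()

for step in range(50_000):                # keep training long after the train set is memorized
    x = embed(pairs[train_idx]).flatten(1)
    loss = loss_fn(mlp(x), labels[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        print(step, f"train={accuracy(train_idx):.2f}", f"val={accuracy(val_idx):.2f}")

With a setup like this, train accuracy typically reaches 100% early while validation accuracy sits near chance for a long stretch before climbing, which is the delayed generalization the bullets above describe.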
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
https://arxiv.org/abs/2405.15071
https://news.ycombinator.com/item?id=40495149
Abstract (arXiv:2501.04697): Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and ⊥Grad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods. Code for this paper is available at https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability.
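The abstract names the two proposed fixes but not their formulas, so here is a hedged sketch of the general idea rather than the authors' reference implementation (that lives in the GitHub repo linked above). The surrogate s(x) inside stablemax and the per-tensor projection in perp_grad_step are assumptions: the point is to keep the normalizer out of exp()'s overflow regime, and to drop the gradient component that merely rescales the weights (and hence the logits) along their current direction.

# Hedged reconstruction of the two fixes described in the abstract; the exact
# definitions are in the authors' repository, and the choices below are
# assumptions about the general idea rather than their code.
import torch

def stablemax(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Softmax-like normalization with a slowly growing surrogate instead of exp():
    # s(x) = x + 1 for x >= 0, 1 / (1 - x) for x < 0 (positive, monotone, s(0) = 1),
    # so scaling the logits cannot push the normalizer into float overflow.
    s = torch.where(logits >= 0, logits + 1.0, 1.0 / (1.0 - logits))
    return s / s.sum(dim=dim, keepdim=True)

def perp_grad_step(params, lr: float = 1e-3):
    # "⊥Grad"-style update (assumed per-tensor form): remove the gradient
    # component parallel to each weight tensor, i.e. the direction that only
    # rescales the weights and hence the logits (the NLM direction), then
    # apply a plain SGD step with what remains.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            g, w = p.grad.flatten(), p.detach().flatten()
            parallel = (g @ w) / (w @ w + 1e-12) * w   # projection of g onto w
            p.add_((g - parallel).view_as(p), alpha=-lr)

# Usage sketch: cross-entropy on StableMax probabilities instead of Softmax.
# In practice you would pass model.parameters(); the bare logits tensor here
# is just a stand-in to keep the example self-contained.
logits = torch.randn(8, 97, requires_grad=True)
targets = torch.randint(0, 97, (8,))
probs = stablemax(logits)
loss = -torch.log(probs[torch.arange(8), targets] + 1e-12).mean()
loss.backward()
perp_grad_step([logits])

Either intervention targets the same failure mode the abstract describes: once the gradient is dominated by the NLM direction, ordinary Softmax training only inflates the logits until floating point precision runs out.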
A post from August 2024 by @grimalkina, boosted by someone on another instance, about why to report demographics in research even when you're not studying those groups. This seems like a great primer for people who have little background in basic #sampling and #generalization (for some reason I can't link/boost from here, so):
https://mastodon.social/@grimalkina/112966685297897685
My 2 cents (already at least partially covered by Dr. Hicks):
1. Your study is never just about your study. Good science is #open and reusable. e.g., maybe your study on tech-enabled healthcare access isn't specifically about LGBTQ+ or Hispanic people, but what are you doing to help a researcher who comes along in 10 years? That information will change what they find and report.
2. Marginalized groups are often minorities, meaning representative probability samples (or --uncomfortable gesture-- convenience samples) for bread-and-butter research frequently have subpopulations too small for reasonable power in correlations, group differences, etc. That's just reality. It's also a big problem for our understanding of #marginalized + #minority groups. Oversampling or targeted studies of those groups are important. It's also important to have a large number of less-targeted studies with relevant information that can be synthesized later (see #1): one study with 1.3% trans participants doesn't tell us much about the trans population, but 20 studies, each of which has 1.3% trans participants, could tell us meaningful things. (A back-of-the-envelope sketch of this point follows after the list.)
3. Representation is important. My belief is that #marginalized+minoritized people need their identities and existence public and constant. In #science, both they and other people consuming the research will benefit from being reminded that they are there, almost always, in our #research.
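To make point 2 concrete, a back-of-the-envelope sketch (all numbers hypothetical, chosen only to match the 1.3% figure above):

# Hypothetical numbers illustrating point 2: a 1.3% subpopulation is far too
# small for within-study comparisons, but becomes analyzable once many studies
# that reported it are synthesized.
prevalence = 0.013        # share of trans participants in each sample (from the example above)
n_per_study = 500         # assumed sample size of one typical study

one_study = prevalence * n_per_study      # expected subgroup size in a single study (~6.5 people)
pooled_20 = 20 * one_study                # expected subgroup size across 20 comparable studies (~130)

print(f"one study:         ~{one_study:.0f} participants in the subgroup")
print(f"20 pooled studies: ~{pooled_20:.0f} participants in the subgroup")

Half a dozen people cannot support group comparisons; a synthesized sample of roughly 130 at least supports descriptive estimates, which is only possible if each of those studies reported the demographic in the first place.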
'Generalization on the Unseen, Logic Reasoning and Degree Curriculum', by Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk.
http://jmlr.org/papers/v25/24-0220.html
#sparse #learns #generalization