月白 累 (Geppaku Rui)'s VRChat live show "latent" will be re-screened on April 5! Multi-instance support means no entry limit
https://www.moguravr.com/geppaku-rui-latent-restream-vrchat-0405/
Scaling Latent Reasoning via Looped Language Models
https://arxiv.org/abs/2510.25741
#HackerNews #Scaling #Latent #Reasoning #Looped #Language #Models #AI #Research

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models match the performance of SOTA LLMs of up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model is available here: http://ouro-llm.github.io.
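As a rough illustration of the looped-computation idea, here is a minimal PyTorch sketch: one shared transformer block applied repeatedly in latent space, with a halting head whose exit distribution could be entropy-regularized for learned depth allocation. The module names, sizes, and halting mechanism are all assumptions for illustration, not Ouro's released architecture.

import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """Illustrative looped transformer layer: one shared block applied
    up to `max_loops` times, with a learned per-step halting head.
    A sketch of the LoopLM idea, not Ouro's actual architecture."""

    def __init__(self, d_model: int, n_heads: int, max_loops: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)   # weights shared across loops
        self.halt_head = nn.Linear(d_model, 1)    # hypothetical depth-allocation head
        self.max_loops = max_loops

    def forward(self, h: torch.Tensor):
        halt_logits = []
        for _ in range(self.max_loops):
            h = self.block(h)                     # iterate in latent space
            halt_logits.append(self.halt_head(h.mean(dim=1)))
        # Distribution over exit depths; an entropy bonus on this
        # distribution would implement learned depth allocation.
        p_exit = torch.softmax(torch.cat(halt_logits, dim=-1), dim=-1)
        entropy = -(p_exit * p_exit.clamp_min(1e-9).log()).sum(-1).mean()
        return h, p_exit, entropy

x = torch.randn(2, 16, 64)                        # (batch, seq, d_model)
h, p_exit, ent = LoopedBlock(64, 4)(x)
print(h.shape, p_exit.shape, float(ent))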
#statstab #404 {latent2likert} simulate Likert response variables from hypothetical latent variables
Thoughts: Most psych data is Likert-type. This R package can help simulate effects and check model fit.
#likert #ordinal #r #latent #simulation #data
https://latent2likert.lalovic.io/articles/using_latent2likert
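The underlying trick is straightforward: draw a continuous latent variable, then discretize it at cut points into ordered categories. A minimal Python sketch of that latent-to-Likert mapping follows; the package itself is R, so this mirrors the idea rather than its API, and the normal latent with equal-probability thresholds is an illustrative choice.

import numpy as np
from scipy.stats import norm

def latent_to_likert(n, n_levels=5, mean=0.0, sd=1.0, seed=None):
    """Simulate Likert responses by discretizing a latent normal variable
    at equal-probability cut points. Mirrors the idea behind latent2likert;
    the threshold scheme and normal latent are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(mean, sd, size=n)                    # hypothetical latent trait
    cuts = norm.ppf(np.linspace(0, 1, n_levels + 1)[1:-1])   # equal-mass bins
    return np.digitize(latent, cuts) + 1                     # responses coded 1..n_levels

responses = latent_to_likert(1000, n_levels=5, mean=0.3, seed=42)
print(np.bincount(responses)[1:])                            # frequency of each category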
In our recent #JournalClub, I presented Genkin et al. (2025), who decode #DecisionMaking in the #PremotorCortex of #macaques as low-dimensional #latent #dynamics shared across #NeuralPopulations. Their generative model links tuning curves, spike-time variability, and stimulus-dependent potential landscapes to a common internal decision variable. I summarized and discussed their findings in this blog post:
📝https://doi.org/10.1038/s41586-025-09199-1
🌍https://www.fabriziomusacchio.com/blog/2025-08-01-decoding_decision_making_in_premotor_cortex/
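For intuition about this class of generative model, one can simulate a one-dimensional latent decision variable drifting in a potential landscape and driving Poisson spiking through per-neuron tuning curves. The sketch below is purely illustrative: the double-well landscape, tuning curves, and all parameters are assumptions, not the model fitted in Genkin et al.

import numpy as np

rng = np.random.default_rng(0)
dt, T_total, n_neurons = 1e-3, 1.0, 8

# Gradient of a double-well potential U(x) = x**4 - 2*x**2; it pulls the
# latent decision variable toward one of two choice attractors.
def dU(x):
    return 4 * x**3 - 4 * x

x, xs = 0.0, []
for _ in range(int(T_total / dt)):
    # Langevin dynamics: drift down the potential plus diffusion noise.
    x += -dU(x) * dt + np.sqrt(2 * 0.5 * dt) * rng.standard_normal()
    xs.append(x)
xs = np.array(xs)

# Gaussian tuning curves map the latent state to firing rates (Hz),
# and spike counts are Poisson given the rate.
centers = np.linspace(-1.5, 1.5, n_neurons)
rates = 20 * np.exp(-(xs[:, None] - centers[None, :]) ** 2 / 0.5)
spikes = rng.poisson(rates * dt)

print("final latent state:", round(xs[-1], 2))
print("spike counts per neuron:", spikes.sum(axis=0))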
And as an added bonus: Dileep George, one of the authors of the paper, just shared a #JupyterNotebook demo 🐍📔. You can explore the #CSCG model, visualize #PlaceFields, and inspect the learned #latent #graphs 📈 Just try it out, it's great fun 👌
#Hippocampus #CognitiveMaps #CSCG #Neuroscience #ComputationalModeling #CompNeuro
This paper by Raju et al. proposes a unified model – “clone‑structured causal #graphs” (#CSCG) – for #hippocampal #SpatialCoding. It suggests that #SpatialMaps arise from #learning #latent higher‑order sequences rather than representing #EuclideanSpace directly. The model elegantly explains phenomena like #PlaceFields, #SplitterCells, #contextual #remapping, and predicts when #PlaceFieldMapping may mislead.
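The core mechanism is compact: a cloned HMM gives each observation several hidden "clones", so identical sensory inputs can be disambiguated by sequence context. A toy Python sketch of that structure with random, unlearned transitions; the alphabet size, clone count, and transition matrix are assumptions, not the paper's trained model.

import numpy as np

rng = np.random.default_rng(1)
n_obs, n_clones = 4, 3                      # 4 observation symbols, 3 clones each
n_states = n_obs * n_clones                 # hidden states are clone copies

# Random transition matrix over clone states (learning would shape this).
T = rng.random((n_states, n_states))
T /= T.sum(axis=1, keepdims=True)

# Deterministic emission: clone state s always emits observation s // n_clones.
emits = np.arange(n_states) // n_clones

def forward_loglik(obs_seq):
    """Forward algorithm restricted at each step to the clones of the
    observed symbol -- the restriction that makes cloned HMMs tractable."""
    alpha = np.where(emits == obs_seq[0], 1.0 / n_clones, 0.0)
    loglik = 0.0
    for o in obs_seq[1:]:
        alpha = alpha @ T                   # propagate through transitions
        alpha = alpha * (emits == o)        # keep only clones of observation o
        z = alpha.sum()
        loglik += np.log(z)
        alpha /= z
    return loglik

print(forward_loglik([0, 1, 2, 1, 0]))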
TransMLA: Multi-Head Latent Attention Is All You Need
https://arxiv.org/abs/2502.07864
#HackerNews #TransMLA #Multi-Head #Latent #Attention #MachineLearning #AIResearch #Arxiv
In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek's codebase, allowing these models to fully leverage DeepSeek-specific optimizations in inference stacks such as vLLM and SGLang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens of fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek's advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.
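To put the 93% figure in perspective, here is a back-of-the-envelope calculation of KV-cache size at 8K context, using the public LLaMA-2-7B shapes (32 layers, 32 heads, head dim 128) and fp16 storage; the compressed number is derived directly from the quoted ratio, not measured.

# Back-of-the-envelope KV-cache sizes at 8K context for LLaMA-2-7B shapes.
# The 93% reduction is quoted from the abstract, so the "after" number is
# derived, not measured.
layers, heads, head_dim = 32, 32, 128
seq_len, bytes_per = 8192, 2               # fp16/bf16 elements

kv_bytes = 2 * layers * heads * head_dim * seq_len * bytes_per   # K and V
print(f"uncompressed KV cache: {kv_bytes / 2**30:.1f} GiB")      # 4.0 GiB
print(f"after 93% compression: {kv_bytes * 0.07 / 2**30:.2f} GiB")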
Byte Latent Transformer: Patches Scale Better Than Tokens
https://arxiv.org/abs/2412.09871
#HackerNews #Byte #Latent #Transformer #Patches #Tokens #AI #Research #Machine #Learning
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it. We present the first FLOP-controlled scaling study of byte-level models up to 8B parameters and 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements in reasoning and long-tail generalization. Overall, for fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.
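The segmentation rule can be sketched in a few lines: estimate next-byte entropy with a small model and start a new patch wherever it spikes. In the sketch below a smoothed bigram count model stands in for BLT's small learned byte LM, and the threshold is an arbitrary illustrative choice.

import numpy as np

def entropy_patches(data: bytes, threshold: float):
    """Split a byte stream into patches, starting a new patch whenever
    next-byte entropy exceeds `threshold` bits. BLT uses a small learned
    byte LM for this estimate; a bigram count model stands in here."""
    counts = np.full((256, 256), 1e-3)           # lightly smoothed bigram counts
    for a, b in zip(data, data[1:]):
        counts[a, b] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    H = -(probs * np.log2(probs)).sum(axis=1)    # next-byte entropy per context

    patches, start = [], 0
    for i in range(1, len(data)):
        if H[data[i - 1]] > threshold:           # uncertain context: cut here
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

print(entropy_patches(b"aaaaaaaaaa hello world, hello world", 1.5))

With a threshold like this, the highly predictable run of repeated bytes should stay one long patch while the more varied text fragments into shorter ones, which is exactly the compute-allocation behavior the abstract describes.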
SciTech Chronicles. . . . . . . . .April 9th, 2025
#virus #vector-based #T-cell #respiratory #moisture #temperature #instability #latent-heat #transient #temporal #singularities #acceleration #fuel-cell #PEM #fincantieri #decarbonization #1420MHz #redshift #88-108MHz #CubeSats