Geppaku Rui's VRChat live "latent" to be re-screened on April 5! Multi-instance support, no entry restrictions
https://www.moguravr.com/geppaku-rui-latent-restream-vrchat-0405/

#moguravr #メタバース #latent


Virtual artist Geppaku Rui's solo live "latent" will be re-screened in a special world inside VRChat on Sunday, April 5, 2026. It starts at 22:00 and admission is free. The performance was originally held at a VRChat venue on March 15 […]

MoguLive
Scaling Latent Reasoning via Looped Language Models

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models deliver superior performance, matching the results of SOTA LLMs of up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model is available here: http://ouro-llm.github.io.

arXiv.org
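The looped idea in (i) can be shown in a few lines: a single weight-tied block is applied repeatedly to the latent state, so depth comes from iteration rather than from stacking distinct layers. A minimal numpy sketch, with illustrative dimensions and a toy stand-in block rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
d_model, n_loops = 16, 4

# One shared block reused at every loop step (the weight tying is the
# core LoopLM idea); a standard LLM would stack n_loops distinct blocks.
W = rng.normal(0, 0.1, (d_model, d_model))

def shared_block(h):
    # Toy stand-in for a transformer layer: linear map + nonlinearity + residual.
    return h + np.tanh(h @ W)

h = rng.normal(size=(1, d_model))   # latent state for one token
states = [h]
for _ in range(n_loops):            # iterative computation in latent space
    h = shared_block(h)
    states.append(h)

# In Ouro, a learned (entropy-regularized) exit distribution over loop
# depths decides how much computation each input gets; here we simply
# materialize every intermediate state.
print(len(states), h.shape)
```

The parameter count stays that of one block while effective depth grows with `n_loops`, which is why a 1.4B looped model can buy extra computation without extra weights.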

#statstab #404 {latent2likert} simulate Likert response variables from hypothetical latent variables

Thoughts: Most psych data is Likert-type. This R package can help simulate effects and check model fit.

#likert #ordinal #r #latent #simulation #data

https://latent2likert.lalovic.io/articles/using_latent2likert

Using latent2likert

latent2likert
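latent2likert is an R package; as a language-neutral illustration of the underlying idea (not the package's API), here is a Python sketch: draw a latent normal variable and discretize it at thresholds into ordered categories. The cutpoints and group means below are made up for illustration; the package derives cutpoints more carefully.

```python
import numpy as np

rng = np.random.default_rng(42)

def latent_to_likert(mean=0.0, sd=1.0, cuts=(-1.5, -0.5, 0.5, 1.5), n=1000):
    """Simulate 5-point Likert responses from a latent normal variable.

    `cuts` are illustrative equally spaced thresholds; latent2likert
    supports other discretization schemes (e.g. matching a target skew).
    """
    latent = rng.normal(mean, sd, n)
    return np.digitize(latent, cuts) + 1   # categories 1..len(cuts)+1

# Two hypothetical groups differing only in latent mean: the latent
# effect shows up as a shift in the observed ordinal responses.
control = latent_to_likert(mean=0.0)
treated = latent_to_likert(mean=0.5)
print(control.mean(), treated.mean())
```

Simulating the latent effect first and discretizing second is what lets you check how much power an ordinal analysis loses relative to the continuous variable.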

In our recent #JournalClub, I presented Genkin et al. (2025), who decode #DecisionMaking in the #PremotorCortex of #macaques as low-dimensional #latent #dynamics shared across #NeuralPopulations. Their generative model links tuning curves, spike-time variability, and stimulus-dependent potential landscapes to a common internal decision variable. I summarized and discussed their findings in this blog post:

📝https://doi.org/10.1038/s41586-025-09199-1
🌍https://www.fabriziomusacchio.com/blog/2025-08-01-decoding_decision_making_in_premotor_cortex/

#CompNeuro #Neuroscience #Cortex
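The "potential landscape driving a common decision variable" picture can be illustrated with a generic toy, unrelated to the authors' fitted model: Langevin dynamics of a one-dimensional latent variable in a stimulus-tilted double-well potential, where the well a trajectory settles into stands in for the choice. All parameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_decision(tilt, steps=2000, dt=0.01, noise=0.5):
    """Langevin dynamics of a 1-D latent decision variable x.

    Potential U(x) = x**4/4 - x**2/2 - tilt*x: a double well whose
    asymmetry (`tilt`, a stand-in for stimulus evidence) biases which
    well -- i.e. which choice -- the trajectory ends up in.
    """
    x = 0.0
    for _ in range(steps):
        drift = -(x**3 - x - tilt)                      # -dU/dx
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
    return x

# Positive evidence should usually end in the positive well (near +1).
endpoints = [simulate_decision(tilt=0.3) for _ in range(20)]
print(sum(e > 0 for e in endpoints), "of 20 trials chose +")
```

The stimulus only tilts the landscape; trial-to-trial variability comes from the noise term, which is the flavor of generative model the post describes linking spike-time variability to a shared internal variable.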

And as an added bonus: Dileep George, one of the authors of the paper, just shared a #JupyterNotebook demo 🐍📔. You can explore the #CSCG model, visualize #PlaceFields, and inspect the learned #latent #graphs 📈 Just try it out, it's great fun 👌

🌍 https://colab.research.google.com/drive/1kgjuoz_Noo7uV87StSbW7T8-IBQmPOLE?usp=sharing#scrollTo=7jnInNH7RUAX

#Hippocampus #CognitiveMaps #CSCG #Neuroscience #ComputationalModeling #CompNeuro

Google Colab

This paper by Raju et al. proposes a unified model – “clone‑structured causal #graphs” (#CSCG) – for #hippocampal #SpatialCoding. It suggests that #SpatialMaps arise from #learning #latent higher‑order sequences rather than representing #EuclideanSpace directly. The model elegantly explains phenomena like #PlaceFields, #SplitterCells, #contextual #remapping, and predicts when #PlaceFieldMapping may mislead.

🌍 https://www.science.org/doi/10.1126/sciadv.adm8470

#Hippocampus #CognitiveMaps #SequenceLearning #Neuroscience
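The core trick, cloning, can be shown in miniature: give the same observation several hidden "clones" so that a first-order chain over clones captures higher-order sequence structure. In this hand-built Python sketch the clone assignment is fixed by the preceding observation, whereas CSCG learns the assignment from data (via EM over a sparse HMM):

```python
from collections import Counter, defaultdict

# A sequence where what follows "B" depends on what preceded it:
# after "A B" comes "C", after "D B" comes "E". A first-order chain
# over raw observations cannot represent this; cloning "B" can.
seq = ["A", "B", "C", "D", "B", "E"] * 10

def transitions(states):
    counts = defaultdict(Counter)
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    return counts

# Raw observations: "B" -> {"C", "E"} is ambiguous.
raw = transitions(seq)

# Hand-assigned clones: each clone of "B" is tied to its context,
# making the first-order transitions deterministic again.
cloned = []
for prev, cur in zip([None] + seq[:-1], seq):
    cloned.append(f"B/{prev}" if cur == "B" else cur)
clone_tr = transitions(cloned)

print(sorted(raw["B"]))          # two successors: ambiguous
print(sorted(clone_tr["B/A"]))   # single successor: context resolved
```

Replace "A B C" with corridor observations and the same mechanism yields context-dependent place fields and splitter cells without any Euclidean coordinates.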

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek's codebase, allowing these models to fully leverage the DeepSeek-specific optimizations available in inference engines such as vLLM and SGLang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens for fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek's advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.

arXiv.org
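For scale, a back-of-envelope calculation of the KV cache being compressed, using the published LLaMA-2-7B configuration (32 layers, 32 attention heads, head dim 128) in fp16 at 8K context; only the 93% figure comes from the abstract, so the resulting sizes are a rough estimate:

```python
# Back-of-envelope KV-cache sizing for LLaMA-2-7B at 8K context (fp16).
n_layers, n_kv_heads, head_dim = 32, 32, 128
seq_len, bytes_per_val = 8192, 2

# K and V each store (seq_len, n_kv_heads, head_dim) values per layer.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val
kv_gib = kv_bytes / 2**30
print(f"full KV cache:         {kv_gib:.2f} GiB per sequence")
print(f"after 93% compression: {kv_gib * 0.07:.2f} GiB per sequence")
```

Per-sequence cache memory is what bounds batch size during decoding, which is why a 93% reduction translates into such a large throughput gain.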
Byte Latent Transformer: Patches Scale Better Than Tokens

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it. We present the first FLOP controlled scaling study of byte-level models up to 8B parameters and 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements on reasoning and long tail generalization. Overall, for fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.

arXiv.org
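Entropy-based patching can be sketched with a toy byte model: estimate a next-byte distribution, then open a new patch wherever the model's entropy crosses a threshold. BLT uses a trained byte-level LM at scale; the bigram counts and the hand-picked threshold below are stand-ins:

```python
import math
from collections import Counter, defaultdict

text = b"the cat sat on the mat. the cat sat on the mat."

# Toy stand-in for BLT's small byte LM: bigram next-byte distributions
# estimated from the data itself.
follow = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follow[a][b] += 1

def next_byte_entropy(prev):
    counts = follow[prev]
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Entropy-based patching: start a new patch whenever the model is
# uncertain about the next byte (entropy above the threshold), so more
# compute lands where the data is less predictable.
threshold = 0.5
patches, current = [], [text[0]]
for prev, cur in zip(text, text[1:]):
    if next_byte_entropy(prev) > threshold:
        patches.append(bytes(current))
        current = []
    current.append(cur)
patches.append(bytes(current))

print(len(patches), patches[:5])
```

Predictable runs (low entropy) collapse into long patches while uncertain positions get short ones, which is the mechanism behind the dynamic compute allocation the abstract describes.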
SciTech Chronicles. . . . . . . . .April 9th, 2025

Vol II No 8, 347 links curated. "The quality of a democracy depends upon the quality of its voters; an apathetic, uninformed public will only g..."