apolinario (@multimodalart)
An announcement that a 14B-parameter autoregressive image generation model named BitDance has been released. The model autoregresses over 'bits' rather than a codebook, and is described as fast for its 14B scale. A link is included to try it directly in a Hugging Face Space.
Most LLMs such as GPT, Claude, and Gemini use autoregressive models: they generate one token at a time, which causes high latency and cost. Diffusion language models instead start from a noisy answer and refine the entire sequence over a few parallel steps, cutting latency and cost by roughly 5-10×. They are harder to train and need new infrastructure, but they are very promising for real-time applications (code autocomplete, in-product assistants). #LLM #AI #Diffusion #Autoregressive #AIVietnam #TríTuệNhânTạo
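The latency argument above comes down to counting sequential model calls. A toy sketch (the "models" here are dummy stand-ins, not real networks) contrasting the two decoding loops:

```python
# Toy sketch: compare sequential model calls for autoregressive decoding
# vs. a parallel diffusion-style refiner. All names are illustrative.

def autoregressive_decode(prompt, length, predict_next):
    """One model call per generated token: `length` sequential steps."""
    tokens, calls = list(prompt), 0
    for _ in range(length):
        tokens.append(predict_next(tokens))
        calls += 1
    return tokens, calls

def diffusion_decode(prompt, length, refine, num_steps=8):
    """Start from a fully noised sequence; each call refines ALL positions."""
    tokens, calls = ["<noise>"] * length, 0
    for _ in range(num_steps):
        tokens = refine(prompt, tokens)  # updates every position in parallel
        calls += 1
    return tokens, calls

# Dummy "models" so the sketch runs end to end.
predict_next = lambda ctx: f"t{len(ctx)}"
refine = lambda prompt, seq: [f"t{i}" for i in range(len(seq))]

_, ar_calls = autoregressive_decode(["<bos>"], 64, predict_next)
_, diff_calls = diffusion_decode(["<bos>"], 64, refine, num_steps=8)
print(ar_calls, diff_calls)  # 64 sequential calls vs. 8
```

With 64 tokens and 8 refinement steps, the sequential-call count drops 8×, which is where the claimed latency reduction comes from (real speedups depend on per-step cost and quality at low step counts).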
Continuous Autoregressive Language Models
https://arxiv.org/abs/2510.27688
#HackerNews #Continuous #Autoregressive #Language #Models #NaturalLanguageProcessing #AI #Research #MachineLearning #TransformerModels
The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. The paradigm shift necessitates a new modeling toolkit; therefore, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, achieving the performance of strong discrete baselines at a significantly lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models. Code: https://github.com/shaochenze/calm. Project: https://shaochenze.github.io/blog/2025/CALM.
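The core CALM idea can be illustrated with a toy chunking scheme (our own stand-in, not the authors' code: the lossless reshape below plays the role of the paper's learned high-fidelity autoencoder):

```python
# Illustrative sketch of CALM's next-vector idea: pack each chunk of K
# tokens into one "continuous vector", so N tokens need only N/K
# generative steps. The "autoencoder" here is a trivial lossless reshape.
import numpy as np

K = 4  # tokens compressed per vector

def encode(token_ids):
    """Chunk N token ids into N/K vectors (toy stand-in for the encoder)."""
    ids = np.asarray(token_ids).reshape(-1, K)
    return ids.astype(np.float32)  # each row is one "continuous vector"

def decode(vectors):
    """Recover the original tokens from the vectors (toy decoder)."""
    return np.rint(vectors).astype(int).reshape(-1).tolist()

tokens = list(range(16))           # 16 tokens
vectors = encode(tokens)           # -> 4 vectors: 4x fewer generative steps
assert decode(vectors) == tokens   # paper reports >99.9% reconstruction
print(vectors.shape)               # (4, 4)
```

The real difficulty the paper addresses is downstream of this: once targets are continuous vectors, the usual softmax likelihood is unavailable, hence their likelihood-free training and evaluation framework.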
Diffusion Beats Autoregressive in Data-Constrained Settings
https://blog.ml.cmu.edu/2025/09/22/diffusion-beats-autoregressive-in-data-constrained-settings/
#HackerNews #Diffusion #Autoregressive #MachineLearning #DataScience #AIResearch
Check out our new blog post on "Diffusion beats Autoregressive in Data-Constrained settings". The era of infinite internet data is ending. This research paper asks: What is the right generative modeling objective when data—not compute—is the bottleneck?
WorldVLA: Towards Autoregressive Action World Model
https://arxiv.org/abs/2506.21539
#HackerNews #WorldVLA #Autoregressive #Action #World #Model #AI #Research #Machine #Learning
We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. Our WorldVLA integrates a Vision-Language-Action (VLA) model and a world model in a single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action generation. Meanwhile, the action model generates subsequent actions based on image observations, aiding visual understanding and in turn helping the visual generation of the world model. We demonstrate that WorldVLA outperforms standalone action and world models, highlighting the mutual enhancement between the world model and the action model. In addition, we find that the performance of the action model deteriorates when generating sequences of actions in an autoregressive manner. This phenomenon can be attributed to the model's limited generalization capability for action prediction, leading to the propagation of errors from earlier actions to subsequent ones. To address this issue, we propose an attention mask strategy that selectively masks prior actions during the generation of the current action, which shows significant performance improvement in the action chunk generation task.
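The masking strategy described at the end can be sketched as a modified causal mask (our own construction for illustration, not the released code): positions holding earlier actions are hidden from the current action, while image observations stay visible.

```python
# Sketch of a "mask prior actions" attention mask: causal attention where
# action positions cannot attend to earlier action positions, limiting
# error propagation along the action chunk. Illustrative only.
import numpy as np

def action_mask(kinds):
    """kinds[i] in {'img', 'act'}. Returns mask[i, j] = True iff position i
    may attend to position j (causal, with prior actions hidden)."""
    n = len(kinds)
    mask = np.tril(np.ones((n, n), dtype=bool))  # standard causal mask
    for i, ki in enumerate(kinds):
        if ki == "act":                          # generating an action:
            for j in range(i):
                if kinds[j] == "act":            # hide earlier actions
                    mask[i, j] = False
    return mask

kinds = ["img", "act", "img", "act"]
m = action_mask(kinds)
# The second action (index 3) sees both images but not the first action:
print(m[3])  # [ True False  True  True]
```

Because a wrongly predicted earlier action no longer conditions later ones, errors cannot compound through the action tokens, which is the intuition behind the reported improvement in action chunk generation.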
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
https://arxiv.org/abs/2305.09515
#HackerNews #ARDiffusion #TextGeneration #AutoRegressive #AIResearch #MachineLearning
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has recently been extended to text generation via generating all tokens within a sequence concurrently. However, natural language exhibits a far more pronounced sequential dependency in comparison to images, and the majority of existing language models are trained with a left-to-right auto-regressive approach. To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). AR-Diffusion ensures that the generation of tokens on the right depends on the generated ones on the left, a mechanism achieved through employing a dynamic number of denoising steps that vary based on token position. This results in tokens on the left undergoing fewer denoising steps than those on the right, thereby enabling them to generate earlier and subsequently influence the generation of tokens on the right. In a series of experiments on various text generation tasks, including text summarization, machine translation, and common sense generation, AR-Diffusion clearly demonstrated its superiority over existing diffusion language models, and showed that it can be 100×-600× faster while achieving comparable results. Our code is available at https://github.com/microsoft/ProphetNet/tree/master/AR-diffusion.
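The position-dependent step count can be sketched with a simple linear schedule (our own interpolation for illustration; the paper's exact schedule may differ):

```python
# Toy sketch of AR-Diffusion's position-dependent denoising schedule:
# earlier (leftmost) tokens get fewer denoising steps, so they finish
# first and condition the still-noisy tokens to their right.

def steps_per_position(seq_len, min_steps=2, max_steps=20):
    """Linearly interpolate denoising steps from left (few) to right (many)."""
    if seq_len == 1:
        return [min_steps]
    span = max_steps - min_steps
    return [min_steps + round(span * i / (seq_len - 1)) for i in range(seq_len)]

schedule = steps_per_position(5)
print(schedule)  # [2, 6, 11, 16, 20] -- leftmost token denoises in 2 steps
```

Any monotonically increasing schedule gives the same qualitative behavior: left-to-right information flow, as in autoregressive decoding, while still denoising many positions in parallel.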
Block Diffusion: Interpolating Between Autoregressive and Diffusion Models
https://arxiv.org/abs/2503.09573
#HackerNews #Block #Diffusion #Autoregressive #DiffusionModels #MachineLearning #AIResearch #Interpolation
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and blog post on the project page: https://m-arriola.com/bd3lms