4x faster LLM inference (from Together AI, the Flash Attention author's company)
https://www.together.ai/blog/adaptive-learning-speculator-system-atlas
#HackerNews #4xFasterInference #FlashAttention #LLMTechnology #AIInnovation #AdaptiveLearning
AdapTive-LeArning Speculator System (ATLAS): A New Paradigm in LLM Inference via Runtime-Learning Accelerators
LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 TPS on DeepSeek-V3.1, a 4x speedup over baseline performance, with no manual tuning.
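The "runtime-learning accelerator" builds on speculative decoding: a small draft model proposes several tokens, the target model verifies them in one pass, and the speculator adapts to the traffic it sees. A minimal sketch of that idea, assuming standard rejection-sampling acceptance and a hypothetical EMA-based adjustment of speculation length (class name, thresholds, and update rule are illustrative, not Together's implementation):

```python
import random

class AdaptiveSpeculator:
    """Toy adaptive speculative decoding: accept each drafted token t with
    probability min(1, p_target(t) / p_draft(t)), then adjust the speculation
    length k based on an exponential moving average of the acceptance rate."""

    def __init__(self, k=4, k_min=1, k_max=8):
        self.k = k                  # how many tokens to draft per step
        self.k_min, self.k_max = k_min, k_max
        self.accept_ema = 0.5       # running estimate of acceptance rate

    def verify(self, draft_tokens, draft_probs, target_probs, rng=random.random):
        """Return the accepted prefix of the drafted tokens."""
        accepted = []
        for tok, p_d, p_t in zip(draft_tokens, draft_probs, target_probs):
            if rng() <= min(1.0, p_t / p_d):
                accepted.append(tok)
            else:
                break  # first rejection discards the rest of the draft
        self._adapt(len(accepted), len(draft_tokens))
        return accepted

    def _adapt(self, n_accepted, n_drafted):
        rate = n_accepted / n_drafted
        self.accept_ema = 0.9 * self.accept_ema + 0.1 * rate
        # Draft more aggressively when the speculator matches the workload,
        # less when it misses; this is the "gets faster as you use it" loop.
        if self.accept_ema > 0.8:
            self.k = min(self.k_max, self.k + 1)
        elif self.accept_ema < 0.4:
            self.k = max(self.k_min, self.k - 1)
```

Because verification checks k tokens in a single target-model forward pass, a well-matched speculator turns k draft acceptances into roughly one target-model step, which is where multi-x speedups come from.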