Mastodawn

SubQ: Sub-quadratic LLM built for 12M-token reasoning

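SubQ is the first fully sub-quadratic LLM. It supports long-context reasoning over 12 million tokens and can process entire code repositories, long histories, and persistent state without quality degradation. It uses a sparse-attention architecture that reduces the O(n²) complexity of a standard Transformer to O(n), cutting compute by more than 1,000x, and it performs well on long-context software-engineering tasks. It offers an API for developers and enterprises plus a layer for coding agents, and it can be integrated through OpenAI-compatible endpoints. It is an innovative architecture that fundamentally extends the long-context limits of LLMs.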
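A toy count of attention scores makes the O(n²)-versus-O(n) claim concrete. The post does not say which sparsity pattern SubQ uses, so this sketch assumes a generic causal sliding window of `window` keys per query. `causal_attention` and `count_scores` are hypothetical helpers, not SubQ's API.

```python
import math
from typing import List, Optional, Tuple

Vec = List[float]

def causal_attention(q: List[Vec], k: List[Vec], v: List[Vec],
                     window: Optional[int] = None) -> Tuple[List[Vec], int]:
    """Softmax attention where query i sees keys 0..i (full) or only the last `window` keys.

    Returns the outputs and the number of query-key scores computed.
    Assumption: a sliding window stands in for SubQ's (undisclosed) sparse pattern.
    """
    outputs, scores = [], 0
    for i, qi in enumerate(q):
        start = 0 if window is None else max(0, i - window + 1)
        logits = []
        for j in range(start, i + 1):
            logits.append(sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(len(qi)))
            scores += 1
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        outputs.append([sum(w * v[start + t][d] for t, w in enumerate(weights)) / z
                        for d in range(len(v[0]))])
    return outputs, scores

def count_scores(n: int, window: Optional[int] = None) -> int:
    """Closed-form score count: n(n+1)/2 for full causal attention, about n*window when windowed."""
    if window is None or window >= n:
        return n * (n + 1) // 2
    # The first `window` queries see 1..window keys; every later query sees exactly `window`.
    return window * (window + 1) // 2 + (n - window) * window
```

At n = 12,000,000 with an assumed window of 4,096, `count_scores(n) / count_scores(n, 4096)` is roughly 1,465, the same order of magnitude as the claimed 1,000x saving. The count covers compute only; it says nothing about whether quality is preserved, which is the harder part of the claim.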

https://subq.ai/

#llm #longcontext #transformer #sparseattention #aiarchitecture

Subquadratic — Efficiency is Intelligence

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs.

Subquadratic

sayzard 4d ago

Tom Maiaroto (@tmaiaroto)

A performance test result: even after the context window was expanded to 256k, generation still runs at 93-95 tokens/sec. It is impressive that inference speed holds steady with such a large context window.

https://x.com/tmaiaroto/status/2052653243793379609

#llm #contextwindow #inference #performance #longcontext

Tom Maiaroto (@tmaiaroto) on X

@ItsmeAjayKV @UnslothAI @googlegemma actually, I also increased the context window to 256k and it's still running at 93-95 tokens/sec.

X (formerly Twitter)

sayzard 4d ago

LCM: Lossless Context Management

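LCM (Lossless Context Management) is a deterministic LLM memory architecture that outperforms Claude Code on long-context tasks. Through recursive context compression and recursive task partitioning, LCM preserves all prior state losslessly while providing termination guarantees and zero-cost continuity on short tasks. The approach extends Recursive Language Models (RLMs) and shows strong results on the OOLONG long-context eval at context lengths from 32K to 1M tokens. By replacing complex recursive control flow with engine-managed structures, LCM makes an innovative contribution to memory management and long-context processing for AI agents.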
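Recursive context compression with lossless pointers can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Volt's or LCM's code: `ContextStore`, `compact`, `keep_recent`, and the `summarize` callback (a stand-in for an LLM summarization call) are hypothetical names. Originals are never deleted; summaries only point at them, so nesting summaries builds a tree (a simple case of the paper's summary DAG) that can always be expanded back.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Summary:
    text: str
    children: List["Item"]  # pointers to original messages or to older summaries

Item = Union[int, Summary]  # an int is an index into the immutable message log

class ContextStore:
    def __init__(self, summarize: Callable[[List[str]], str]):
        self.log: List[str] = []      # every original message, never deleted
        self.active: List[Item] = []  # what would be rendered into the prompt
        self.summarize = summarize    # hypothetical stand-in for an LLM call

    def append(self, msg: str) -> None:
        self.log.append(msg)
        self.active.append(len(self.log) - 1)

    def render(self, item: Item) -> str:
        return self.log[item] if isinstance(item, int) else item.text

    def compact(self, keep_recent: int) -> None:
        """Fold everything except the newest `keep_recent` items into one summary node."""
        split = max(0, len(self.active) - keep_recent)
        old, recent = self.active[:split], self.active[split:]
        if len(old) < 2:
            return  # nothing worth compacting
        text = self.summarize([self.render(i) for i in old])
        self.active = [Summary(text, old)] + recent

    def expand(self, item: Item) -> List[str]:
        """Follow pointers back to the original messages (lossless retrieval)."""
        if isinstance(item, int):
            return [self.log[item]]
        out: List[str] = []
        for child in item.children:
            out.extend(self.expand(child))
        return out
```

Because compaction is triggered by the engine rather than by model-written code, a short session that never calls `compact` pays nothing extra, which mirrors the "zero-cost continuity" property described above.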

https://arxiv.org/abs/2605.04050

#llm #contextmanagement #recursion #longcontext #aiagent

LCM: Lossless Context Management

We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens. LCM may be considered both a vindication and extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access. LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recursive context compression, in which a hierarchical summary DAG automatically compacts older messages while retaining lossless pointers to every original; and recursive task partitioning, in which engine-managed parallel primitives like LLM-Map replace model-written loops. This trade-off, analogous to the move from GOTO to structured control flow in programming language design, sacrifices maximal flexibility for termination guarantees, zero-cost continuity on short tasks, and lossless retrievability of all prior state.
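The abstract's second mechanism, an engine-managed parallel primitive replacing model-written loops, can be sketched as a plain parallel map. This is an assumption-laden illustration: `llm_map`, `partition`, and `call_model` are hypothetical, and `call_model` stands in for whatever per-item model call the engine makes. The point is that the engine fixes the item list up front, so the loop always terminates, and results come back in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def partition(text: str, chunk_chars: int) -> List[str]:
    """Split a long input into fixed-size chunks (a deliberately naive partitioner)."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def llm_map(items: List[T], call_model: Callable[[T], R], max_workers: int = 8) -> List[R]:
    """Engine-side parallel map: the engine, not the model, owns the loop.

    The item list is finite and fixed before any call runs, so termination is guaranteed;
    ThreadPoolExecutor.map preserves input order in its results.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, items))
```

A typical use is map-then-reduce, e.g. `sum(llm_map(partition(doc, 4000), count_mentions))`, where `count_mentions` would be a hypothetical per-chunk model call.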

arXiv.org

sayzard 6d ago

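Granite 4.1 LLMs: How They're Built

IBM's Granite 4.1 LLMs are decoder-only dense Transformer models with 3B, 8B, and 30B parameters, trained on about 15 trillion tokens through multi-stage pretraining plus long-context extension (up to 512K tokens). High-quality data curation, rigorous supervised fine-tuning using an LLM-as-Judge framework, and multi-stage reinforcement learning (GRPO with the DAPO loss) substantially improve math, coding, and instruction following. Notably, the 8B model performs comparably to a larger 32B MoE model, and all models are released under the Apache 2.0 license. The work shows that high-quality data and a combined training strategy are key to improving the performance of small LLMs.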
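The post names the RL objective only as "GRPO with the DAPO loss". As a hedged sketch of what those two names usually refer to: GRPO scores each sampled response relative to its own group (no learned critic), and DAPO's loss averages the clipped surrogate over all tokens in the group with a decoupled, asymmetric clip range. The function names and the 0.2/0.28 clip values are assumptions for illustration, not IBM's training code; DAPO's dynamic sampling and overlong reward shaping are omitted, and there is no KL term.

```python
import math
from statistics import mean, pstdev
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each response's reward normalized within its sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    # eps avoids division by zero when every response in the group got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

def dapo_token_loss(logp_new: List[List[float]], logp_old: List[List[float]],
                    advantages: List[float], eps_low: float = 0.2,
                    eps_high: float = 0.28) -> float:
    """Clipped surrogate averaged over *all tokens* in the group (token-level aggregation).

    logp_new / logp_old hold per-token log-probs for each response under the current
    and the sampling policy; each response's advantage applies to all of its tokens.
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)  # decoupled clip range
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    return -total / n_tokens  # negate: minimizing the loss maximizes the surrogate
```

Averaging per token rather than per response means long responses carry proportionally more weight, which is the main difference from the original GRPO per-sequence average.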

https://huggingface.co/blog/ibm-granite/granite-4-1

#llm #reinforcementlearning #finetuning #longcontext #transformer

Granite 4.1 LLMs: How They’re Built

A Blog post by IBM Granite on Hugging Face

sayzard May 1

Design Arena (@Designarena)

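Mistral AI's Mistral Medium 3.5 has been added to Design Arena. It is described as a 128B flagship model with a 256k context window, strong reasoning, coding, and instruction-following, and effort that can be adjusted flexibly per request.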

https://x.com/Designarena/status/2049879100962046431

#mistral #medium35 #longcontext #coding #multimodal

Design Arena (@Designarena) on X

Mistral Medium 3.5 by @MistralAI is now on Design Arena! A flagship 128B model with a 256k context window, delivering powerful reasoning, coding, and instruction-following with flexible effort per request.

X (formerly Twitter)

sayzard May 1

Design Arena (@Designarena)

xAI's Grok 4.3 has been added to Design Arena. It is described as xAI's newest model, a natively multimodal system that supports long-context reasoning and tool-augmented code execution.

https://x.com/Designarena/status/2050011139556143277

#grok #xai #multimodal #codeexecution #longcontext

Design Arena (@Designarena) on X

Grok 4.3 by @xai has been added to Design Arena! xAI’s newest model, a natively multimodal system built for long-context reasoning and tool-augmented code execution.

X (formerly Twitter)

sayzard May 1

fly51fly (@fly51fly)

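Proposes a method that combines sparse attention with hierarchical memory to make long-context LLM serving scalable. This is important work with direct relevance to long-input processing and efficient inference infrastructure.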
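The post gives only the paper's title, so the following is a generic illustration of the idea, not the Microsoft Research design. It ranks KV blocks by a small per-block summary kept in fast memory, computes exact scores only inside the top-k blocks, and pulls those blocks from a slow tier into a bounded LRU fast tier. `TieredKVCache`, `sparse_scores`, the mean-key summaries, and the LRU policy are all assumptions; values and the softmax are left out to keep the focus on selection and tiering.

```python
import math
from collections import OrderedDict
from typing import Dict, List, Sequence

Vec = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def mean_key(block: List[Vec]) -> Vec:
    return [sum(col) / len(block) for col in zip(*block)]

class TieredKVCache:
    """Every block lives in the slow tier; at most `hot_capacity` blocks are mirrored in
    the fast tier (LRU). Per-block summaries stay resident so ranking is cheap."""

    def __init__(self, hot_capacity: int):
        self.hot_capacity = hot_capacity
        self.cold: Dict[int, List[Vec]] = {}
        self.hot: "OrderedDict[int, List[Vec]]" = OrderedDict()
        self.summaries: Dict[int, Vec] = {}
        self.slow_fetches = 0

    def add_block(self, block_id: int, keys: List[Vec]) -> None:
        self.cold[block_id] = keys
        self.summaries[block_id] = mean_key(keys)

    def get(self, block_id: int) -> List[Vec]:
        if block_id in self.hot:
            self.hot.move_to_end(block_id)  # mark as most recently used
        else:
            self.slow_fetches += 1
            self.hot[block_id] = self.cold[block_id]
            if len(self.hot) > self.hot_capacity:
                self.hot.popitem(last=False)  # evict the least recently used block
        return self.hot[block_id]

def sparse_scores(query: Vec, cache: TieredKVCache, top_k: int) -> Dict[int, List[float]]:
    """Rank blocks by query . mean_key, then score exactly only inside the top_k blocks."""
    ranked = sorted(cache.summaries, key=lambda b: dot(query, cache.summaries[b]), reverse=True)
    scale = math.sqrt(len(query))
    return {b: [dot(query, k) / scale for k in cache.get(b)] for b in ranked[:top_k]}
```

Fast memory then scales with `top_k` and the hot-tier capacity rather than with total context length; the cost moves to slow-tier fetches, which a real system would try to prefetch or overlap with compute.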

https://x.com/fly51fly/status/2049968345911574757

#longcontext #llm #attention #memory #serving

fly51fly (@fly51fly) on X

[LG] Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving Z Zhao, B Lu, S Lin, Y Chen… [Microsoft Research] (2026) https://t.co/kWDltVXx8E

X (formerly Twitter)

sayzard Apr 30

swyx (@swyx)

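DeepSeek v4 is drawing attention less for chasing benchmark scores than for its long-context efficiency techniques (CSA, HCA, mHC, and others) and SOTA-level performance at low cost. It is impressive that it proved its competitiveness without touting a final training-run cost or even spending inference-optimal compute.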

https://x.com/swyx/status/2049569820728078533

#deepseek #llm #longcontext #efficiency #aimodel

swyx 🇸🇬 (@swyx) on X

IMO DeepSeek v4 demonstrated utter confidence and competence by not benchmaxxing, not focusing on some BS final run cost, not even spending inference-optimal compute. just showed up, demonstrated SOTA long context efficiency techniques (CSA, HCA, mHC, flash at 8% cost of pro,

X (formerly Twitter)

sayzard Apr 28

Dan McAteer (@daniel_mac8)

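HyLo (Hybrid Long-context) is reported to extend the effective context length of a Transformer-based model by 32x and cut its KV cache by 90%, without retraining the model from scratch. It is a hybrid architecture that substantially improves long-input efficiency, and the post also predicts that future ASI will not be a pure Transformer.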
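The post does not describe how HyLo achieves its 90% KV-cache cut. One common way a hybrid saves memory is to keep full attention, and therefore a per-token KV cache, in only some layers while the rest use fixed-size state. The arithmetic below uses hypothetical model dimensions to show that keeping attention in 4 of 40 layers gives exactly a 90% reduction. This is an assumption for illustration, not HyLo's published design.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Keys + values (factor 2) cached for every attention layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def hybrid_kv_cache_bytes(n_layers: int, attn_layers: int, n_kv_heads: int, head_dim: int,
                          seq_len: int, bytes_per_elem: int = 2) -> int:
    """Assumption: only `attn_layers` layers keep a growing KV cache; the other layers use a
    fixed-size recurrent state, ignored here because it does not grow with seq_len."""
    if not 0 <= attn_layers <= n_layers:
        raise ValueError("attn_layers must be between 0 and n_layers")
    return kv_cache_bytes(attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem)

# Hypothetical 40-layer model, 8 KV heads of dim 128, 32K tokens, fp16 (2 bytes per element).
full = kv_cache_bytes(40, 8, 128, 32768)               # 5 GiB
hybrid = hybrid_kv_cache_bytes(40, 4, 8, 128, 32768)   # 0.5 GiB
print(f"full={full / 2**30:.1f} GiB hybrid={hybrid / 2**30:.1f} GiB "
      f"reduction={1 - hybrid / full:.0%}")
```

Because the cache grows linearly with `seq_len` only in the attention layers, the same memory budget can hold proportionally longer contexts; whether the model stays accurate at those lengths is a separate question the arithmetic does not answer.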

https://x.com/daniel_mac8/status/2049180066597277727

#longcontext #transformer #hylo #kvcache #ai

Dan McAteer (@daniel_mac8) on X

32x effective context length + 90% KV-cache reduction with HyLo: Hybrid Long-context. Importantly, done *without* training the Transformers based model from scratch. Prediction: > ASI will not be pure Transformers. It will be hybrid.

X (formerly Twitter)

sayzard Apr 28

fly51fly (@fly51fly)

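Stanford researchers propose ‘Contexts are Never Long Enough’, a structured-reasoning method for more scalable question answering over long document sets. The work aims to reduce the limitations of long context windows and to improve QA performance when answers draw on multiple documents.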
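The post does not describe the method's internals, so this sketch shows only the generic "extract, then query" pattern that structured reasoning over document sets often relies on; it is not the Stanford approach. Each document is processed on its own into structured records, and the question is answered over the resulting table, so no single model call needs the whole collection in context. `extract_records`, `build_table`, and `answer_top_company` are hypothetical, and the regex stands in for a per-document LLM extraction call.

```python
import re
from typing import Dict, List

Record = Dict[str, object]

def extract_records(doc_id: str, text: str) -> List[Record]:
    """Hypothetical per-document extractor; a regex stands in for an LLM extraction call."""
    return [{"doc": doc_id, "company": m.group(1), "revenue": int(m.group(2))}
            for m in re.finditer(r"(\w+) reported revenue of (\d+)", text)]

def build_table(docs: Dict[str, str]) -> List[Record]:
    """Process each document independently, so context needs grow per document, not per set."""
    table: List[Record] = []
    for doc_id, text in docs.items():
        table.extend(extract_records(doc_id, text))
    return table

def answer_top_company(docs: Dict[str, str]) -> str:
    """Answer a cross-document question by querying the structured table, keeping provenance."""
    best = max(build_table(docs), key=lambda r: r["revenue"])
    return f"{best['company']} ({best['revenue']}, from {best['doc']})"
```

Keeping the source document ID in every record lets the final answer cite where each fact came from, which is harder to guarantee when everything is stuffed into one long prompt.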

https://x.com/fly51fly/status/2048875767669666022

#llm #reasoning #questionanswering #longcontext #stanford

fly51fly (@fly51fly) on X

[CL] Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets H Joshi, P Shethia, J Dao, M S. Lam [Stanford University] (2026) https://t.co/YVp2FKDAEb

X (formerly Twitter)