Mastodawn

The Bureau Outlives the Engine

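An experiment on an Anthropic-based agent system showed that an agent's identity lives not in the model weights (the engine) but in the markdown files that hold the agent's context (the office). The open-source model MiniMax M2.5 read the same agent context in place of the original Anthropic model and produced similar results, which shows that the agent system is not locked in to a particular provider and that the engine can be swapped. That said, fine-grained tuning to pick a suitable engine for each task is still needed. The case underscores the importance of a design that manages context and state independently when building AI agents.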
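A minimal sketch of the idea, not the linked dispatch's actual implementation: the agent's context is loaded from markdown files, and the engine is any callable that maps a prompt to text. The names `load_office`, `run_agent`, `echo_engine`, and the `# file:` / `# task` prompt layout are all hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import Callable

# Hypothetical engine interface: any function from prompt text to reply text.
Engine = Callable[[str], str]


def load_office(office_dir: Path) -> str:
    """Concatenate every markdown file in the directory into one context string."""
    parts = []
    for path in sorted(office_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8")
        parts.append(f"# file: {path.name}\n{body}")
    return "\n\n".join(parts)


def run_agent(office_dir: Path, task: str, engine: Engine) -> str:
    """Build the prompt from the stored context and hand it to whichever engine is given."""
    prompt = f"{load_office(office_dir)}\n\n# task\n{task}"
    return engine(prompt)


def echo_engine(prompt: str) -> str:
    """Stand-in engine for local testing; a real one would call a model API."""
    return f"[echo engine saw {len(prompt)} characters]"
```

Because `run_agent` only depends on the `Engine` signature, swapping providers means passing a different callable while the markdown files stay untouched.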

https://uaxd.fr/dispatches/the-bureau-outlives-the-engine.html

#aiagent #opensource #languagemodel #agentarchitecture #anthropic

The Bureau Outlives the Engine — UAXD

A non-Anthropic, open-source model read an agent's office and continued the work. The substrate changed; the identity did not.

sayzard 1d ago

How LLMs Work

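This article explains in detail, from a mechanical perspective, how large language models (LLMs) work. An LLM is a function that predicts a probability distribution over the next token given a token sequence, and text is generated by repeating that step. The temperature parameter controls how sharp the output distribution is, which trades off diversity against determinism in what gets generated. The model does not memorize individual training examples; it internalizes statistical patterns learned from vast amounts of text and generalizes to new inputs. The article also stresses that LLM output is inherently probabilistic, so repeated testing and validation are needed to keep results consistent and to handle errors.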
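The effect of temperature can be sketched in a few lines of Python. This is an illustrative toy, not code from the linked article: the logits are made up, and `softmax_with_temperature` and `sample_next_token` are hypothetical helper names.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up logits for a three-token vocabulary.
toy_logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
```

With these toy values, `sharp` puts more mass on the highest-scoring token than `flat` does, which is the sense in which low temperature makes output more deterministic and high temperature makes it more varied.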

https://arpitbhayani.me/blogs/how-llms-work/

#llm #languagemodel #temperature #nexttokenprediction #softmax

How LLMs Really Work

If you have used ChatGPT, Gemini, or Claude, you have already formed an intuition about what these systems do. You type something in, and text comes back that feels coherent, knowledgeable, and sometimes eerily human. But the machinery underneath is simultaneously simpler and stranger than most people expect.

Arpit Bhayani

sayzard 1d ago

fly51fly (@fly51fly)

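Google DeepMind has released "Scratchpad Patching", a technique that decouples compute from patch size in byte-level language models. It is a new method that enables efficient byte-level language model design through more flexible control over computation.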

https://x.com/fly51fly/status/2054312634514915426

#deepmind #languagemodel #bytelevel #efficiency #arxiv

fly51fly (@fly51fly) on X

[CL] Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models L Zheng, V Bashlovkina, T Dozat, D Garrette… [Google DeepMind] (2026) https://t.co/KQc96FcHPp

X (formerly Twitter)

sayzard 3d ago

fly51fly (@fly51fly)

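ByteDance Seed has introduced the paper "Continuous Latent Diffusion Language Model". It is a language model that applies diffusion in a continuous latent space, presenting a generation paradigm different from existing autoregressive LLMs. It is worth noting as a new approach to text generation.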

https://x.com/fly51fly/status/2053589032333152693

#llm #diffusion #languagemodel #bytedance #research

fly51fly (@fly51fly) on X

[CL] Continuous Latent Diffusion Language Model H Guo, Q Zhao, Y Zhao, S Nie… [Bytedance Seed] (2026) https://t.co/WliGZc9lMy

X (formerly Twitter)

sayzard 3d ago

fly51fly (@fly51fly)

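Apple researchers have released a paper titled "TIDE: Every Layer Knows the Token Beneath the Context". It appears to propose a new language model and training idea in which each layer is aware of the token information beneath the context, and it is notable from the standpoint of generative model architecture and representation learning.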

https://x.com/fly51fly/status/2053592049103094088

#llm #languagemodel #research #apple #arxiv

fly51fly (@fly51fly) on X

[CL] TIDE: Every Layer Knows the Token Beneath the Context A Jaiswal, L Hannah, H Kim, D Hoang… [Apple] (2026) https://t.co/UuckHAQRl7

X (formerly Twitter)

sayzard May 7

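nanowhale is a language model of about 110M parameters trained from scratch on the DeepSeek‑V4 architecture. The repo includes the model code, config, and tokenizer, along with pretraining (5K steps on FineWeb‑Edu) and SFT (3K steps on SmolTalk) scripts and performance results. It also documents design features such as MLA, MoE, and Hyper‑Connections, lists known issues such as bf16 NaNs and re-initialization on from_pretrained, and is released under the MIT license.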

https://github.com/huggingface/nanowhale

#nanowhale #deepseekv4 #languagemodel #moe #huggingface

GitHub - huggingface/nanowhale

GitHub

sayzard May 6

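Show HN: Meaning forks. SRT sees it

SRT (Semiotic-Reflexive Transformer) is an adapter architecture that adds lightweight, reflexive meaning-awareness modules to an existing frozen causal language model. These modules detect points where meaning forks, let the model reflexively observe its own meaning processing, and inject semantic corrections when needed. The 7B backbone stays fully frozen and only about 14.6M parameters are trained, which makes training fast and lets the approach apply to a variety of backbone models. Built on C.S. Peirce's theory of semiotics, SRT is designed so that the model recognizes that words are interpreted differently across communities.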

https://github.com/space-bacon/SRT

#transformer #adapter #languagemodel #semiotics #nlp

GitHub - space-bacon/SRT

GitHub

liaml May 6

Can a #LanguageModel paint? I've built an app that gets language models to paint a piece iteratively (one stroke at a time) rather than producing it in one shot from a prompt. Not sure if this counts as #GenerativeArt

https://www.etive-mor.com/blog/can-a-language-model-paint/

https://www.liamlaverty.com/paint-by-language-model/inspect/chagall-fiddler-village-001

sayzard May 2

0xMarioNawfal (@RoundtableSpace)

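A new 13B AI model trained only on text from before 1931 has been released. It was trained without the internet, Wikipedia, or modern code, so it reflects only the worldview as of December 31, 1930, which sets it apart from existing models that rely on modern web data.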

https://x.com/RoundtableSpace/status/2050677961129439499

#llm #openai #research #dataset #languagemodel

0xMarioNawfal (@RoundtableSpace) on X

Researchers just released a 13B model trained exclusively on text published before 1931. No internet. No Wikipedia. No modern code. Its worldview is frozen at December 31, 1930. The reason is fascinating — every major model today shares a common ancestor in the modern web,

X (formerly Twitter)

Simon Willison Apr 28

Introducing talkie: a 13B vintage language model from 1930

https://simonwillison.net/2026/Apr/28/talkie/#atom-everything

#AI #VintageTech #LanguageModel

New project from Nick Levine, David Duvenaud, and Alec Radford (of GPT, GPT-2, Whisper fame). talkie-1930-13b-base (53.1 GB) is a "13B language model trained on 260B tokens of historical pre-1931 …

Simon Willison’s Weblog