Mastodawn

How fast is N tokens per second really?

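tokenspeed is a tool that lets you feel how fast an LLM's tokens per second actually are. It supports a range of speeds (5 to 800 tok/s) and output modes (code, text, thinking, agent), which makes it easy to grasp intuitively what a benchmark number for a given GPU or AI chip means in practice. In particular, it shows visually that code and text have different token densities, so the same token rate can feel very different depending on what is being generated. It uses the BPE-tokenizer notion of a token and is useful for AI developers who want a realistic sense of model inference speed.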

https://mikeveerman.github.io/tokenspeed/

#llm #benchmark #tokenspeed #inference #tokenization

tokenspeed — feel LLM tokens-per-second
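The arithmetic behind a tokens-per-second figure is simple enough to sketch. Below is a minimal Python sketch, assuming a constant decode rate (real servers fluctuate) and a caller-supplied characters-per-token ratio standing in for tokenizer density. The function names and the 4.0 / 2.5 chars-per-token values are hypothetical placeholders, not measurements, and this is not tokenspeed's own implementation.

```python
import time
from typing import Callable, Iterable, Iterator


def token_interval(tokens_per_second: float) -> float:
    """Seconds between consecutive tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def chars_per_second(tokens_per_second: float, chars_per_token: float) -> float:
    """Visible characters per second; chars_per_token is an assumed tokenizer density."""
    return tokens_per_second * chars_per_token


def stream(
    tokens: Iterable[str],
    tokens_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens one at a time, pausing between them to mimic decoding."""
    interval = token_interval(tokens_per_second)
    for tok in tokens:
        sleep(interval)  # wait one token interval before revealing the next token
        yield tok


if __name__ == "__main__":
    # Placeholder densities, NOT measured values: swap in ratios from a real tokenizer.
    for label, cpt in [("text", 4.0), ("code", 2.5)]:
        print(label, chars_per_second(50, cpt), "chars/s at 50 tok/s")
    print("".join(stream(["Hello", ",", " world"], tokens_per_second=10)))
```

For example, generation_time(1000, 200) returns 5.0, i.e. a 1,000-token answer takes about five seconds at 200 tok/s. The same token rate yields fewer visible characters per second whenever the content packs fewer characters into each token, which is the density effect the post describes.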

Arint - SEO+KI 3d ago

RT @AtlasInference: TRANSLATION: DGX Spark just reached over 200 tokens per second for Qwen3.6-35B with @AtlasInference on @sparkarena 🔥

more on Arint.info

#AIInnovation #AtlasInference #DGXSpark #LLMPerformance #Qwen36 #TokenSpeed #arint_info

https://x.com/AtlasInference/status/2055716965071663385#m

sayzard May 6

TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads
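TokenSpeed is a very fast LLM inference engine optimized for agentic workloads. Its design features include parallel processing, a high-performance scheduler, safe reuse of KV resources, and support for heterogeneous accelerators. On NVIDIA Blackwell GPUs it achieves up to 11% higher throughput and 9% lower latency than TensorRT-LLM, and it is especially strong at the large token volumes that coding agents process. The engine is being developed in collaboration with a number of AI companies and research institutions, and support for distributed deployment is planned.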

https://lightseek.org/blog/lightseek-tokenspeed.html

#llm #inference #tokenspeed #nvidiablackwell #agenticworkloads

TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads | LightSeek Foundation

LightSeek Foundation Blog

Hacker News Sep 6, 2025

Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

https://github.com/b4rtaz/distributed-llama/discussions/255

#HackerNews #Qwen3 #A3B #RaspberryPi #TokenSpeed #DistributedLLama

[v0.16.0] Qwen3 30B A3B Q40 on 4 x Raspberry Pi 5 8GB · b4rtaz distributed-llama · Discussion #255

Setup
Device: 4 x Raspberry Pi 5 8GB
Distributed Llama version: 0.16.0
Model: qwen3_30b_a3b_q40

Benchmark (4 x Raspberry Pi 5 8GB): Evaluation 14.33 tok/s, Prediction 13.04 tok/s

GitHub
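The card reports two rates. Assuming that "Evaluation" means processing the prompt and "Prediction" means generating new tokens (an interpretation, not something the card spells out), a rough end-to-end time for one request is prompt tokens divided by the evaluation rate plus output tokens divided by the prediction rate. A minimal sketch; request_time is a hypothetical helper, and it ignores startup, synchronization between the four Pis, and any slowdown as the context grows.

```python
def request_time(
    prompt_tokens: int,
    output_tokens: int,
    eval_tps: float,
    pred_tps: float,
) -> float:
    """Rough seconds for one request: evaluate the prompt, then predict the output.

    Assumes constant rates; ignores startup, inter-device sync, and slowdown
    as the context grows.
    """
    if eval_tps <= 0 or pred_tps <= 0:
        raise ValueError("rates must be positive")
    prefill_s = prompt_tokens / eval_tps  # time spent reading the prompt
    decode_s = output_tokens / pred_tps   # time spent generating new tokens
    return prefill_s + decode_s


if __name__ == "__main__":
    # Rates from the 4 x Raspberry Pi 5 card; token counts are hypothetical.
    print(f"{request_time(500, 500, eval_tps=14.33, pred_tps=13.04):.1f} s")
```

With a hypothetical 500-token prompt and 500-token answer, the reported rates give roughly 73 seconds per request, which puts "13 token/s" in more tangible terms.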