Mastodawn

Mulight 沐光 (@0xMulight)

Gemini의 tool-calling 기능을 2,600만 파라미터의 소형 모델로 증류한 Needle이 공개됐다. GPU 없이 로컬에서 매우 낮은 지연으로 동작하며, 엣지 디바이스나 브라우저에서 함수 호출 기능을 구현하는 데 적합하다.

https://x.com/0xMulight/status/2055462340091646109

#gemini #toolcalling #distillation #edgeai #functioncalling

Mulight 沐光🌟 (@0xMulight) on X

Gemini的工具调用能力被蒸馏进了一个2600万参数的小模型。 Needle可以在本地运行，不需要GPU，延迟低到可以忽略。适合想在边缘设备、浏览器或者嵌入式场景里做function calling的人。 GitHub已开源，可以直接拿来做二次开发。一键安装方法： git clone https://t.co/SIw1i4UWwa cd needle &&

X (formerly Twitter)

sayzard 2d ago

fly51fly (@fly51fly)

블랙박스 LLM에서 다단계 추론과 도구 사용을 위한 프롬프트 정책을 다루고, 경험의 반복적 distillation으로 이를 개선하는 연구입니다. 에이전트/툴유즈 파이프라인 설계에 직접 관련된 주제로, 실무 적용 가능성이 있는 편이지만 아직 논문 단계입니다.

https://x.com/fly51fly/status/2055400219324641370

#llm #agents #tooluse #prompting #distillation

fly51fly (@fly51fly) on X

[LG] Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience K Sayana, K Todi, A Jash [Google Research] (2026) https://t.co/Re2TGOe4sh

X (formerly Twitter)

sayzard 4d ago

Bindu Reddy (@bindureddy)

Gemini 3.2 Flash가 DeepMind의 증류(distillation) 기술을 활용한다는 소문과 함께 언급되었다. 코딩·추론 벤치마크에서 GPT 5.5의 92% 수준 성능을 내면서 추론 비용은 15~20배 저렴하고, 지연시간도 200ms 미만으로 개선됐다는 주장이 제시됐다.

https://x.com/bindureddy/status/2054767771418861964

#gemini #deepmind #distillation #llm #inference

Bindu Reddy (@bindureddy) on X

Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The latency improvements are insane - sub-200ms

X (formerly Twitter)

sayzard 5d ago

fly51fly (@fly51fly)

Google Research가 CoDistill-GRPO를 소개했다. 이 방법은 효율적인 Group Relative Policy Optimization을 위해 공동 distillation 레시피를 제안하며, 강화학습 기반 모델 학습의 비용과 효율성을 개선할 수 있는 연구다.

https://x.com/fly51fly/status/2054308804741992625

#googleresearch #distillation #grpo #reinforcementlearning

fly51fly (@fly51fly) on X

[LG] CoDistill-GRPO: A Co-Distillation Recipe for Efficient Group Relative Policy Optimization S M Kwon, Z Sun, A T Suresh, H Jain, S Kumar [Google Research] (2026) https://t.co/qlRq6oMJDZ

X (formerly Twitter)

Hacker News 6d ago

Needle: We Distilled Gemini Tool Calling into a 26M Model

https://github.com/cactus-compute/needle

#HackerNews #Needle #Gemini #Tool #Model #AI #Distillation #Cactus #Compute

GitHub - cactus-compute/needle: 26m function call model that runs on incredibly small devices

26m function call model that runs on incredibly small devices - cactus-compute/needle

GitHub

sayzard May 10

Extracting alignment data in open models

이 논문은 사후 학습된 오픈 모델에서 상당한 양의 정렬(alignment) 훈련 데이터를 추출할 수 있음을 보여준다. 특히, 문자열 매칭 대신 고품질 임베딩 모델을 활용해 의미적 유사성을 측정함으로써 기존 방식보다 10배 이상 많은 데이터를 식별할 수 있음을 입증했다. 또한, SFT나 RL과 같은 사후 훈련 단계에서 사용된 데이터가 모델에 의해 쉽게 재생산되며, 이를 활용해 원본 성능을 회복하는 베이스 모델 훈련도 가능함을 확인했다. 이 연구는 정렬 데이터 추출의 잠재적 위험성을 드러내고, 증류(distillation) 과정이 사실상 원본 데이터에 간접적으로 재학습하는 효과를 가질 수 있음을 시사한다.

https://arxiv.org/abs/2510.18554

#alignment #openmodels #embedding #trainingdata #distillation

Extracting alignment data in open models

In this work, we show that it is possible to extract significant amounts of alignment training data from a post-trained model -- useful to steer the model to improve certain capabilities such as long-context reasoning, safety, instruction following, and maths. While the majority of related work on memorisation has focused on measuring success of training data extraction through string matching, we argue that embedding models are better suited for our specific goals. Distances measured through a high quality embedding model can identify semantic similarities between strings that a different metric such as edit distance will struggle to capture. In fact, in our investigation, approximate string matching would have severely undercounted (by a conservative estimate of $10\times$) the amount of data that can be extracted due to trivial artifacts that deflate the metric. Interestingly, we find that models readily regurgitate training data that was used in post-training phases such as SFT or RL. We show that this data can be then used to train a base model, recovering a meaningful amount of the original performance. We believe our work exposes a possibly overlooked risk towards extracting alignment data. Finally, our work opens up an interesting discussion on the downstream effects of distillation practices: since models seem to be regurgitating aspects of their training set, distillation can therefore be thought of as indirectly training on the model's original dataset.

arXiv.org

sayzard May 8

Z.ai (@Zai_org)

CogViT 비전 인코더의 기술적 핵심을 소개. SigLIP2와 DINOv3를 활용한 듀얼 티처 증류, 마스크드 모델링과 대조학습의 2단계 사전학습, 대규모 학습 안정화를 위한 QK-Norm, 멀티모달 멀티토큰 예측을 설명한다.

https://x.com/Zai_org/status/2052426791004876863

#visionencoder #multimodal #distillation #pretraining #cv

Z.ai (@Zai_org) on X

Technical highlights: CogViT Vision Encoder - Built with dual-teacher distillation: SigLIP2 for semantics, DINOv3 for texture. A two-stage recipe, masked modeling, then contrastive pretraining, with QK-Norm for attention stability at scale. Multimodal Multi-Token Prediction

X (formerly Twitter)

sayzard May 6

Shivay Lamba (@HowDevelop)

AI Engineer Melbourne 행사에서 엣지 환경을 위해 AI 모델을 distill 및 quantize하는 방법을 발표한다는 소식입니다. 경량화 모델과 온디바이스/엣지 배포에 관심 있는 개발자에게 유용한 내용입니다.

https://x.com/HowDevelop/status/2051893884767363475

#aiengineering #distillation #quantization #edgeai #models

Shivay Lamba (@HowDevelop) on X

Super stoked to speak at @aiDotEngineer (AI Engineer) Melbourne next month on how to distill and quantize AI models for the edge!

X (formerly Twitter)

Habr May 6

Разбираю «Qwen3.5-21B-Claude-4.6-Opus-Heretic-Uncensored»: что на самом деле внутри файнтюна с громким именем

В телеграме завирусился пост: якобы кто-то “дообучил Qwen 3.5 до уровня Claude 4.6 Opus и убрал цензуру через Heretic”. Я открыл карточку модели на HuggingFace и провёл вечер, разбираясь, что под капотом. Спойлер: там много интересной техники, но к Claude эта модель имеет такое же отношение, как кроссовки “Adibas” к Adidas. Разбираю distillation, depth upscaling и abliteration без маркетинговой обёртки.

https://habr.com/ru/articles/1032324/

#LLM #Qwen #abliteration #файнтюн #HuggingFace #distillation #intepretability #openweights

Разбираю «Qwen3.5-21B-Claude-4.6-Opus-Heretic-Uncensored»: что на самом деле внутри файнтюна с громким именем

Технический разбор модели, которую в телеграме продают как «Claude без цензуры» В моей ленте недавно завирусился пост: якобы кто-то «дообучил Qwen 3.5 до уровня Claude 4.6 Opus, убрал цензуру через...

Хабр

sayzard May 1

AshutoshShrivastava (@ai_for_success)

엘론 머스크가 xAI가 OpenAI 모델에 대한 distillation 기법을 사용해 Grok을 학습했는지 질문을 받자, 이는 AI 업계 전반의 일반적인 관행이라고 답했고, ‘부분적으로 그렇다’는 취지로 발언했습니다.

https://x.com/ai_for_success/status/2049915997591912665

#xai #openai #grok #distillation #llm

AshutoshShrivastava (@ai_for_success) on X

Is this true?? Elon Musk was asked if xAI has used distillation techniques on OpenAI models to train Grok, and he asserted it was a general practice among AI companies. Asked if that meant “yes,” he said, “Partly.”

X (formerly Twitter)