Mastodawn

Sudo su (@sudoingX)

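Shares an experiment distilling Claude Opus 4.6's reasoning into Qwen 3.5 27B dense. The resulting model, 'Qwopus', runs on a single RTX 3090 through Claude's coding agent (Claude Code) and reaches 29-35 tokens per second with thinking mode on. The author reports that the jinja bug that disables thinking on base Qwen does not carry over, and that the harness and model are matched.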

https://x.com/sudoingX/status/2030237974286192815

#qwen #claude #modeldistillation #inference #gpu

Sudo su (@sudoingX) on X

Qwopus on a single RTX 3090. Claude Opus 4.6 reasoning distilled into Qwen 3.5 27B dense, running through Claude's own coding agent (claude code). 29-35 tok/s with thinking mode on. the jinja bug that kills thinking on base Qwen doesn't carry over. harness and model matched.

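The reported decode rate translates directly into wall-clock time per response. A minimal sketch of that arithmetic, assuming a steady decode rate and ignoring prompt-processing (prefill) time; the 2,000-token response length is a hypothetical figure, not one from the post.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Decode range reported for Qwopus on one RTX 3090 with thinking mode on.
LOW_TPS, HIGH_TPS = 29.0, 35.0

# Hypothetical length of a thinking trace plus answer; not taken from the post.
RESPONSE_TOKENS = 2_000

fastest = decode_seconds(RESPONSE_TOKENS, HIGH_TPS)
slowest = decode_seconds(RESPONSE_TOKENS, LOW_TPS)
print(f"{RESPONSE_TOKENS} tokens: about {fastest:.0f} to {slowest:.0f} s")
```

At these rates the extra tokens produced in thinking mode dominate latency: every additional 1,000 tokens of reasoning adds roughly 29 to 34 seconds.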

sayzard Feb 28

Sebastian Raschka (@rasbt)

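While writing Chapter 8, on model distillation, during a week when Claude distillation happened to be a big topic, rasbt shared utilities for generating distillation data from a wide range of open-weight models via OpenRouter and Ollama. The code and documentation are collected in the Chapter 8 README of the 'reasoning-from-scratch' repository.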

https://x.com/rasbt/status/2027449675654058190

#distillation #modeldistillation #opensource #ollama #openrouter

Sebastian Raschka (@rasbt) on X

Claude distillation has been a big topic this week while I am (coincidentally) writing Chapter 8 on model distillation. In that context, I shared some utilities to generate distillation data from all sorts of open-weight models via OpenRouter and Ollama: https://t.co/IsfNDpcGAw

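The core loop behind such utilities is simple: send each prompt to a teacher model and save the prompt/completion pairs as training data. A minimal sketch, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint with streaming disabled; the function names are hypothetical, and this is not the code from the linked repository. An OpenRouter teacher would be a different `teacher` function that calls OpenRouter's OpenAI-compatible chat completions API with an API key; it is omitted here.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List

# Assumed local Ollama endpoint (default port); adjust for your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_teacher(model: str) -> Callable[[str], str]:
    """Return a function that sends one prompt to a local Ollama model (hypothetical helper)."""
    def ask(prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
        req = urllib.request.Request(
            OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            # With streaming off, the full completion is in the "response" field.
            return json.loads(resp.read())["response"]
    return ask


def build_distillation_records(
    prompts: Iterable[str], teacher: Callable[[str], str], teacher_name: str
) -> List[Dict[str, str]]:
    """Query the teacher once per prompt and keep non-empty (prompt, completion) pairs."""
    records = []
    for prompt in prompts:
        completion = teacher(prompt)
        if completion.strip():  # drop empty teacher outputs
            records.append({"teacher": teacher_name, "prompt": prompt, "completion": completion})
    return records


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, the usual format for fine-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Keeping the teacher as a plain callable makes it easy to swap backends, or to pass a stub while testing the pipeline offline.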

sayzard Feb 24

Anthropic (@AnthropicAI)

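Anthropic notes that distillation can be legitimate, since AI labs use it to build smaller, cheaper models, but warns that foreign labs that illicitly distill American models can strip out safeguards and feed the resulting capabilities into military, intelligence, and surveillance systems. The warning could prompt significant policy debate over the international use of AI technology and its ethics.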

https://x.com/AnthropicAI/status/2025997929840857390

#aipolicy #modeldistillation #aiethics #security

Anthropic (@AnthropicAI) on X

Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers. But foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems.


sayzard Feb 24

Anthropic (@AnthropicAI)

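Anthropic says it has identified industrial-scale distillation attacks on Claude by the AI labs DeepSeek, Moonshot AI, and MiniMax. According to Anthropic, these labs created more than 24,000 fraudulent accounts and generated more than 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.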

https://x.com/AnthropicAI/status/2025997928242811253

#aisecurity #modeldistillation #anthropic #claude #deepseek

Anthropic (@AnthropicAI) on X

We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.


AI Daily Post Feb 23

Anthropic says DeepSeek and other Chinese firms have been training AI by distilling Claude’s reasoning into their own models. The claim raises fresh questions about model-distillation practices, IP protection and the global race for smarter systems. What does this mean for the future of open-source AI? Dive into the details. #Anthropic #ClaudeAI #DeepSeek #ModelDistillation

🔗 https://aidailypost.com/news/anthropic-alleges-deepseek-chinese-firms-used-claudes-reasoning-train

Winbuzzer Feb 13

https://winbuzzer.com/2026/02/13/openai-warns-congress-deepseek-stole-ai-training-data-xcxwbn/

OpenAI Warns Congress: DeepSeek Distills US AI Frontier Models

#AI #OpenAI #DeepSeek #AIModels #GenAI #AICompetition #ChinaAI #AISafety #ModelDistillation #USChinaRelations

sayzard Feb 2

merve (@mervenoyann)

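NVIDIA recently released C-RADIOv4, a state-of-the-art image encoder, in two sizes: shape-optimized (431M parameters) and huge (653M). It was distilled from SigLIP2, DINOv3, and SAM3 (with SAM3 transferred for segmentation) and is reported to match or outperform DINOv3, a model roughly 10x larger.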

https://x.com/mervenoyann/status/2018301356663079384

#nvidia #cradio #imageencoder #modeldistillation #computervision

merve (@mervenoyann) on X

NVIDIA released C-RADIOv4 sota image encoders past week 🙌🏻 > they come in shape-optimized (431M) and huge (653M) > distilled from SigLIP2, DINOv3 and SAM3 (transferred for segmentation) outperforms/on par with DINOv3 (10x larger than this model) 🔥

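Distilling one student from several teachers is commonly done by giving the student a small adaptor head per teacher and penalizing the distance between each adapted student feature and that teacher's feature. A minimal sketch of such a loss, assuming linear adaptors and cosine distance; it is a generic illustration, not NVIDIA's C-RADIOv4 training recipe, and the teacher names are only dictionary labels.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # rows = teacher feature dim, cols = student feature dim


def matvec(m: Matrix, v: Vector) -> Vector:
    """Apply a linear adaptor head to a student feature vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length feature vector")
    return 1.0 - dot / norm


def multi_teacher_loss(
    student_feat: Vector,
    adaptors: Dict[str, Matrix],
    teacher_feats: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted sum over teachers of the distance between adapted student and teacher features."""
    total = 0.0
    for name, target in teacher_feats.items():
        projected = matvec(adaptors[name], student_feat)  # map into this teacher's space
        total += weights.get(name, 1.0) * cosine_distance(projected, target)
    return total
```

Because each teacher gets its own head, the shared student backbone can absorb several feature spaces with different dimensions, and the per-teacher weights control how much each one pulls on it.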

sayzard Jan 27

Alpár Kertész (@Criticality47)

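Shares the experience of hitting LTX-2's limits when generating longer music videos. The constraints were especially noticeable in the distilled GGUF Q4_K_M version, and the author mentions possibly trying the simple distilled version instead.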

https://x.com/Criticality47/status/2015928720964333758

#ltx2 #gguf #musicgeneration #modeldistillation

Alpár Kertész (@Criticality47) on X

Welp, for me this is the limit of LTX-2’s capabilities when it comes to generating longer music videos @cocktailpeanut . I’m not giving up tho! I just need to accept that the distilled GGUF Q4_K_M version has its limits. The simple distilled version might work, but I need time to


sayzard Jan 9

Dan Goldwasser (@dgoldwas)

A short report that a 10-second 720p clip rendered in 3 minutes with the distilled model. The author found the result decent and wants to try the non-distilled model to compare speed and output quality.

https://x.com/dgoldwas/status/2009332187263578417

#videogeneration #modeldistillation #rendering #ai

Dan Goldwasser (@dgoldwas) on X

@cocktailpeanut 10-seconds at 720p rendered in 3 minutes using distilled.... not too bad. Curious to try the non-distilled to see how that compares in the result.

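Expressing the reported figures as a real-time factor gives a number that can be compared directly against a later non-distilled run. A minimal sketch; the 24 fps frame rate is an assumption, since the post does not state one.

```python
def realtime_factor(clip_seconds: float, render_seconds: float) -> float:
    """Seconds of rendering spent per second of output video."""
    return render_seconds / clip_seconds


CLIP_SECONDS = 10.0        # clip length from the post
RENDER_SECONDS = 3 * 60.0  # "rendered in 3 minutes"
FPS = 24                   # hypothetical frame rate; not given in the post

factor = realtime_factor(CLIP_SECONDS, RENDER_SECONDS)
seconds_per_frame = RENDER_SECONDS / (CLIP_SECONDS * FPS)
print(f"{factor:.0f}x slower than real time, {seconds_per_frame:.2f} s per frame at {FPS} fps")
```

Running the same calculation on the non-distilled model's render time would show the speedup attributable to distillation, independent of clip length.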

AI Daily Post Dec 9

Distilling billion-parameter models into lean student nets can slash latency by 2-3× while cutting costs by double-digit percentages. From chatbots to recommendation engines, the gains are real. Dive into the benchmarks and see how open-source pipelines are reshaping AI efficiency. #ModelDistillation #Latency #StudentModel #Chatbots

🔗 https://aidailypost.com/news/model-distillation-cuts-latency-23-lowers-costs-by-doubledigit
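The classic way to train a lean student net against a larger teacher is to match the teacher's temperature-softened output distribution alongside the usual hard labels. A minimal sketch of that soft-target loss in the style of Hinton et al. (2015), in plain Python; the `alpha` and `temperature` defaults are illustrative assumptions, and this is not the method behind the benchmarks in the linked article.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T**2
    to offset the 1/T**2 shrinkage of the soft-target gradients."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def student_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Blend of hard-label cross-entropy and the soft-target distillation term."""
    hard = -math.log(softmax(student_logits)[label])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft
```

A higher temperature exposes more of the teacher's "dark knowledge" (the relative probabilities of wrong classes), which is what lets a much smaller student approach the teacher's quality at a fraction of the inference cost.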