fly51fly (@fly51fly)

A joint team from Stanford and EPFL (J. Kazdan, N. Levi, R. Schaeffer, J. Chudnovsky, et al.) has published the paper 'Scale Dependent Data Duplication' on arXiv (2026). The paper analyzes how the effect of training-data duplication on model performance and generalization changes with data scale, and discusses duplication-related problems and their implications from a scaling perspective.
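
For readers who want to poke at their own corpora, here is a minimal Python sketch of the quantity at stake, the exact-duplicate rate measured via content hashing; the normalization below is our arbitrary choice, not the paper's protocol.

```python
import hashlib
from collections import Counter

def duplication_stats(docs):
    """Exact-duplicate statistics for a list of text documents."""
    counts = Counter(
        hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        for doc in docs  # hash a normalized form of each document
    )
    total = sum(counts.values())
    dupes = total - len(counts)  # copies beyond each first occurrence
    return {"total": total, "unique": len(counts),
            "duplicate_fraction": dupes / total if total else 0.0}

print(duplication_stats(["a cat", "A cat ", "a dog"]))
# -> {'total': 3, 'unique': 2, 'duplicate_fraction': 0.333...}
```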

https://x.com/fly51fly/status/2031483138908762496

#dataduplication #datasetquality #mlresearch #arxiv

fly51fly (@fly51fly) on X

[LG] Scale Dependent Data Duplication J Kazdan, N Levi, R Schaeffer, J Chudnovsky… [Stanford University & EPFL] (2026) https://t.co/tIspicuiEc


Rohan Paul (@rohanpaul_ai)

Researchers found that when a language model faces harder questions, its internal 'thinking paths' contract into fewer routes. In other words, the model compresses its internal representations when it is confused, an observation with both interpretability and practical implications: it could be used to improve models.
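
One plausible way to quantify that kind of compression (an assumption on our part; the thread does not specify the authors' metric) is the participation ratio of the hidden-state covariance spectrum:

```python
import numpy as np

def participation_ratio(hidden_states):
    """Effective number of directions used by a set of hidden states.

    (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    covariance: near 1 when activity collapses onto one direction,
    near the full dimension when it spreads out evenly.
    """
    X = hidden_states - hidden_states.mean(axis=0)    # center
    eig = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
spread = rng.normal(size=(200, 64))                               # many directions
collapsed = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 64))  # one direction
print(participation_ratio(spread), participation_ratio(collapsed))
# first value is in the tens, second is ~1: collapsed activity uses few paths
```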

https://x.com/rohanpaul_ai/status/2031529743494033862

#languagemodels #interpretability #mlresearch

Rohan Paul (@rohanpaul_ai) on X

Researchers found that when language models face harder questions, their internal brain activity literally shrinks into fewer paths. Language models actually compress their internal thinking when they get confused, and we can use that to help them. Standard AI models usually

Research shows adding more features to ML regression models can introduce hidden structural risks. Every additional feature creates dependencies on upstream data pipelines, and low-signal variables may appear important due to noise, leading to models that behave inconsistently when deployed.

https://www.marktechpost.com/2026/03/08/beyond-accuracy-quantifying-the-production-fragility-caused-by-excessive-redundant-and-low-signal-features-in-regression/

#AIagent #AI #GenAI #MLResearch #MarkTechPost
Beyond Accuracy: Quantifying the Production Fragility Caused by Excessive, Redundant, and Low-Signal Features in Regression

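A toy sketch of the failure mode the article describes, using scikit-learn (our illustration, not the article's code): pure-noise columns still soak up importance in a fitted model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 200
signal = rng.normal(size=(n, 2))       # two real predictors
noise = rng.normal(size=(n, 20))       # twenty pure-noise features
X = np.hstack([signal, noise])
y = signal @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
noise_share = model.feature_importances_[2:].sum()
print(f"importance credited to pure-noise features: {noise_share:.2f}")
# typically a noticeable fraction, even though these columns carry no signal
```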

yontr (@yontrtwt)

An important graph for AI practitioners, explained: for the same training duration, Cerebras hardware delivers higher accuracy with a smaller model, which underscores the 'Hardware Lottery' concept. It offers insight into how hardware choice affects performance and efficiency.

https://x.com/yontrtwt/status/2022036737674056002

#hardware #cerebras #mlresearch #trainingefficiency

yontr (@yontrtwt) on X

Everyone following AI should take a moment to understand this graph to grasp the importance of hardware. The important and less talked about part of this graph is that for the SAME DURATION cerebras provides HIGHER ACCURACY with a SMALLER MODEL. This is the Hardware Lottery by


Q*Satoshi (@AiXsatoshi)

New model release: 'Step 3.5 Flash', a 196B MoE model, has been announced, claiming performance equal to or better than DeepSeek V3.2 despite having one third the parameters. Reported metrics: 100-300 tok/s throughput, roughly 3-6x better decode efficiency thanks to MTP-3, SWE-bench 74.4% (DS 73.1%), and Terminal-Bench 51.0% (DS 46.4%).

https://x.com/AiXsatoshi/status/2018190630997160369

#moe #modelrelease #mlresearch #efficiency #deepseek

Q*Satoshi⏩ (@AiXsatoshi) on X

Step 3.5 Flash, a 196B MoE, has been released. Comparison with DeepSeek V3.2: • Inference efficiency: performance equal to or better than DeepSeek-V3.2 despite 1/3 the parameter count • Speed: 100-300 tok/s; roughly 3-6x the decode efficiency of DS thanks to MTP-3 • Tasks: SWE-bench 74.4% (DS 73.1%), Terminal-Bench 51.0% (DS 46.4%)


Discover 7 practical scikit‑learn tricks that let you weave preprocessing pipelines directly into hyperparameter searches. Save time, avoid data leakage, and boost model reliability—all with clean, reusable code. Perfect for open‑source projects and reproducible research. Dive in to level up your ML workflow! #scikitlearn #pipeline #hyperparamtuning #mlresearch

🔗 https://aidailypost.com/news/7-scikit-learn-tricks-embed-preprocessing-pipelines-hyperparameter
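
The core pattern behind several of these tricks (a minimal sketch of ours, not code from the article): put preprocessing inside a Pipeline so cross-validation re-fits it per fold, then tune preprocessing and model hyperparameters in one search.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, noise=0.1, random_state=0)

# Preprocessing lives inside the Pipeline, so each CV fold re-fits the
# scaler and PCA on its own training split: no leakage from test folds.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("model", Ridge()),
])

search = GridSearchCV(
    pipe,
    param_grid={
        "pca__n_components": [5, 10, 20],  # preprocessing hyperparameter
        "model__alpha": [0.1, 1.0, 10.0],  # model hyperparameter
    },
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```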

fly51fly (@fly51fly)

The paper 'SimMerge: Learning to Select Merge Operators from Similarity Signals' proposes a method that learns to select which merge operator to apply during model merging based on similarity signals. Authored by O. Bolton et al. (Cohere & Google) and posted on arXiv, it could influence research on model merging and parameter consolidation, as well as MLOps practice.
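
A hand-rolled illustration of the general idea, not SimMerge itself (the paper learns the selector; the threshold rule below is our stand-in):

```python
import torch

def merge_by_similarity(w_a, w_b, threshold=0.9):
    """Average tensors that are still similar; keep model A's where they diverge."""
    merged = {}
    for name, a in w_a.items():
        b = w_b[name]
        # similarity signal: cosine between the flattened parameter tensors
        sim = torch.cosine_similarity(a.flatten(), b.flatten(), dim=0)
        merged[name] = (a + b) / 2 if sim > threshold else a.clone()
    return merged

w_a = {"layer.weight": torch.randn(4, 4)}
w_b = {"layer.weight": torch.randn(4, 4)}
print(list(merge_by_similarity(w_a, w_b)))
```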

https://x.com/fly51fly/status/2013004252159885389

#simmerge #modelmerging #mlresearch #cohere

fly51fly (@fly51fly) on X

[LG] SimMerge: Learning to Select Merge Operators from Similarity Signals O Bolton, Aakanksha, A Ahmadian, S Hooker... [Cohere & Google] (2026) https://t.co/lbmNeeHiwO


The WaveHelix project is experimenting with building a "world model" without gradient descent. The system predicts motion (e.g., a bouncing ball) by running candidate scenarios, scoring them, and blending them into the main model. It uses "rungs" (memory cells), "spirals" (routing), and "curl" for exploration. A novel approach to ML!
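
The post's "rungs"/"spirals"/"curl" machinery isn't reproduced here, but the propose-score-blend loop it describes resembles the cross-entropy method; a minimal gradient-free sketch along those lines:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(g, steps=20, dt=0.1):
    """Trajectory of a falling ball under candidate gravity g."""
    y, v, out = 10.0, 0.0, []
    for _ in range(steps):
        v -= g * dt
        y += v * dt
        out.append(y)
    return np.array(out)

observed = simulate(9.8)            # the 'world' we want to model
mean, spread = 5.0, 3.0             # initial belief about g
for _ in range(30):
    candidates = rng.normal(mean, spread, size=64)         # run candidate scenarios
    scores = [-np.mean((simulate(g) - observed) ** 2) for g in candidates]
    elite = candidates[np.argsort(scores)[-8:]]            # score, keep the best
    mean, spread = elite.mean(), elite.std() + 1e-3        # blend into the model
print(f"recovered gravity: {mean:.2f}")  # converges to ~9.8
```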

#WaveHelix #MachineLearning #AI #NoGradientDescent #WorldModel #SideProject #MLResearch

https://www.reddit.com/r/LocalLLaMA/comments/

Our new series explains how a language model is built from the ground up. Part 1 covers the tokenizer and reveals how vocabulary size, merge rules, and byte mapping influence every downstream component.
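
As a taste of the merge-rule mechanics the chapter covers (our sketch, in Python rather than the series' Rust; real tokenizers also track word frequencies and byte fallback):

```python
from collections import Counter

def bpe_merge_step(words):
    """Apply one BPE merge: fuse the most frequent adjacent symbol pair."""
    pairs = Counter()
    for symbols in words:
        pairs.update(zip(symbols, symbols[1:]))
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1])  # new vocab entry
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged, best

words = [list("lower"), list("lowest"), list("low")]
for _ in range(3):
    words, rule = bpe_merge_step(words)
    print(rule, words)  # each round adds one merge rule to the vocabulary
```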

Start with Part 1 and stay with the series as each chapter is released. https://www.tag1.com/white-paper/part1-tokenization-building-an-llm-from-scratch-in-rust/

#OpenSource #FOSS #MachineLearning #MLResearch #DeepLearning #LLM

Part 1: Tokenization, Building an LLM From Scratch in Rust

Learn how to build a language model from scratch in Rust, starting with part 1 of 6: tokenization, BPE, and vocabulary trade-offs.


Part 1 of our six-part series on building a language model is now published. We begin with tokenization and show how text is converted into numerical sequences that the model can process.

Read Part 1 and follow the full series as we move from the tokenizer to tensors and training. https://www.tag1.com/white-paper/part1-tokenization-building-an-llm-from-scratch-in-rust/

#FOSS #MLResearch #MachineLearning #DeepLearning #NLP #LanguageModels
