I understand why #arXiv is floating off on its own, but I give it 7 years before #enshittification, when the PDF goes behind a paywall and the "experimental" HTML becomes unusable.

🎓 ArXiv, the pioneering preprint server, declares independence from Cornell

「 The move will help arXiv raise more money from a broader range of donors to fund the staffing and technology needed to support the site’s skyrocketing number of preprints—expected to top 300,000 this year—says Greg Morrisett, dean and vice provost of Cornell Tech, the graduate-education and research arm of the university that manages arXiv 」

https://www.science.org/content/article/arxiv-pioneering-preprint-server-declares-independence-cornell

#arxiv #research #science #academia

fly51fly (@fly51fly)

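Researchers at IBM Research have released PRISM, a study of retention and interaction in mid-training. It reports new findings on AI model training dynamics and performance retention, and is worth a look for anyone optimizing large language model training.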

https://x.com/fly51fly/status/2034744303046774885

#airesearch #llm #training #ibmresearch #arxiv

[LG] PRISM: Demystifying Retention and Interaction in Mid-Training B Runwal, A Agrawal, A Roy, R Panda [IBM Research] (2026) https://t.co/Uyjv6wQRJR


Flash-KMeans: Fast and Memory-Efficient Exact K-Means

https://arxiv.org/abs/2603.09229

#arxiv

$k$-means has historically been positioned primarily as an offline processing primitive, typically used for dataset organization or embedding preprocessing rather than as a first-class component in online systems. In this work, we revisit this classical algorithm under the lens of modern AI system design and enable $k$-means as an online primitive. We point out that existing GPU implementations of $k$-means remain fundamentally bottlenecked by low-level system constraints rather than theoretical algorithmic complexity. Specifically, the assignment stage suffers from a severe IO bottleneck due to the massive explicit materialization of the $N \times K$ distance matrix in High Bandwidth Memory (HBM). Simultaneously, the centroid update stage is heavily penalized by hardware-level atomic write contention caused by irregular, scatter-style token aggregations. To bridge this performance gap, we propose flash-kmeans, an IO-aware and contention-free $k$-means implementation for modern GPU workloads. Flash-kmeans introduces two core kernel-level innovations: (1) FlashAssign, which fuses distance computation with an online argmin to completely bypass intermediate memory materialization; (2) sort-inverse update, which explicitly constructs an inverse mapping to transform high-contention atomic scatters into high-bandwidth, segment-level localized reductions. Furthermore, we integrate algorithm-system co-designs, including chunked-stream overlap and cache-aware compile heuristics, to ensure practical deployability. Extensive evaluations on NVIDIA H200 GPUs demonstrate that flash-kmeans achieves up to 17.9$\times$ end-to-end speedup over best baselines, while outperforming industry-standard libraries like cuML and FAISS by 33$\times$ and over 200$\times$, respectively.
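
For anyone who wants the two ideas in the abstract spelled out, here is a rough CPU sketch in plain Python. It is not the authors' GPU kernels. The function names (`assign_online_argmin`, `update_sorted_segments`) and the `tile` parameter are made up for illustration. The first function scans centroids in tiles and keeps only a running minimum per point, so the full $N \times K$ distance matrix is never stored. The second sorts point indices by label and reduces each contiguous segment, standing in for the scatter-style accumulation the abstract describes.

```python
import math


def sq_dist(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign_online_argmin(points, centroids, tile=2):
    """Nearest-centroid index per point, scanning centroids tile by tile.

    Only a running (best distance, best index) pair is kept per point,
    so no N x K distance matrix is ever materialized.
    `tile` is a hypothetical knob standing in for a GPU tile size.
    """
    best_dist = [math.inf] * len(points)
    best_idx = [-1] * len(points)
    for start in range(0, len(centroids), tile):
        block = centroids[start:start + tile]
        for i, p in enumerate(points):
            for j, c in enumerate(block, start):
                d = sq_dist(p, c)
                if d < best_dist[i]:  # online argmin update
                    best_dist[i], best_idx[i] = d, j
    return best_idx


def update_sorted_segments(points, labels, k):
    """Centroid update by sorting indices by label, then reducing segments.

    Each cluster's points end up contiguous in `order`, so every mean is a
    local reduction over one segment. Empty clusters are returned as None.
    """
    order = sorted(range(len(points)), key=labels.__getitem__)
    means = [None] * k
    pos = 0
    while pos < len(order):
        label = labels[order[pos]]
        end = pos
        while end < len(order) and labels[order[end]] == label:
            end += 1
        segment = [points[i] for i in order[pos:end]]
        means[label] = [sum(col) / len(segment) for col in zip(*segment)]
        pos = end
    return means


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    cents = [[9.0, 9.0], [1.0, 1.0], [5.0, 5.0]]
    labels = assign_online_argmin(pts, cents)
    print(labels)
    print(update_sorted_segments(pts, labels, k=len(cents)))
```

The tile size should not change the assignment, only how much is held at once. On a GPU the win would come from keeping each tile in fast memory, which this sketch does not model.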

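🌖 Pioneering preprint platform ArXiv to leave Cornell University and become an independent nonprofit
➤ Facing the AI wave and budget pressure, the gatekeeper of shared scientific knowledge moves toward independence
https://www.science.org/content/article/arxiv-pioneering-preprint-server-declares-independence-cornell
ArXiv, the most influential preprint platform in science, was founded in 1991 and has long been hosted by Cornell University. With preprint submissions surging (expected to pass 300,000 this year) and "AI-generated junk content" adding moderation pressure, ArXiv has decided to become formally independent on July 1 this year and turn itself into a nonprofit organization. The move is meant to broaden its fundraising base, reduce its dependence on a single institution, and use a more flexible financial structure to keep this piece of global academic infrastructure running sustainably over the long term. Academics still have doubts about the future operating model and funding sources, but the move is seen as an inevitable transition for a platform facing the challenges of the digital era.
+ The day has finally come. Cornell University sheltered ArXiv for a long time, but the platform long ago outgrew what a single university can manage, and running independently is better for long-term planning.
+ Worried that indep…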
#學術出版 #科學新聞 #ArXiv
Oh look, #ArXiv is playing grown-up and "declaring independence" from Cornell—because who wouldn't trust a site that can’t even load without #JavaScript and #cookies enabled? 😂 Good luck with that rebellious spirit while you're struggling to remember your password. 🔐🎉
https://www.science.org/content/article/arxiv-pioneering-preprint-server-declares-independence-cornell #Independence #Struggles #TechHumor #HackerNews #ngated