Omar Sanseviero (@osanseviero)

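LlamaIndex Org's ParseBench benchmark has been published as a Kaggle leaderboard, and an accompanying arXiv paper was shared along with it. It appears to be a new open-source evaluation resource for measuring document parsing and extraction performance, making it a useful benchmark update for AI developers.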

https://x.com/osanseviero/status/2048777804293067023

#llamaindex #kaggle #benchmark #arxiv #opensource

Kaggle Leaderboard: https://t.co/tpCVJ2wvrx Paper: https://t.co/SrbISQ3A5L Blog: https://t.co/LHhndtTg9j

X (formerly Twitter)

fly51fly (@fly51fly)

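FASTER, a method that uses value-guided sampling to speed up training in reinforcement learning, has been introduced. It is an improvement aimed at fast RL and looks practical in terms of both sample efficiency and training acceleration.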

https://x.com/fly51fly/status/2048516642603483635

#reinforcementlearning #sampling #fastrl #machinelearning #arxiv

[LG] FASTER: Value-Guided Sampling for Fast RL P Dong, A Swerdlow, D Sadigh, C Finn [Stanford University] (2026) https://t.co/km8po6K0bo

fly51fly (@fly51fly)

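This work proposes latent-based conformational control for steering the conformations of protein structures in OpenFold3. It improves the controllability of structure prediction and generation models, which matters for protein modeling and for AI applications in the life sciences.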

https://x.com/fly51fly/status/2048518363065069648

#openfold3 #proteinfolding #structuralbiology #machinelearning #arxiv

[LG] ConforNets: Latents-Based Conformational Control in OpenFold3 M Lee, C Kalicki, M Jeon, A Qabel… [Columbia University & Princeton University] (2026) https://t.co/VnsEPAyTIx

fly51fly (@fly51fly)

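The "Convergent Evolution" study, which reports that different language models end up forming similar number representations even though they learn them in different ways, has been introduced. It is an important result for understanding the common mechanisms behind language models' internal representations, with implications for interpretability and generalization research.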

https://x.com/fly51fly/status/2048520006900895932

#llm #languagemodels #representationlearning #arxiv #machinelearning

[CL] Convergent Evolution: How Different Language Models Learn Similar Number Representations D Fu, T Zhou, M Belkin, V Sharan… [University of Southern California & UC San Diego] (2026) https://t.co/zoXVMn8cYL

Today on the #arXiv:

@elizabethtasker et al. 2026, "The science from asteroid sample return missions" - https://arxiv.org/abs/2604.22182

Reviewing what has been learned from bringing samples from Itokawa, Ryugu, and Bennu back to Earth.

The science from asteroid sample return missions

To date, three samples from near-Earth asteroids have been delivered to Earth by Japan's Hayabusa (2010) and Hayabusa2 (2020) missions, and the United States OSIRIS-REx mission (2023). Free from terrestrial contamination, these pristine materials provide new opportunities to investigate planetary formation processes, the delivery of organics and water to the early Earth, and the nature of potentially hazardous asteroids. As analysis of the asteroid samples proceeds in laboratories around the world, we visit each of the missions, review the initial scientific findings, and explore the value of sample return in understanding our origins and protecting our future.

arXiv.org

Update. _Nature_ is also covering this news, drawing from the #Zenodo preprint of the #arXiv preprint.
https://www.nature.com/articles/d41586-026-01340-y
How much for a fake authorship? Ad database reveals secrets of scientific fraud

An analysis of thousands of paper-mill adverts could help journals to crack down on misconduct.

Update. Here's a #Zenodo preprint of the #arXiv preprint.
https://zenodo.org/records/19684278

"A preprint describing this dataset has been submitted to arXiv. This entry will be updated as soon as arXiv's moderation process is complete."

BuyTheBy - An annotated dataset of paper mill advertisements with price data

A preprint describing this dataset has been submitted to arXiv. This entry will be updated as soon as arXiv's moderation process is complete. The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 have prices listed. Among these there are 20,598 positions listed as for sale on 5,567 unique products in 14 different product categories with 51,812 timestamped price data points. Code for reproducing figures and summary statistics is available at https://github.com/reeserich/buytheby.

Zenodo

"Thousands of shady ads sell paper authorship for cash, large-scale investigation finds."
https://www.science.org/content/article/thousands-shady-ads-sell-paper-authorship-cash-large-scale-investigation-finds

PS: This article from _Science_ reports important news about paper mills. But apart from that, note that it draws from an #arXiv preprint that hasn't been released yet. It's a preprint preprint, and from _Science_. A nice example of the ongoing #ScholComm transformation.

#Misconduct #PaperMills

There Will Be a Scientific Theory of Deep Learning

https://arxiv.org/abs/2604.21691

#arxiv

In this paper, we make the case that a scientific theory of deep learning is emerging. By this we mean a theory which characterizes important properties and statistics of the training process, hidden representations, final weights, and performance of neural networks. We pull together major strands of ongoing research in deep learning theory and identify five growing bodies of work that point toward such a theory: (a) solvable idealized settings that provide intuition for learning dynamics in realistic systems; (b) tractable limits that reveal insights into fundamental learning phenomena; (c) simple mathematical laws that capture important macroscopic observables; (d) theories of hyperparameters that disentangle them from the rest of the training process, leaving simpler systems behind; and (e) universal behaviors shared across systems and settings which clarify which phenomena call for explanation. Taken together, these bodies of work share certain broad traits: they are concerned with the dynamics of the training process; they primarily seek to describe coarse aggregate statistics; and they emphasize falsifiable quantitative predictions. We argue that the emerging theory is best thought of as a mechanics of the learning process, and suggest the name learning mechanics. We discuss the relationship between this mechanics perspective and other approaches for building a theory of deep learning, including the statistical and information-theoretic perspectives. In particular, we anticipate a symbiotic relationship between learning mechanics and mechanistic interpretability. We also review and address common arguments that fundamental theory will not be possible or is not important. We conclude with a portrait of important open directions in learning mechanics and advice for beginners. We host further introductory materials, perspectives, and open questions at learningmechanics.pub.

🔍🤔 Oh no! A brave soul has confirmed the alarming #decline of #arXiv papers on Hacker News! 🎓📰 Quick, someone #alert the #academia police—our #intellectual #sanctuary is imploding! 🙄🚨
https://dylancastillo.co/til/llm-research-on-hacker-news-is-dying.html #news #HackerNews #ngated
LLM research on Hacker News is drying up – Dylan Castillo