Evaluation of open-source AI frameworks demonstrates that these systems are limited to conceptual planning and text generation, without the capacity to carry out full research cycles.
https://www.biorxiv.org/content/10.64898/2026.01.05.697809v1
"We initially assumed the AI Scientist could autonomously conduct research based solely on a prompt. However, it requires a user-defined 'template', which significantly limits the autonomy of the AI Scientist."
A major step toward Artificial General Intelligence (AGI) and superintelligence is AI's ability to autonomously conduct research, which we term Artificial Research Intelligence (ARI). If machines could generate hypotheses, run experiments, and write research papers without human intervention, it would transform science. Sakana recently introduced the 'AI Scientist', claiming it conducts research autonomously, i.e. that it achieves ARI. The AI Scientist has attracted much attention, but a thorough independent evaluation had yet to be conducted. Our evaluation reveals critical shortcomings. The system's literature reviews produced poor novelty assessments, often misclassifying established concepts (e.g., micro-batching for stochastic gradient descent) as novel. It also struggled with experiment execution: 42% of experiments failed due to coding errors, while others produced flawed or misleading results. Code modifications were minimal, averaging 8% more characters per iteration, suggesting limited adaptability. Generated manuscripts were poorly substantiated, with a median of five citations, most of them outdated (only five of 34 from 2020 or later). Structural errors were frequent, including missing figures, repeated sections, and placeholder text such as 'Conclusions Here'. Some papers contained hallucinated numerical results. Despite these flaws, the AI Scientist represents a leap forward in research automation: it generates full research manuscripts with minimal human input, challenging expectations of AI-driven science, and many reviewers might struggle to distinguish its output from that of human researchers. While the quality resembles a rushed undergraduate paper, the speed and cost efficiency are unprecedented: a full paper for USD 6 to 15 with about 3.5 hours of human involvement, far outpacing traditional researchers.
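The "8% more characters per iteration" figure above is a relative growth in character count between successive code versions. A minimal sketch of how such a metric can be computed (the helper name and toy versions are our own illustration, not from the paper):

```python
def avg_char_growth(versions):
    """Average relative character-count growth between successive code versions.

    versions: list of source-code strings, in iteration order.
    Returns e.g. 0.08 for an average growth of 8% per iteration.
    """
    growths = [
        (len(new) - len(old)) / len(old)
        for old, new in zip(versions, versions[1:])
    ]
    return sum(growths) / len(growths)

# Toy example: three successive "iterations" of a script.
versions = [
    "x = 1\n",
    "x = 1\ny = 2\n",
    "x = 1\ny = 2\nz = x + y\n",
]
print(round(avg_char_growth(versions), 2))  # → 0.92
```

Character count is a crude proxy; a diff-based measure (e.g. edit distance between versions) would distinguish genuine restructuring from mere additions.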
AI Scientists in 2025: how SR-Scientist, DeepEvolve, and Kosmos differ and what they are for. And why Kosmos took off
In a single year, AI Scientists have grown from demos into working tools: some extract laws from data, others evolve code against benchmarks, and still others tie literature and analysis together into verifiable reports. We break down three characteristic approaches, SR-Scientist, DeepEvolve, and Kosmos: what they are for and how they differ. And why there is so much buzz around Kosmos in particular.
https://habr.com/ru/articles/964254/
#нейросети #AI_scientist #искусственный_интеллект #ИИученые #agentic_ai #автономные_агенты #Kosmos #Edison_Scientific #world_models #symbolic_regression
In 2025, a new class of tools is taking shape before our eyes: AI Scientists (AI-Scientist). Where AI algorithms could previously only generate ideas or rework already-known solutions, ...
In virtue epistemology, which defines knowledge top-down from epistemic virtues, there is an open question of whether the "knowledge" of non-human entities such as artificial intelligence can be called knowledge at all. If, for example, one adopts Sosa's AAA-structure definition of knowledge, one must grant such non-human entities both epistemic competence and the capacity to hold beliefs, but it is not clear whether non-human entities can possess these. In addition, because the AAA structure builds the value of knowledge into its very definition, it risks being unable to account for knowledge in AI-driven science such as machine-learning physics. If a belief counts as knowledge only when the agent producing it possesses virtue, then the sentences (beliefs?) output by deep-learning models, which can hardly be said to possess virtue, would fail to count as knowledge.
#epistemology #philosophy_of_AI #AI_scientist #open_question