"You go to sleep, the agent does the science and when you wake up, you have the results."

That's not how any of this works. You're not "doing research with help"; you have something generated that looks like a research report, without the research actually happening. How many different ways can "AI" bros find to express "I AM TOTALLY MISSING THE POINT"?

@tante

The latest models are largely written by other models.

The goal of #OpenAI is not to develop #AGI...
...but to develop an AI AI-researcher that can then develop AGI.

Most frontier models score about 40% on Humanity's Last Exam; they sat at about 3% when it launched. They can defo do zero-shot knowledge tasks.

TL;DR: AI can do research.

@n_dimension Models are not "written", they are "trained". That is a very different thing. And sure, frontier models are good at standard tests. It's basically open-book testing.

@tante

Written not trained.
#vibecoding
https://www.hyperdimensional.co/p/on-recursive-self-improvement-part

Humanity's Last Exam is not open book; it's been expressly designed to exclude datasets available to AI and centres on specific domain knowledge. Check it out.

@tante @n_dimension ARC-AGI-2 tests are specifically designed so that no amount of memorization can help a model finish the task. It uses a pass@2 metric, so exhaustive search can't be used as a strategy.

I suggest you try them; humans should be able to solve all of the tests easily, since they measure fluid intelligence. Gemini 3.1 Pro solves 77.1% of them.
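For anyone unfamiliar with the pass@2 metric mentioned above: a task counts as solved only if one of at most two submitted attempts is exactly correct, which is why brute-force guessing doesn't score. A minimal sketch of that scoring rule (the function name and example data are illustrative, not from any benchmark's actual code):

```python
def pass_at_2(tasks):
    """Score a list of (attempts, target) pairs under pass@2:
    a task is solved if any of at most 2 attempts equals the target."""
    solved = sum(
        1
        for attempts, target in tasks
        if any(a == target for a in attempts[:2])  # only first 2 attempts count
    )
    return solved / len(tasks)

# Hypothetical run: 2 of 3 tasks solved within two attempts.
score = pass_at_2([
    (["A", "B"], "B"),   # solved on second attempt
    (["C"], "C"),        # solved on first attempt
    (["X", "Y"], "Z"),   # not solved
])
print(score)  # → 0.6666666666666666
```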