"You go to sleep, the agent does the science and when you wake up, you have the results."

That's not how any of this works. You're not "doing research with help"; you're having something generated that looks like a research report, without the research happening. How many different ways can "AI" bros find to express "I AM TOTALLY MISSING THE POINT"?

@tante All the sub-troglodyte intellects in academia showing themselves with the help of AI.
@tante who wrote that? it’s silly.
@tante did the correlation machine find some causation?
@tante 25 years ago I saw the number 12765 inside the cap of a soda bottle. Part of a game. Seems like as good an answer as any.

@tante

The latest models are largely written by other models.

The goal of #OpenAI is not to develop #AGI...
...but to develop an AI AI-researcher that can then develop AGI.

Most frontier models are at about 40% on Humanity's Last Exam; they sat at about 3% when it launched. They can defo do zero-shot knowledge.

TL;DR: AI can do research.

@n_dimension Models are not "written", they are "trained". That is a very different thing. And sure, frontier models are good at standard tests. It's basically open-book testing.

@tante

Written not trained.
#vibecoding
https://www.hyperdimensional.co/p/on-recursive-self-improvement-part

Humanity's Last Exam is not open book; it's been expressly designed to exclude data already available to AI models and centres on specific domain knowledge. Check it out.

On Recursive Self-Improvement (Part I)

Thoughts on the automation of AI research

Hyperdimensional

@tante @n_dimension ARC-AGI-2 tests are specifically designed so that no amount of memorization can help a model finish the task, and they use a pass@2 metric, so exhaustive search can't be used as a strategy.

I suggest you try them; humans should be able to easily solve all of the tests, since they're a measure of fluid intelligence, and Gemini 3.1 Pro solves 77.1% of them.
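For context on the metric: as I understand it, pass@2 means a model gets at most two attempts per task and the task counts as solved only if one attempt exactly matches the target output. A rough sketch of that scoring rule; the grids and the solved_at_2 helper are made up for illustration:

```python
# Sketch of a pass@2 scoring rule: a task counts as solved only if one of
# (at most) two submitted output grids exactly matches the target grid.
def solved_at_2(attempts, target):
    return any(attempt == target for attempt in attempts[:2])

# Hypothetical ARC-style grids (small lists of integer rows).
target = [[0, 1], [1, 0]]
attempts = [
    [[0, 0], [1, 0]],  # first attempt: wrong
    [[0, 1], [1, 0]],  # second attempt: exact match
]

print(solved_at_2(attempts, target))  # True

# The benchmark-level pass@2 score is just the fraction of tasks solved this way.
def pass_at_2(results):
    return sum(solved_at_2(a, t) for a, t in results) / len(results)
```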

@tante why don't they just have an agent come up with a way of missing the point? Such a missed opportunity.
@korenchkin they probably are

@tante but if AI says it has done it, then it must have done it, right?

https://infosec.exchange/@paco/115151810444789847

Paco Hope (@[email protected])

Attached: 1 image One of the ways that LLM-authored code improves productivity is by merely SAYING it does things. It's way faster than the whole time-consuming process of actually doing things. This is real code someone sent to me for review.

Infosec Exchange
@tante as there is no upper bound to stupidity, there's no limit to the number of ways those morons will come up with…

@tante a lot of science is tweaking hyperparameters and seeing if it changed anything.

We, as humans, have sunk thousands and tens of thousands of hours into algorithms that help us with the tasks I described above, to avoid having to do exhaustive grid search. Hyperband is one such algorithm; it's literally "you go to sleep, and the algorithm does science for you". I don't see the difference here.
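To make that concrete: Hyperband wraps several rounds ("brackets") of successive halving, which spends a small training budget on many sampled configurations, keeps the best fraction, and gives the survivors more budget. A rough sketch of just that successive-halving core; the evaluate() objective here is made up and stands in for "train with this config for this budget and return a validation score":

```python
import random

# Hypothetical objective: pretend to train a model with (learning rate, width)
# for `budget` units of compute and return a validation score (higher is better).
def evaluate(config, budget):
    lr, width = config
    noise = random.gauss(0, 0.1) / budget  # noisier estimates at small budgets
    return -abs(lr - 0.01) * 100 - abs(width - 64) / 64 + noise

def successive_halving(n_configs=27, min_budget=1, eta=3):
    # Sample random hyperparameter configurations.
    configs = [(10 ** random.uniform(-4, -1), random.choice([16, 32, 64, 128]))
               for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Evaluate every surviving configuration at the current budget...
        ranked = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        # ...keep the top 1/eta and give them eta times more budget.
        configs = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return configs[0]

print(successive_halving())  # best surviving (learning rate, width)
```

Kick it off in the evening and the schedule decides on its own where the compute goes; that's the sense in which the "sleep through it" framing long predates LLM agents.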

> AlphaEvolve discovered novel, provably correct algorithms that surpass state-of-the-art solutions on a spectrum of problems in mathematics and computer science, significantly expanding the scope of prior automated discovery methods (Romera-Paredes et al., 2023). Notably, AlphaEvolve developed a search algorithm that found a procedure to multiply two 4×4 complex-valued matrices using 48 scalar multiplications; offering the first improvement, after 56 years, over Strassen's algorithm in this setting. We believe AlphaEvolve and coding agents like it can have a significant impact in improving solutions of problems across many areas of science and computation.
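For context on what's being counted there: Strassen's classic construction multiplies two 2×2 matrices with 7 scalar multiplications instead of the naive 8, and applying it recursively to blocks gives sub-cubic matrix multiplication. A minimal sketch of the 2×2 case:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen, 1969)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The result quoted above concerns the 4×4 complex-valued case, where two levels of this recursion give 49 multiplications and the discovered procedure needs 48.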

https://sakana.ai/shinka-evolve/

https://arxiv.org/abs/2506.13131

Sakana AI

ShinkaEvolve: Evolving New Algorithms with LLMs, Orders of Magnitude More Efficiently

@tante It seems to me one could still benefit from this workflow as long as one treats the LLM like a prosecutor treats a flipped mobster. Sure, a mobster will feel no compunctions about lying or making shit up. That’s why you investigate every little thing he says and make sure you can establish its truth or falsity independent of his words.
@tante Tech managers were doing this before AI and will keep on doing it after. The motto is "sell an easy solution to a hard problem and let the next person deal with the consequences".
@tante As many ways as their jobs depend upon.
@tante There's a response for that, too, because the presumption is that our brains are so smooth their lies resolve cognitive dissonance.
@tante yes, generally speaking that is not how research works. But: much of what researchers do can benefit from automation. Researchers write models and code, design experiments and then test them, redesign and so on. When I think about how much time of my economics research I have spent writing code and searching for errors, aggregating data, finding patterns in data and so on, it could have saved me a lot of time. Even without "having the results" the next morning.
@tante and: the same is true for the humanities. Pattern recognition is actually part of a lot of different research areas. So if it's not about having a finished report, but about having the next step in your research production process, there is some truth in that sentence.
@[email protected] If I had the time or inclination I'd flesh out and formalize this argument: an "automated scientist" would eventually become indistinguishable from a random number generator, because no person would be able to understand what it's outputting after it runs long enough.



Anthony (@[email protected])

I'm tinkering with an argument based on algorithmic complexity that if it were possible to make something like an "automated mathematician" or "automated scientist", then these would be expected to eventually produce outputs that we humans would be unable to distinguish from random noise. Getting the whole argument just right is fiddly, but the basic idea is this. You feed some kind of theory into the AM/AS, which is a black box. It churns on this and spits out a result, which is added to the theory (I'm neglecting

buc.ci