"You go to sleep, the agent does the science and when you wake up, you have the results."

That's not how any of this works. You're not "doing research with help"; you're having something generated that looks like a research report, without the research happening. How many different ways can "AI" bros find to express "I AM TOTALLY MISSING THE POINT"?

@tante All the sub-troglodyte intellects in academia showing themselves with the help of AI.
@tante who wrote that? it’s silly.
@tante did the correlation machine find some causation?
@tante 25 years ago I saw the number 12765 inside the cap of a soda bottle. Part of a game. Seems like as good an answer as any.

@tante

The latest models are largely written by other models.

The goal of #OpenAI is not to develop #AGI...
...but to develop an AI AI-researcher that can then develop AGI.

Most frontier models are at about 40% on Humanity's Last Exam; they sat at about 3% when it launched. They can defo do zero-shot knowledge.

TL;DR: AI can do research.

@n_dimension Models are not "written", they are "trained". That is a very different thing. And sure, frontier models are good at standard tests. It's basically open-book testing.

@tante

Written not trained.
#vibecoding
https://www.hyperdimensional.co/p/on-recursive-self-improvement-part

Humanity's Last Exam is not open book; it's been expressly designed to exclude data already available to AI models and centres on specific domain knowledge. Check it out.

On Recursive Self-Improvement (Part I)

Thoughts on the automation of AI research

Hyperdimensional

@tante @n_dimension ARC-AGI-2 tests are specifically designed so that no amount of memorization can help a model finish the task, and they use a pass@2 metric, so exhaustive search can't be used as a strategy.

I suggest you try them; humans should be able to easily solve all of the tests, since they're a measure of fluid intelligence, and Gemini 3.1 Pro solves 77.1% of them.
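For context on the metric: as I understand it, pass@2 means a model gets at most two attempts per task and the task counts as solved only if one attempt exactly matches the target output. A rough sketch of that scoring rule; the grids and the solved_at_2 helper are made up for illustration:

```python
# Sketch of a pass@2 scoring rule: a task counts as solved only if one of
# (at most) two submitted output grids exactly matches the target grid.
def solved_at_2(attempts, target):
    return any(attempt == target for attempt in attempts[:2])

# Hypothetical ARC-style grids (small lists of integer rows).
target = [[0, 1], [1, 0]]
attempts = [
    [[0, 0], [1, 0]],  # first attempt: wrong
    [[0, 1], [1, 0]],  # second attempt: exact match
]

print(solved_at_2(attempts, target))  # True

# The benchmark-level pass@2 score is just the fraction of tasks solved this way.
def pass_at_2(results):
    return sum(solved_at_2(a, t) for a, t in results) / len(results)
```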

@tante why don't they just have an agent come up with a way of missing the point? Such a missed opportunity.
@korenchkin they probably are

@tante but if AI says it has done it, then it must have done it, right?

https://infosec.exchange/@paco/115151810444789847

Paco Hope (@[email protected])

Attached: 1 image One of the ways that LLM-authored code improves productivity is by merely SAYING it does things. It's way faster than the whole time-consuming process of actually doing things. This is real code someone sent to me for review.

Infosec Exchange
@tante as there is no upper bound to stupidity, there's no limit to the number of ways those morons will come up with…

@tante a lot of science is tweaking hyperparameters and seeing if it changed anything.

We, as humans, have sunk thousands and tens of thousands of hours into algorithms that help us with the tasks I described above, to avoid having to do exhaustive grid search. Hyperband is one such algorithm; it's literally "you go to sleep, and the algorithm does science for you". I don't see the difference here.
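To make that concrete: Hyperband wraps several rounds ("brackets") of successive halving, which spends a small training budget on many sampled configurations, keeps the best fraction, and gives the survivors more budget. A rough sketch of just that successive-halving core; the evaluate() objective here is made up and stands in for "train with this config for this budget and return a validation score":

```python
import random

# Hypothetical objective: pretend to train a model with (learning rate, width)
# for `budget` units of compute and return a validation score (higher is better).
def evaluate(config, budget):
    lr, width = config
    noise = random.gauss(0, 0.1) / budget  # noisier estimates at small budgets
    return -abs(lr - 0.01) * 100 - abs(width - 64) / 64 + noise

def successive_halving(n_configs=27, min_budget=1, eta=3):
    # Sample random hyperparameter configurations.
    configs = [(10 ** random.uniform(-4, -1), random.choice([16, 32, 64, 128]))
               for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Evaluate every surviving configuration at the current budget...
        ranked = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        # ...keep the top 1/eta and give them eta times more budget.
        configs = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return configs[0]

print(successive_halving())  # best surviving (learning rate, width)
```

Kick it off in the evening and the schedule decides on its own where the compute goes; that's the sense in which the "sleep through it" framing long predates LLM agents.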

> AlphaEvolve discovered novel, provably correct algorithms that surpass state-of-the-art solutions on a spectrum of problems in mathematics and computer science, significantly expanding the scope of prior automated discovery methods (Romera-Paredes et al., 2023). Notably, AlphaEvolve developed a search algorithm that found a procedure to multiply two 4×4 complex-valued matrices using 48 scalar multiplications; offering the first improvement, after 56 years, over Strassen's algorithm in this setting. We believe AlphaEvolve and coding agents like it can have a significant impact in improving solutions of problems across many areas of science and computation.
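For context on what's being counted there: Strassen's classic construction multiplies two 2×2 matrices with 7 scalar multiplications instead of the naive 8, and applying it recursively to blocks gives sub-cubic matrix multiplication. A minimal sketch of the 2×2 case:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar multiplications (Strassen, 1969)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The result quoted above concerns the 4×4 complex-valued case, where two levels of this recursion give 49 multiplications and the discovered procedure needs 48.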

https://sakana.ai/shinka-evolve/

https://arxiv.org/abs/2506.13131

Sakana AI

ShinkaEvolve: Evolving New Algorithms with LLMs, Orders of Magnitude More Efficiently

@tante It seems to me one could still benefit from this workflow as long as one treats the LLM like a prosecutor treats a flipped mobster. Sure, a mobster will feel no compunctions about lying or making shit up. That’s why you investigate every little thing he says and make sure you can establish its truth or falsity independent of his words.
@tante Tech managers were doing this before AI and will keep on doing it after. The motto is "sell an easy solution to a hard problem and let the next person deal with the consequences".
@tante As many ways as their jobs depend upon.
@tante There's a response for that, too, because the presumption is that our brains are so smooth their lies resolve cognitive dissonance.
@tante yes, generally speaking that is not how research works. But: much of what researchers do can benefit from automation. Researchers write models and code, design experiments and then test them, redesign and so on. When I think about how much time of my economics research I have spent writing code and searching for errors, aggregating data, finding patterns in data and so on, it could have saved me a lot of time. Even without "having the results" the next morning.
@tante and: the same is true for the humanities. Pattern recognition is actually part of a lot of different research areas. So if it's not about having a finished report, but about having the next step in your research production process, there is some truth in that sentence.
@[email protected] If I had the time or inclination I'd flesh out and formalize this argument: an "automated scientist" would eventually become indistinguishable from a random number generator, because no person would be able to understand what it's outputting after it runs long enough.



Anthony (@[email protected])

I'm tinkering with an argument based on algorithmic complexity that if it were possible to make something like an "automated mathematician" or "automated scientist", then these would be expected to eventually produce outputs that we humans would be unable to distinguish from random noise. Getting the whole argument just right is fiddly, but the basic idea is this. You feed some kind of theory into the AM/AS, which is a black box. It churns on this and spits out a result, which is added to the theory (I'm neglecting

buc.ci