Taylorism is a management philosophy based on using scientific optimization to maximize labor productivity and economic efficiency.

Here's the result of making the false Taylorist assumption that the output of scientific research is scientific papers—the more, faster, and cheaper, the better.

Papers are not the output of scientific research in the way that cars are the output of automobile manufacturing.

Papers are merely a vehicle through which a portion of the output of research is shared.

We confuse the two at our peril.

The entire idea of outsourcing the scientific ecosystem to LLMs — as described below — is a category error that I can scarcely begin to get my head around.

sakana.ai/ai-scientist/

"While there are still occasional flaws in the papers produced by this first version..."

Meanwhile the authors note that the output itself fails to meet standards of scientific rigor, but treat this as a minor wrinkle rather than a fundamental barrier imposed by using the wrong tool for the job.

This system literally fabricates its methods section — an act which goes beyond bad science into the realm of serious scientific misconduct. This is more than a wrinkle to be ironed out.

Scientists: We need to slow down the publication race and produce higher quality papers at a slower rate to make the literature manageable again.

Engineers: We hear you. Now every lab in the world will be able to produce hundreds of medium-quality papers (with a few mistakes in each) every week.

I do appreciate the authors' candor in detailing failure modes.

A system that makes difficult-to-catch mistakes in implementation, fails to compare quantitative data appropriately, and fabricates entire results—maybe I have high standards but I don't see this as writing "medium-quality" papers.

Here's the weird Taylorism again. The system produces work at the level of an early trainee requiring substantive supervision. This is not good ROI for producing papers.

The primary output of time invested in trainee research is the development of independent scientists—not the research papers.

In the end, how one judges this paper probably comes down to how one assesses the claim that is always used to justify this kind of work.

The authors "believe" that future versions will be
greatly improved.

Given what I know of fundamental limits to what LLMs can do, I see no reason to agree.

When I fail to do something, I either don't publish or very occasionally I publish describing that failure. When I do so, I don't pretend it was a success and promise that it'll magically get better.

@ct_bergstrom
Is the AI scientist being categorised, by the language used, as a responsible entity?

Is there an explicit statement that a human has been given authority over, and responsibility for, the actions of the AI scientist?

#AIWhereIsResponsibility

EDIT: I answered my own question.
Neither "responsibility" nor "responsible" are found in the paper.

smh

arxiv.org/abs/2408.06292

@ct_bergstrom

I appreciate your full analysis, but honestly, if I were reading the paper, I would have stopped once I realized the name was always set in small caps with some funky spacing. It reeks of marketing.

@ct_bergstrom
> 'The authors "believe" that future versions will be greatly improved.'

I'm wondering if there's a name for this effect - where a machine does something resembling a human activity, people ascribe further human qualities to it, and then assume it will advance in ability the way a human would.

See also: self-driving cars. Shuffling mindlessly around a car park? Great! It'll soon be driving perfectly.

Or not.

@coprolite9000 @ct_bergstrom
You mean The Eliza Effect?

@bornach @coprolite9000 @ct_bergstrom

> The Eliza Effect

How does that make you feel?

@coprolite9000 @ct_bergstrom I believe the closest we have is a mixture of automation bias and cargo cult science.
@coprolite9000 @ct_bergstrom I agree skepticism is in order about the pace and extent of progress. But this expectation seems reasonable to me in the abstract if we are iterating and can always use the best model to date.
For example, I can’t imagine how computers could ever get worse at playing chess than they are now, and as long as we iterate new approaches, there’s a possibility of improvement.
@Spring @coprolite9000 @ct_bergstrom
FWIW it's quite common for performance of machine learning systems to degrade rather than improve. Ensuring that performance DOES improve typically takes a lot of human work.

@FeralRobots @Spring @coprolite9000 @ct_bergstrom
And then there is the problem of an AI that has defeated top-level human players somehow still being beatable using a strategy that only an intermediate-level player would consider deploying.

https://youtu.be/l7tWoPk25yU

https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/


@coprolite9000 @ct_bergstrom
In #Robotics, it's (reasonably) well-known that people will ascribe human cognitive abilities and intent to absolutely anything.

Roboticists are particularly bad about this (see linked paper below).

I think the optimism that it will get better is not just universal but required for publication and grant approval.

Link: https://www.sciencedirect.com/science/article/abs/pii/S0921889013001863

Perception of own and robot engagement in human–robot interactions and their dependence on robotics knowledge


@ct_bergstrom @coprolite9000 You’re thinking about “first step fallacies.” Hubert Dreyfus wrote about it some time ago https://link.springer.com/article/10.1007/s11023-012-9276-0
A History of First Step Fallacies - Minds and Machines

In the 1960s, without realizing it, AI researchers were hard at work finding the features, rules, and representations needed for turning rationalist philosophy into a research program, and by so doing AI researchers condemned their enterprise to failure. About the same time, a logician, Yehoshua Bar-Hillel, pointed out that AI optimism was based on what he called the “first step fallacy”. First step thinking has the idea of a successful last step built in. Limited early success, however, is not a valid basis for predicting the ultimate success of one’s project. Climbing a hill should not give one any assurance that if he keeps going he will reach the sky. Perhaps one may have overlooked some serious problem lying ahead. There is, in fact, no reason to think that we are making progress towards AI or, indeed, that AI is even possible, in which case claiming incremental progress towards it would make no sense. In current excited waiting for the singularity, religion and technology converge. Hard headed materialists desperately yearn for a world where our bodies no longer have to grow old and die. They will be transformed into information, like Google digitizes old books, and we will achieve the promise of eternal life. As an existential philosopher, however, I suggest that we may have to overcome the desperate desire to digitalize our bodies so as to achieve immortality, and, instead, face up to and maybe even enjoy our embodied finitude.


@ct_bergstrom

This proposal reveals a blatant lack of understanding of how LLMs work.

https://chaosfem.tw/@theogrin/112957246097616591

Jennifer Kayla | Theogrin 🦊 (@theogrin@chaosfem.tw)

Just a reminder that LLMs have never provided actual answers to any question asked of them or any actual prompt set forth. They have, however, provided *answer-shaped responses*, and we as humans are seriously lacking when it comes to telling one from the other. Folks, a banana-shaped piece of wood is not edible, and I am embarrassed for our species that you cannot tell the difference.

@feliz @ct_bergstrom or, they understand LLMs just fine but see no problem with replacing science with plausible-sounding bullshit.
@ct_bergstrom
They seem to want a future where AI produces reams of garbage, while scientists are only there to load paper into the printers.
@ct_bergstrom thanks for this thread. From an outsider's perspective (not an expert on LLMs and, as an architect, far from scientific writing) this looks like a bland and lazy scam.
I am bewildered by those trying to sell approximation machines to do specific specialized tasks, forgetting or waving off that all communication, from scientific papers to any form of art, is only really relevant because of the humans who made it, their circumstances, experience, collaboration, etc.

@ct_bergstrom and the judges will be the already-strained academics doing unpaid peer review in an already-strained system. The big commercial publishers will be able to afford the detection tools and learn to use them effectively, maybe. The little journals will be swamped.

Science fiction short story publisher Clarkesworld almost went under from the burden of fraudulent LLM-output submissions. https://neil-clarke.com/a-concerning-trend/
Is academic publishing ready?

A Concerning Trend – Neil Clarke

@econoprof @ct_bergstrom

I know this is a rhetorical question

But the answer is "no", in case anyone reading wasn't sure

@ct_bergstrom ...besides all your valid points, what remains is that the world will be flooded with mediocre papers/content in various disciplines, good enough to pass a superficial check but flawed enough to do harm. AI algorithms will play a role in writing papers, reports, etc., but how to do that in a sensible and responsible way remains an open question for me. Another demo in this field is https://storm.genie.stanford.edu/
Spurious Scholar

Spurious research papers based on real correlations with p < 0.05, generated by a large language model.

@mvaudel @ct_bergstrom
Vigen's reminding me of Alfred Kroeber's infamous paper mapping the correlation between stock market performance and dress hemlines - which, while absolutely satire, has nevertheless been taken with deadly seriousness in the century or so since.
What a dead salmon reminds us about fMRI analysis | Stanford Law School

@ct_bergstrom More research (funding) needed.
@ct_bergstrom Turning papers into (green) paper.
@ct_bergstrom You spend your life trying to do the best science you can, and along comes this... abomination. Makes me sick.
@ct_bergstrom And what's worse, the only way in which it will get "better" is better at concealing its academic misconduct.
@dalias @ct_bergstrom
This looks a lot like human behavior to me ...

@vincib @ct_bergstrom Maybe but the difference is that humans face consequences.

(Billionaires are not humans, btw)

@ct_bergstrom While I personally agree with you that

"The primary output of time invested in trainee research is the development of independent scientists"

I think that you'll probably find that this is a minority view, both amongst those doing the training and those funding it, at least in some fields.

I tend to find that even the most progressive funders and supervisors believe that "PhD students are the backbone of the research workforce".

@ct_bergstrom Thanks a lot for making explicit many aspects of what is - to say the least - worrying about this work.

A similar aspect to the point you make about "developing independent scientists", and related to many points that @emilymbender frequently makes, stems from the perspective of human competence: doing research is how individual researchers *themselves* learn about the domain, about doing research, etc. There is no way of letting someone else do A and then being able to do A yourself.

@ct_bergstrom @emilymbender

As a society, then, I'd argue that we want _people_ who can do research. 1) Because of their singularly human way of experiencing the world (as opposed to, e.g., how bats or AIs experience the world), and 2) Because humans differ from one another, we'd like diversity among human researchers, not just a single human researcher who can still do it.

@ct_bergstrom Maybe I'm reading too much into this, but this makes me worried about these folks' opinions of others' work. Like do they think the quality of papers in general is so bad that what sounds rather objectively bad ranks as "medium-quality"?

I grant I'm not an academic or in the habit of reading scholarly papers, and I recognize the quality of papers in general could be poor.

@jadonn @ct_bergstrom Yes. Because that's the quality of shit they got by with in college because they were legacy admissions and members of the right frats. They like have utterly no idea that there's such a thing as legitimate work and that most people aim to do that rather than maximizing bullshit.

@ct_bergstrom I came into this thread at this toot and thought #TheAIScientist was the average person working in #AI

In the authors' defense, research in AI is not scientifically rigorous and makes a great many unsubstantiated claims (such as using heat maps to infer what parts of an image a computer vision model is responding to)

Perhaps The AI Scientist has simply learned from the people training it?

@ct_bergstrom sorry, this is phrased unfairly as there are many people working in "AI" whom I respect greatly. However, I'm not going to rephrase my original toot, because it was my initial thought, and I'm still exhausted by all the #AIGuff that is produced, daily.
@ct_bergstrom I would agree that this is nowhere near "medium-quality". It doesn't even reach the lowest rung on the quality ladder imo.
@ct_bergstrom That last bit about using it to generate promising ideas is quite sad. I thought that was what talking to other scientists was for!

@ct_bergstrom There shouldn't really be "medium-quality papers" anyway: If your research is worthwhile, methodologically sound and rigorous, it's world class science, period.

If it fails to meet any of those standards, it's rubbish. In the world of h-indices and publish-or-perish, we are - intentionally or not - reducing "research" to cargo-culting.

I deeply despise every part of it.

@ftranschel @ct_bergstrom indeed! Publication quantity is a flawed metric of the merit of research, and the potential for an LLM to game this system is a further condemnation of the system, not a triumph for AI.
@ct_bergstrom well, you see, soon these papers will be of average quality by virtue of sheer volume.
@ct_bergstrom one can only imagine what a poor-quality paper would be in their view