So this article is making the rounds and mostly focuses on the human aspect of using LLMs for research.

But there is a more immediate problem. I'm currently in a minor role in an imaging research lab, evaluating another lab's output, and I don't know how else to say this: the software artifacts Just. Don't. Work.

You've got citations to papers whose claims are practically impossible to evaluate independently.

https://ergosphere.blog/posts/the-machines-are-fine/

The machines are fine. I'm worried about us.

On AI agents, grunt work, and the part of science that isn't replaceable.

@deech As in other areas, LLMs in research mainly amplify pre-existing problems. Unjustified trust in non-verifiable machine-generated information is one of them.

I know of two articles that do talk about this:
- https://doi.org/10.12688/f1000research.5930.2
- https://doi.org/10.1093/comjnl/bxad067

Plus my own thoughts on this topic:
https://metaror.org/article/establishing-trust-in-automated-reasoning-2/

F1000Research article: David A. W. Soergel, "Rampant software errors may undermine scientific results."

@khinsen What if the thing can't even be built, or segfaults on the slightest deviation from the test data?
@deech That's less of a problem than code that produces wrong results!
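To make that distinction concrete, here is a tiny hypothetical Python sketch (the function names and data are invented for illustration, not taken from any lab's code): a crashing bug announces itself the first time the code runs, while a silently wrong computation returns plausible-looking numbers and sails through.

```python
import numpy as np

def normalize_crashing(image):
    # Loud failure: an out-of-bounds index raises IndexError on any input,
    # so the bug is caught immediately.
    return image / image[len(image)]

def normalize_silent(image):
    # Quiet failure: integer floor division truncates instead of scaling,
    # so the result looks plausible but is wrong.
    return image // image.max()   # should be image / image.max()

pixels = np.array([10, 20, 40])
print(normalize_silent(pixels))   # [0 0 1] instead of [0.25 0.5 1.0], no error raised
```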