So this article is making the rounds, and it mostly focuses on the human aspect of using LLMs for research.

But there is a more immediate problem. I'm currently in a minor role in an imaging research lab, evaluating another lab's output. I don't know how else to say this, but the software artifacts Just. Don't. Work.

You've got citations based on papers where it is practically impossible to independently evaluate any of the claims.

https://ergosphere.blog/posts/the-machines-are-fine/

The machines are fine. I'm worried about us.

On AI agents, grunt work, and the part of science that isn't replaceable.

@deech if you can't run the code in the paper, it should be an immediate rejection.

Build a Docker image and link it as a reference ffs!

@StompyRobot Unfortunately that's less helpful than one might at first think.

Sure, you can run something in that Docker image. But is it the code described in the paper? Unless the authors have (and publish) an automated pipeline to build the container image from human-readable source code, you'd better not bet on it.

@deech

@khinsen @deech

Docker images can be reverted to their constituent parts, and if running the code doesn't reproduce the paper, then that's like publishing a paper with a table of data that don't support the conclusion -- immediate grounds for rejection.
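
A minimal sketch of what "running the code reproduces the paper" could mean in practice, assuming both the published table and the rerun output are available as CSV text. The function names `tables_match` and `_cells_match` and the tolerance value are hypothetical, chosen only for illustration; a real check would need a tolerance that fits the paper's numerics.

```python
import csv
import io
import math


def _cells_match(a, b, rel_tol):
    """Compare two cells numerically if both parse as floats, else as text."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a.strip() == b.strip()


def tables_match(published_csv, reproduced_csv, rel_tol=1e-6):
    """Return True if the two CSV tables have the same shape and matching cells.

    rel_tol is an assumed tolerance, not a value taken from any paper.
    """
    published = list(csv.reader(io.StringIO(published_csv)))
    reproduced = list(csv.reader(io.StringIO(reproduced_csv)))
    if len(published) != len(reproduced):
        return False
    for row_a, row_b in zip(published, reproduced):
        if len(row_a) != len(row_b):
            return False
        if not all(_cells_match(a, b, rel_tol) for a, b in zip(row_a, row_b)):
            return False
    return True
```

A reviewer could run the container, capture its output table, and pass it to `tables_match` alongside the table transcribed from the paper. A `False` result would be the "data don't support the conclusion" case described above.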

@StompyRobot The constituent parts of a Docker image are individual files, but not source code. If the files in a Docker image are not comprehensible, then all they do is provide material evidence for the fact that there exists a computer program that produces the tables in the paper. Which is not a surprise to anyone.

@deech

@khinsen @deech
Most science papers I see are Python, where source and executable are the same, but for other cases, yes, the Docker image needs to be the build system, not just the executable!

The point is: industry has good tools to make reproducible artifacts. We should use them where reproducibility is important, like science papers!

@StompyRobot Exactly!

One problem is that good support tools exist, but reproducibility is rarely the default mode of operation. Example: Docker tutorials tend to show Dockerfiles starting with something like "download an Ubuntu image and update all packages". That's not reproducible. Docker images *can* be made reproducible but hardly anyone explains how to do it.

@deech
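
The tutorial pattern described above ("download an Ubuntu image and update all packages") can be flagged mechanically. Below is a rough sketch, assuming two simple heuristics: a `FROM` line whose image is not pinned with an `@sha256:` digest, and an `apt`/`apt-get` upgrade step. The function name `find_reproducibility_issues` and both regular expressions are hypothetical. Passing this check does not make a build reproducible; unpinned `apt-get install` lines, for instance, are not detected, and a `FROM` that refers to an earlier build stage would be flagged falsely.

```python
import re

# Assumed heuristics for illustration only; not an exhaustive audit.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
FLOATING_UPGRADE_RE = re.compile(
    r"\b(apt-get|apt)\s+(-\S+\s+)*(upgrade|dist-upgrade)\b"
)


def find_reproducibility_issues(dockerfile_text):
    """Return a list of (line_number, message) for suspicious Dockerfile lines."""
    issues = []
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0].upper() == "FROM":
            # Skip flags such as --platform=... to find the image reference.
            args = [p for p in parts[1:] if not p.startswith("--")]
            image = args[0] if args else ""
            if image.lower() != "scratch" and not DIGEST_RE.search(image):
                issues.append(
                    (lineno, f"base image '{image}' is not pinned by digest")
                )
        elif FLOATING_UPGRADE_RE.search(line):
            issues.append(
                (lineno, "package upgrade installs whatever is current at build time")
            )
    return issues
```

Feeding it a two-line Dockerfile such as `FROM ubuntu:latest` followed by `RUN apt-get update && apt-get -y upgrade` reports both lines, while a `FROM` pinned by digest with no upgrade step reports nothing.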