From a colleague in my dept. - preprint on arXiv

Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning
https://arxiv.org/html/2605.07251v1

#Chemistry #Science #LLM #AI

@SRDas

Um …

What's up with the phenylacetylene structure? Did they use a generative tool to make the figure for their paper about the limits of generative tools???

@stevegis_ssg yeah - I saw that. I'll ask. That "robot" in the top corner looks AI-generated too

@SRDas Leaving aside the figure issues, there is a reason both CAS numbers and SMILES strings (and I guess InChI keys) exist: to avoid their first problem. Even using the IUPAC name would do it. That "bromo quinoxaline" name is way too ambiguous, and anyone actually placing an order knows it and wouldn't write it that way.

(Were they setting it up to fail? I'm obviously not a fan of LLMs, but it's better to see them fail for real than to skew the test and have their defenders point out the test was skewed.)

@SRLevine I don't know, but I suspect they may have just taken that name from some paper. The PI, Oles Isayev, and his group are pretty sophisticated in computational chem (and SMILES) and are looking into automation etc., and one of the known issues there is how bad the literature is... Looking into ordering is possibly one of their forays into the start of the automation process. Interestingly, I was alerted to the preprint while searching for a vendor for "indolizinone" from a paper.

@SRDas Ah, nice! I guess I was worried this was one of those out-of-field papers where people have no idea what the field actually does/uses and want to "revolutionize" things in ways that don't make sense.

(Early on after I started at UCINT, we had a collaborator [a biologist] come to us with a project they had worked on with an "AI drug discovery" company, and their proposed starting structures included things like isobutanol. It was clear the company had some idea about how "AI" could revolutionize drug discovery and hadn't actually hired any comp chemistry folks who had done it before. We had to very gently break it to the biology lab that the AI data set was trash and that someone handing you 26,000 potential hits was not a good thing.)

@SRLevine yeah, that's the thing, and it's why I like Oles and his group doing this - they have the chem background (which is also why he's wildly successful) - a lot (most?) of lab automation and AI is being done by robotics and CS people with little practical knowledge of the way science, and chemistry in particular, is done (which has its own baggage and cans of worms)