LLMs predict my coffee

Why not benchmark with physical experiments?

DYNOMIGHT