Mastodawn

Marcel Salathé Dec 14, 2024

Confidently wrong: No model so far has been able to answer this correctly. Not o1 pro, not Gemini Advanced, not Claude Opus. The "better" the model, the more confident it was in its wrong answer.

At least Mistral and Claude Sonnet were able to say they didn't know.

This is a real issue. Most of us expect the better models to be more "aware" of possible mistakes. But that does not yet seem to be the case.

Michael Szell Dec 14, 2024

@marcelsalathe What was the question? I only see a page from a Chopin nocturne in the image.

Marcel Salathé

@mszll What the piece was

Michael Szell Dec 14, 2024

@marcelsalathe ah, interesting 😆
I guess with some basic pattern recognition and a DB in the background it should be easy to find out algorithmically. With some "intelligence", even if you had never seen this piece but knew how Chopin set up the bass line in his nocturnes (some kind of Alberti bass variation with that -8 starting note), it could be straightforward to guess from just one bar.
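
The "pattern recognition plus a DB" idea could look roughly like the sketch below. It is only an illustration under assumptions not stated in the thread: the notes have already been extracted from the image as MIDI pitch numbers, the database is a plain dictionary, and matching is done on runs of successive intervals so that a transposed excerpt still matches. The names `intervals`, `build_index`, `identify` and `TOY_DB` are hypothetical, and the pitch lists are made-up toy data, not transcriptions of any real piece.

```python
from collections import defaultdict


def intervals(pitches):
    """Successive semitone differences; unchanged if the excerpt is transposed."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def build_index(db, n=4):
    """Map every run of n intervals to the set of pieces that contain it."""
    index = defaultdict(set)
    for title, pitches in db.items():
        iv = intervals(pitches)
        for i in range(len(iv) - n + 1):
            index[iv[i:i + n]].add(title)
    return index


def identify(query, index, n=4):
    """Rank pieces by how many interval n-grams they share with the query."""
    votes = defaultdict(int)
    iv = intervals(query)
    for i in range(len(iv) - n + 1):
        for title in index.get(iv[i:i + n], ()):
            votes[title] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])


# Hypothetical toy data: arbitrary MIDI pitches, NOT real transcriptions.
TOY_DB = {
    "piece_a": [48, 55, 60, 64, 67, 64, 60, 55],
    "piece_b": [50, 53, 57, 62, 65, 62, 57, 53],
}
INDEX = build_index(TOY_DB)

# Query: the first six notes of piece_a, transposed up two semitones.
QUERY = [p + 2 for p in TOY_DB["piece_a"][:6]]
RESULT = identify(QUERY, INDEX)
print(RESULT)
```

An excerpt that shares no interval run with any indexed piece returns an empty list, which is the algorithmic equivalent of saying "I don't know" rather than guessing.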