Really cool test. I got some weird results with Qwen3.5 35B A3B at Q4_0 and Q4_M (disappointing though: it won't pass, but I think a Q8 might). I think most LLMs are really designed for web dev (i.e., not programming). When it comes to reasoning, even LLMs from major companies struggle with some of the questions, and the ones that don't struggle rely on tools like Python calculators (I guess writing a one-liner isn't cheating, assuming the LLM can execute it and get the result). #LLM
https://www.kaggle.com/blog/standardized-agent-exams

