In the end, Qwen3.5 35B A3B Q4_M in thinking mode scored 87.5% in 27 minutes on a mock SAE exam using the llama.cpp WebUI: a PASS. (The input was just the same list of questions. The model verified its own answers, and then I checked them myself.)
Funnily enough, Sonnet 4.6 (Extended, i.e. Thinking) falls into the same pitfalls, on the same questions, as Qwen3.5 35B A3B Q4_M in non-thinking mode 🤯
#Alibaba #Qwen #anthropic #sae #LLM #kaggle #AIsafety
©️ Nicolas Mouart, 2018-2026







