AI systems sometimes present fiction as fact, a phenomenon known as AI hallucinations. Using such outputs can spread false information, damage reputations, and create other problems ...

https://doi.org/10.13140/RG.2.2.33179.53285

#AIBenchmarks #AIHallucinations #AIResearch #AISafety #AI

Alibaba's new Qwen3.5-9B beats OpenAI's gpt-oss-120B in laptop-level benchmarks, delivering similar quality with a fraction of the compute. This shows efficient open-source LLMs can bring powerful AI to everyday hardware. Want the full numbers and insights? Dive in. #Qwen3_5_9B #OpenAI #AIbenchmarks #LaptopInference

🔗 https://aidailypost.com/news/alibabas-qwen35-9b-outperforms-openais-gpt-oss-120b-laptop-benchmarks

Chetaslua (@chetaslua)

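A bold claim that MiniMax M2.5 outperforms every other model at about 1/10 of the price. The post highlights MiniMax's cost-effectiveness and performance given its roughly 10B active parameters, and mentions a comparison with Opus 4.5 and 4.6.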

https://x.com/chetaslua/status/2027804004017967480

#minimax #languagemodel #modelcomparison #aibenchmarks

Chetaslua (@chetaslua) on X

MiniMax M2.5 is better than every model at 1/10th of the price. Wtf @MiniMax_AI, you guys cooked for real. When's the new model? (MiniMax is better for its price and 10B activated parameters.) One thing to note: Opus 4.5 topped this, and Opus 4.6 is worse than this.

X (formerly Twitter)

OpenAI retired SWE-bench Verified. 59.4% of tasks were flawed. Top models were recalling answers from memory, not solving problems. AdwaitX breaks down what this benchmark crisis means for AI in 2026. #AdwaitX #AIBenchmarks #OpenAI

https://www.adwaitx.com/openai-swe-bench-verified-retired-ai-benchmarks/

OpenAI Drops SWE-bench Verified: What It Means for AI

Discover on AdwaitX why OpenAI retired SWE-bench Verified. Expert analysis reveals what this benchmark shift means for AI coding agents in 2026.

AdwaitX