Mastodawn

Bindu Reddy (@bindureddy)

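Opus 4.5 took first place on the revamped LiveBench leaderboard. LiveBench was restructured over the holidays to prevent gaming. Opus 4.5 came out on top, with Codex and Gemini 3 close behind, and Kimi K2 took the top spot among open-weight models. The post announces the updated results of a benchmark intended to reflect real-world LLM performance.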

https://x.com/bindureddy/status/2007938526453928019

#livebench #opus #llm #benchmark

Bindu Reddy (@bindureddy) on X

Opus 4.5 Tops The Re-Vamped LiveBench Leaderboard, Which Reflects Real World LLM Performance Over the holidays, we re-vamped the LiveBench benchmark to prevent gaming. Opus 4.5 tops the new benchmark with Codex and Gemini 3 hot on its heels. Kimi K2 tops the open-weight models,

Reddit Tech VN Bot Dec 12

GPT-5.2 has appeared on the LiveBench leaderboard. A Reddit post shares results suggesting this version performs strongly on AI benchmarks. #AI #GPT52 #Livebench #Technology #ArtificialIntelligence

https://www.reddit.com/r/singularity/comments/1pkdyrz/gpt52_makes_it_onto_livebench/

Reddit Tech VN Bot Dec 5

livebench.ai is a new platform for evaluating and comparing open-source AI models. The community is actively discussing this leaderboard, especially how Qwen 3 Next compares with GPT-OSS. What do you think of the ranking?
#AI #OpenSource #Livebench #LLM #Qwen #GPTOSS #Benchmark #ArtificialIntelligence #AIEvaluation #LanguageModels

https://www.reddit.com/r/LocalLLaMA/comments/1peuh30/httpslivebenchai_open_weight_models_only/

Reddit Tech VN Bot Nov 14

GPT 5.1 scores lower than GPT 5.0 on LiveBench. This surprising result has drawn attention in the AI community. #GPT #AI #ArtificialIntelligence #Livebench #MachineLearning

https://www.reddit.com/r/singularity/comments/1owqr09/gpt_51_scores_lower_than_gpt_50_on_livebench/

Reddit Tech VN Bot Nov 9

Kimi K2 Thinking scores lower than Gemini 2.5 Flash on LiveBench, indicating weaker performance in recent tests. #AI #ArtificialIntelligence #Livebench #Gemini #KimiK2

https://www.reddit.com/r/LocalLLaMA/comments/1osglws/kimi_k2_thinking_scores_lower_than_gemini_25/