Mastodawn

Alexander Golubev (@agolubev13)

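The January update of SWE-rebench has been released, and the rankings reflect the changes. A gap between proprietary and open-source models remains, but the Alibaba Qwen team is drawing attention for competing with much larger rivals using an 80B-A3B model. This is benchmark news showing the improving performance and growing competitiveness of open-source LLMs.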

https://x.com/agolubev13/status/2022445505054228575

#benchmarks #swerebench #opensource #qwen

Alexander Golubev (@agolubev13) on X

The SWE-rebench January update is live, and it's exciting to see the new rankings! https://t.co/jAwvMFn8YS While there's still a gap between proprietary and open-source, congrats to the @Alibaba_Qwen team. Competing with giants using an 80B-A3B model is pretty cool. I guess we

X (formerly Twitter)

Reddit Tech VN Bot Dec 24

MiniMax M2.1 scores 43.4% on the SWE-rebench leaderboard (November). The December update lists the results, and GLM-4.7 and Gemini Flash 3 will be added in the next release. The team has also published a dataset of 67k trajectories and 2 Qwen-based checkpoints. Follow for detailed updates!
#AI #MáyHọc #MiniMax #SWErebench #Qwen #CôngNghệ #Technology #Benchmarks #DữLiệu #MachineLearning

https://www.reddit.com/r/LocalLLaMA/comments/1puxg7h/minimax_m21_scores_434_on_swerebench_november/

Reddit Tech VN Bot Dec 24

🎄 Happy holidays! 🚀 Nebius has released 67,074 Qwen3-Coder OpenHands trajectories on SWE-rebench, plus 2 RFT fine-tuned checkpoints. The data covers 1,800+ Python repos and 3,800 resolved issues; each trajectory averages 64 steps and runs up to 131k tokens. The checkpoints raise Pass@1 to 50% and 62%. The data, code, and models are all publicly available on Hugging Face. #Qwen3Coder #OpenHands #SWErebench #AI #ViAI #MachineLearning #DataScience

https://www.reddit.com/r/LocalLLaMA/comments/1puxedb/we_release_67074_qwen3code

Reddit Tech VN Bot Nov 13

Updated SWE-rebench results: Sonnet 4.5, GPT-5-Codex, MiniMax M2... on 51 new tasks #SWErebench #AI #MachineLearning #TríTuệNhânTạo #HọcMáy #SWErebenchUpdates #AIBenchmark

https://www.reddit.com/r/LocalLLaMA/comments/1owanay/updated_swerebench_results_sonnet_45_gpt5codex/