LLM Stats (@LlmStats)

Step-3.5-Flash (a model from StepFun) topped LiveCodeBench V6 with a score of 0.864, ranking #1 ahead of Kimi K2.5 (0.85), GLM-4.7 (0.849), and GPT OSS 120B (0.819). LiveCodeBench V6 is a benchmark that evaluates models on fresh problems drawn from real competitive programming platforms.

https://x.com/LlmStats/status/2022377816302510189

#livecodebench #codeeval #llm #modelbenchmark

IQuestLab has released IQuest-Coder-V1, a 40-billion-parameter large language model (LLM) specialized for coding. IQuest-Coder-V1 achieved top results on benchmarks such as SWE-Bench Verified (81.4%), BigCodeBench (49.9%), and LiveCodeBench v6 (81.1%).

#IQuestLab #IQuestCoderV1 #LLM #CodingWithAI #SWE #BigCodeBench #LiveCodeBench

https://www.reddit.com/r/LocalLLaMA/comments/1q0vom4/iquestlabiquestcoderv1_40b_parameter_coding_llm/

Are you someone who works with code? Do you want to separate the hype from reality in #LLM coding assistants? Apple created a new coding benchmark, #livecodebench, with help from human Olympiad medalists; its continuously updated problem set prevents contamination. There is a website where you can track the latest stats: https://livecodebenchpro.com/. The research methodology paper is on arXiv: https://arxiv.org/pdf/2506.11928. Top findings: 🧵