How to run evals for the model router | Microsoft Foundry Blog
Walk through running quality, cost, and latency evaluations for the Foundry model router using an open-source GitHub repo designed for router-aware eval pipelines.
Microsoft Foundry Blog
Your meeting evaluations may not be reliable because attendees are biased toward easy feedback. Instead, focus on the intangible aspects of the event experience.
https://www.conferencesthatwork.com/index.php/event-design/2013/06/meeting-evaluations-reliable
#evaluations #ROI #events #eventprofs
Aman Sanger (@amanrsanger)
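He says that after comparing Kimi k2.5 against several base models using perplexity-based evals, they rated it the strongest model. He adds that they then scaled up continued pre-training and high-compute RL by 4x to boost performance, which makes this notable for both evaluating recent models and training strategy.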
https://x.com/amanrsanger/status/2035079293257359663
#kimi #llm #reinforcementlearning #pretraining #evaluations

Aman Sanger (@amanrsanger) on X
We've evaluated a lot of base models on perplexity-based evals and Kimi k2.5 proved to be the strongest!
After that, we do continued pre-training and high-compute RL (a 4x scale-up).
The combination of the strong base, CPT and RL, and Fireworks' inference and RL samplers make
X (formerly Twitter)
Foundry Agent Service is GA: private networking, Voice Live, and enterprise-grade evaluations | Microsoft Foundry Blog
Explore the Microsoft Foundry Agent Service for seamless production AI deployment with enhanced networking and compliance features.
Microsoft Foundry Blog