Mastodawn

via #AIFoundry: How to run evals for the model router

https://ift.tt/XAF1Ivt
#ModelRouter #Foundry #Evals #Evaluations #LLM #AIModelRouting #PromptEngineering #ModelSelection #Latency #Cost #Quality #Benchmarking #OpenSource #GitHub #EvalRepo #Azure #AzureOpenAI #Claude #Fou…

How to run evals for the model router | Microsoft Foundry Blog

Walk through running quality, cost, and latency evaluations for the Foundry model router using an open-source GitHub repo designed for router-aware eval pipelines.

Adrian Segar May 5

Your meeting evaluations may not be reliable because attendees are biased toward easy feedback. Instead, focus on the intangible aspects of the event experience.

https://www.conferencesthatwork.com/index.php/event-design/2013/06/meeting-evaluations-reliable

#evaluations #ROI #events #eventprofs

Adrian Segar May 2

Can conference organizers get evaluative feedback on the long-term outcomes of their events? Try The Reminder and find out!

https://www.conferencesthatwork.com/index.php/event-design/2015/11/the-reminder

#meetings #EventDesign #evaluations #FacilitatingChange #TheReminder #eventprofs

Adrian Segar Apr 10

How can you get great attendee evaluation response rates? Here are three suggestions that work for me. They'll work for you too!

https://www.conferencesthatwork.com/index.php/event-design/2011/01/evaluation-response-rates

#meetings #evaluations #surveys #ResponseRate #eventprofs #events

Adrian Segar Apr 3

It's time for a "sound of silence" roundup of meetings industry pet peeves. Do they resonate? What am I missing?

https://www.conferencesthatwork.com/index.php/meeting-industry/2023/05/sound-of-silence

#meetings #PetPeeves #AirQuality #evaluations #PayingSpeakers #eventprofs #assnchat

Adrian Segar Apr 1

Why you need to make sure event evaluations don't disappear into the hands of the organizers, never to see the light of day again.

https://www.conferencesthatwork.com/index.php/event-design/2015/04/do-you-review-event-evaluations-like-a-chinese-censor

#meetings #EventDesign #events #evaluations #surveys #CommunityBuilding #improvement #eventprofs

Adrian Segar Mar 22

Short-term traditional meeting evaluations are unreliable. They tell you nothing about the long-term effects of a session. We can do better.

https://www.conferencesthatwork.com/index.php/event-design/2015/11/why-meeting-evaluations-are-unreliable-and-how-we-can-improve-them

#meetings #EventDesign #evaluations #bias #unreliable #HowToImprove #eventprofs

sayzard Mar 21

Aman Sanger (@amanrsanger)

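He said that after comparing many base models using perplexity-based evals, they found Kimi k2.5 to be the strongest. He added that they then applied continued pre-training and high-compute RL (a 4x scale-up) to push performance further, which makes this a notable post on both evaluating recent models and training strategy.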

https://x.com/amanrsanger/status/2035079293257359663

#kimi #llm #reinforcementlearning #pretraining #evaluations

Aman Sanger (@amanrsanger) on X

We've evaluated a lot of base models on perplexity-based evals and Kimi k2.5 proved to be the strongest! After that, we do continued pre-training and high-compute RL (a 4x scale-up). The combination of the strong base, CPT and RL, and Fireworks' inference and RL samplers make

Adrian Segar Mar 20

In meetings, as in education, we need to connect the dots, not collect the dots.

https://www.conferencesthatwork.com/index.php/event-design/2023/08/connect-the-dots

#meetings #EventDesign #evaluations #FacilitatingChange #SethGodin #eventprofs