Mastodawn

via #AIFoundry : A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry

https://ift.tt/D2CHiaf
#MicrosoftFoundry #Foundry #AIModels #ModelManagement #RAG #RetrievalAugmentedGeneration #AIPlatforms #ModelRouter #CostOptimization #Latency #QualityAssuranc…

A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry | Microsoft Foundry Blog

Learn a practical model lifecycle for Microsoft Foundry: select the right model, evaluate quality, optimize cost, operate safely, and improve as production needs change.

Microsoft Foundry Blog

Brandon H

May 19

via #AIFoundry : How to run evals for the model router

https://ift.tt/XAF1Ivt
#ModelRouter #Foundry #Evals #Evaluations #LLM #AIModelRouting #PromptEngineering #ModelSelection #Latency #Cost #Quality #Benchmarking #OpenSource #GitHub #EvalRepo #Azure #AzureOpenAI #Claude #Fou…

How to run evals for the model router | Microsoft Foundry Blog

Walk through running quality, cost, and latency evaluations for the Foundry model router using an open-source GitHub repo designed for router-aware eval pipelines.

Microsoft Foundry Blog

Miguel Afonso Caetano Feb 11

"Today we are releasing Max, Arena's model router powered by our community’s 5+ million real-world votes. Max acts as an intelligent orchestrator—it routes each user prompt to the most capable model for that specific prompt. Through this, Max achieves top performance across all domains.

In today’s rapidly advancing AI landscape, models and providers are evolving to fill different niches— some models are great at coding, others are strong in math; some answer quickly, while others think longer. Max intelligently leverages the varying strength profiles of different models to produce a unified experience that is reliable across the full usage spectrum.

We recently deployed a Max version in Battle mode, codenamed theta-hat, which achieved #1 on the Arena Overall leaderboard with a score of 1500. This base version of Max is also #1 across all major categories, including Coding, Math, and Expert.

We can also make a latency-aware version of Max, providing top-level performance while keeping response latency low. Our latest latency-aware Max, codenamed arcstride, achieved an Arena score of 1495 while also reducing time-to-first-token latency by more than 16 seconds compared to the next-best model.

Going forward, latency-aware Max will be our default experience in our Direct Chat mode. We plan on continually updating and improving Max over time. Max is now available at arena.ai/max."

https://arena.ai/blog/introducing-max/

#AI #GenerativeAI #LLMs #ModelRouter #Max

Introducing Max

Today we are releasing Max, Arena's model router powered by our community’s 5+ million real-world votes. Max acts as an intelligent orchestrator—it routes each user prompt to the most capable model for that specific prompt.

Arena Blog

Brandon H

Dec 19, 2025

via #AIFoundry : What’s new in Microsoft Foundry | October and November 2025

https://ift.tt/SAcQ9of
#MicrosoftFoundry #AI #MultiAgentSystems #EnterpriseAI #CloudComputing #ModelRouter #BYOModel #NoCode #LowCode #AIIntegration #Microsoft365 #FoundryAgentService #Observability …

What’s new in Microsoft Foundry | October and November 2025 | Microsoft Foundry Blog

Transform AI development with Microsoft Foundry, a unified, interoperable AI platform to build, optimize, and govern AI innovation at scale.

Microsoft Foundry Blog

AI Daily Post Dec 16, 2025

OpenAI just rolled back its ChatGPT model router, giving Instant models more time to run while the promised GPT‑5.2 rollout stalls. The move hints at shifting priorities and a possible rethink of how AI services are balanced against Google’s own advances. What does this mean for developers and users? Dive into the details. #OpenAI #ChatGPT #ModelRouter #InstantModels

🔗 https://aidailypost.com/news/openai-rolls-back-chatgpt-model-router-lets-instant-models-take-longer

EveryDev AI Aug 14, 2025

🛰️ Martian @withmartian

OpenAI LLM router with a willingness_to_pay knob—adjust the base_url to select the best model (cost × quality × latency). Includes open-source adapters for easy multi-model use.

https://www.everydev.ai/tools/martian

#LLM #ModelRouter #AIInfra #DevTools #OpenAIAPI

Martian | EveryDev.ai

Martian provides a model router and gateway that forwards each request to the most suitable LLM based on performance, cost, and reliability. It’s a…