NVIDIA just poured $150M into Baseten, accelerating Jensen Huang's shift to an inference-first strategy. The funding will boost GPU-powered AI inference pipelines, making production-grade models easier for enterprises to deploy. Curious how this changes the ML landscape? Read on. #AIInference #NVIDIA #Baseten #ProductionAI

🔗 https://aidailypost.com/news/nvidia-puts-usd-150-m-into-baseten-backing-jensen-huangs

Baseten is challenging the big cloud players with an open-source-friendly AI platform that lets you keep full ownership of your model weights. It covers the stack from training to inference, including speculative decoding, positioning itself as a cheaper, faster alternative to the usual OpenAI-style services. Curious how this could reshape the AI stack? Read on! #Baseten #Hyperscalers #ModelWeights #SpeculativeDecoding

🔗 https://aidailypost.com/news/baseten-takes-hyperscalers-ai-platform-that-lets-users-own-model

🌘 How to run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs
➤ How Baseten used experimentation, debugging, and benchmarking to achieve standout GPT OSS 120B performance on launch day.
https://www.baseten.co/blog/sota-performance-for-gpt-oss-120b-on-nvidia-gpus/
This post details how the Baseten team reached state-of-the-art latency and throughput, over 500 tokens per second, for the GPT OSS 120B model on NVIDIA GPUs. The authors share the full process, from first inference and compatibility bug fixes to tuning the model configuration, highlighting techniques such as TensorRT-LLM, multi-GPU coordination, KV cache routing, and speculative decoding (a toy sketch follows this post), and show how they delivered best-in-class performance to customers on launch day.
+ It's impressive that the Baseten team hit such striking performance in so little time. Their deep work on TensorRT-LLM and multi-GPU coordination in particular sets a benchmark for the industry.
#AIModelPerformance #GPUOptimization #LargeLanguageModels #OpenAI #Baseten
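
To make the speculative-decoding idea above concrete, here is a minimal toy sketch of the technique in plain Python. It is not Baseten's or TensorRT-LLM's implementation: `draft_next` and `target_next` are hypothetical stand-ins for a small draft model and the large target model, and real systems verify all k draft tokens in a single batched forward pass with probabilistic acceptance rather than the greedy matching shown here.

```python
# Toy sketch of speculative decoding with greedy acceptance.
# draft_next and target_next are hypothetical stand-ins for a small,
# fast draft model and the large target model; each maps a token
# sequence to its next token.

def speculative_step(prompt, draft_next, target_next, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    ctx = list(prompt)
    draft = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. The target model verifies: accept the longest prefix where
    #    its own choice agrees with the draft.
    ctx = list(prompt)
    accepted = []
    for tok in draft:
        if target_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)

    # 3. Always emit one target-model token past the accepted prefix,
    #    so every step makes progress even when nothing is accepted.
    accepted.append(target_next(ctx))
    return accepted

# Toy demo with integer "tokens": the draft counts up by one, the
# target does the same but never exceeds 9, so they diverge at 9.
draft_next = lambda seq: seq[-1] + 1
target_next = lambda seq: min(seq[-1] + 1, 9)
print(speculative_step([5], draft_next, target_next))  # [6, 7, 8, 9, 9]
```

The payoff is that one verification pass of the big model can yield several tokens at once, which is where latency wins like those the post describes come from.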

Lately, I cook more often than I deal with money (or all things decimal, for relevance here). Which means that when I see numbers like 20 or 19, my mind jumps to 40 or 41, their complements to fill an hour or a minute, and vice versa. Base 60.

For these numbers less than sixty, I think of them relative to sixty before I think of them relative to one hundred or ten.
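
Since this is really just clock arithmetic, a tiny sketch makes the habit explicit (the function names are mine, purely for illustration):

```python
# Complement-to-60 "clock math" versus the usual complement-to-100 habit.
def clock_complement(n: int) -> int:
    """Minutes (or seconds) still needed to fill an hour (or a minute)."""
    return 60 - n

def decimal_complement(n: int) -> int:
    """The base-ten reflex: the rest of 100."""
    return 100 - n

for n in (20, 19):
    print(f"{n} -> clock says {clock_complement(n)}, base ten says {decimal_complement(n)}")
# 20 -> clock says 40, base ten says 80
# 19 -> clock says 41, base ten says 81
```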

#SelfObservation #BaseTen #BaseSixty #ClockMath