Perplexity (@perplexity_ai)

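Shared benchmark results comparing GB200 and H200. NVLS all-reduce latency and MoE prefill combine time drop sharply, and in decode GB200 sustains higher throughput at high token speeds, confirming better inference performance for large models.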

https://x.com/perplexity_ai/status/2054204425833726353

#gb200 #h200 #benchmark #moe #inference

Perplexity (@perplexity_ai) on X

The benchmarks show the gap. NVLS all-reduce latency drops from 586.1µs on H200 to 313.3µs on GB200. In MoE prefill at EP=4, combine falls from 730.1µs to 438.5µs. For decode, GB200 sustains much higher throughput at high token speeds.

X (formerly Twitter)
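For readers who want the quoted latencies as ratios, here is a minimal sketch that converts the two before/after pairs from the post into a speedup factor and a percent reduction. The four microsecond values come straight from the post; the names `MEASUREMENTS_US`, `speedup`, `reduction_pct`, and `summarize` are made up for illustration and are not from Perplexity's research.

```python
# Latencies in microseconds as quoted in the post: (H200, GB200).
# The dictionary keys are illustrative labels, not official benchmark names.
MEASUREMENTS_US = {
    "NVLS all-reduce": (586.1, 313.3),
    "MoE prefill combine (EP=4)": (730.1, 438.5),
}


def speedup(before_us: float, after_us: float) -> float:
    """How many times faster the 'after' latency is than the 'before' latency."""
    return before_us / after_us


def reduction_pct(before_us: float, after_us: float) -> float:
    """Percent by which latency fell, relative to the 'before' value."""
    return (before_us - after_us) / before_us * 100.0


def summarize(measurements: dict) -> dict:
    """Map each label to a (speedup, percent reduction) pair."""
    return {
        name: (speedup(h200, gb200), reduction_pct(h200, gb200))
        for name, (h200, gb200) in measurements.items()
    }


if __name__ == "__main__":
    for name, (factor, pct) in summarize(MEASUREMENTS_US).items():
        print(f"{name}: {factor:.2f}x faster, {pct:.1f}% lower latency")
```

On these figures, all-reduce comes out at about 1.87x faster (46.5% lower latency) and the EP=4 combine step at about 1.66x faster (39.9% lower). The post gives no numbers for decode throughput, so that claim is not covered here.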

Perplexity (@perplexity_ai)

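Published new research on serving post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 delivers a large performance gain over Hopper for high-throughput inference on large MoE models, underscoring its importance as an inference platform and not only a training one.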

https://x.com/perplexity_ai/status/2054204402144350450

#qwen3 #nvidia #gb200 #blackwell #inference

Perplexity (@perplexity_ai) on X

We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

Price of #Nvidia's #VeraRubin #NVL72 racks skyrockets to as much as $8.8M apiece, but #server makers' margins will be tight; Nvidia is moving closer to shipping entire full-scale systems
A #Blackwell #NVL72 #rackscale system costs $2.8M to $3.4M for the #GB200 NVL72 (#AI training and #HPC) and $6M to $6.5M for the #GB300 NVL72 (AI inference).
Vera Rubin NVL72 #VR200 systems are currently quoted at $5M - $7M per unit.
Nvidia has never confirmed the list prices of its NVL72 or #NVL144 products.
https://www.tomshardware.com/tech-industry/artificial-intelligence/price-of-nvidias-vera-rubin-nvl72-racks-skyrockets-to-as-much-as-usd8-8-million-apiece-but-server-makers-margins-will-be-tight-nvidia-is-moving-closer-to-shipping-entire-full-scale-systems
Price of Nvidia's Vera Rubin NVL72 racks skyrockets to as much as $8.8 million apiece, but server makers' margins will be tight — Nvidia is moving closer to shipping entire full-scale systems

Nvidia and other chipmakers will still make plenty of cash.

Tom's Hardware

Discover how NVIDIA's Blackwell NVL72 powers top AI models like Kimi K2 Thinking 10x faster #MixtureOfExperts #NVIDIA #AI

The top 10 most intelligent open-source models, including Kimi K2 Thinking and DeepSeek-R1, use a mixture-of-experts (MoE) architecture, which mimics the human brain's efficiency. These models run 10x faster on NVIDIA's Blackwell-based GB200 NVL72. The MoE architecture is a key...

#MixtureOfExperts #NVIDIA #GB200 #NVL72

"#Huawei #AI CloudMatrix 384 – #China’s Answer to #Nvidia #GB200 NVL72 China Abundance of Power, 100% Optics, 0% Copper, Power Inefficiency, 2.6x lower FLOP per Watt, 14 Transceivers per Chip, Linear Pluggable Optics

300 PFLOPs of dense BF16 compute, almost double that of the GB200 NVL72. ... 3.6x aggregate memory capacity and 2.1x more memory bandwidth, Huawei and China ... can beat Nvidia’s."

https://semianalysis.com/2025/04/16/huawei-ai-cloudmatrix-384-chinas-answer-to-nvidia-gb200-nvl72/

The technology boycott works...

Huawei AI CloudMatrix 384 – China’s Answer to Nvidia GB200 NVL72

Huawei is making waves with its new AI accelerator and rack scale architecture. Meet China’s newest and most powerful Chinese domestic solution, the CloudMatrix 384 built using the Ascend 910C. Thi…

SemiAnalysis
#NewYork Invests $40M to Launch #EmpireAI Beta with #NVIDIA Blackwell #Supercomputer
#EmpireAIBeta will be 11x more powerful than current capacity, allowing hundreds of researchers from 10 institutions to continue to advance #AI research. Empire AI is now backed by over $500M in public and private funding, including up to $340M in state funds.
Empire AI Beta also is expected to be among the first academic deployments of NVIDIA DGX SuperPOD with DGX #GB200 systems.
https://www.hpcwire.com/off-the-wire/state-of-new-york-invests-40m-to-launch-empire-ai-beta-with-nvidia-blackwell-supercomputing/ #SUNY
#Nvidia's Arm chips rapidly gain share in #server market as #AI booms — Nvidia's Arm-powered #GB200 servers surge as market reaches a record $95 billion in the first quarter
It appears that the vast majority of these Arm-powered machines are Nvidia's #GB200 #NVL72 #rackscale solution, based on the Grace Blackwell platform, which features an Nvidia Grace CPU and eight B200 AI GPUs per server.
https://www.tomshardware.com/desktops/servers/nvidias-arm-chips-rapidly-gain-share-in-server-market-as-ai-booms-nvidias-arm-powered-gb200-servers-surge-as-market-reaches-a-record-usd95-billion-in-the-first-quarter
Nvidia's Arm chips rapidly gain share in server market as AI booms — Nvidia's Arm-powered GB200 servers surge as market reaches a record $95 billion in the first quarter

Server demand is expected to continue for the next four years.

Tom's Hardware

From the other place:

#OpenAI starts using first few #GB200 racks in #Azure

#AI #GPU