UC San Diego’s Hao AI Lab is pushing real‑time LLM interaction with NVIDIA’s DGX B200. Its DistServe system disaggregates the prefill and decoding phases of inference onto separate GPUs, cutting latency for large language model serving. Curious how disaggregated inference reshapes AI? Dive in. #NVIDIADGX #LowLatencyLLM #DistServe #RealTimeAI

🔗 https://aidailypost.com/news/uc-san-diego-lab-uses-nvidia-dgx-b200-pursue-lowlatency-llm-serving