Mastodawn

https://www.youtube.com/watch?v=32cjdHVoSRo&t=1s

Builds a fully open source stack with Fedora after some difficulty #cluster #rocm #rdma #rhel gateway #operator error #userland

Three months wrong about why my 4-node AMD cluster was slow

YouTube

sayzard 2d ago

Alex Cheema (@alexocheema)

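Proposes building a local AI cluster from four M5 Max MacBooks instead of an M3 Ultra Mac Studio. Claims that the same budget buys higher memory bandwidth, more FLOPS, and all-to-all RDMA, and presents the setup as a new alternative for large local AI deployments.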

https://x.com/alexocheema/status/2055018796759277922

#localai #macbook #apple #rdma #aiinfrastructure

Alex Cheema (@alexocheema) on X

If you’re considering dropping 20k on a 512GB M3 Ultra Mac Studio on eBay, consider 4 x M5 Max MacBooks with 6 TB5 cables instead. Same price with 3x memory bandwidth, 9x FLOPS with all-to-all RDMA.

X (formerly Twitter)
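
The "6 TB5 cables" figure in the post matches the number of links in a full mesh of four machines, where every pair is connected directly. A minimal sketch of that count, assuming one cable per pair of machines; the function name and host names are hypothetical and only for illustration:

```python
from itertools import combinations


def full_mesh_links(nodes: int) -> int:
    """Links needed so that every pair of nodes has a direct connection."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


# Hypothetical host names, used only to enumerate the pairs.
hosts = ["mbp-1", "mbp-2", "mbp-3", "mbp-4"]
pairs = list(combinations(hosts, 2))

for a, b in pairs:
    print(f"{a} <-> {b}")

print("cables needed:", full_mesh_links(len(hosts)))
```

The same formula shows how quickly the cable count grows with cluster size: a fifth machine would need ten links under the same one-cable-per-pair assumption.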

sayzard 3d ago

Alex Cheema (@alexocheema)

Shares a setup of four M5 Max MacBooks clustered over RDMA: 512GB of memory, 2456GB/s of bandwidth, $20k, roughly 560W, and quiet. Worth noting as a high-performance local AI/compute cluster alternative that can be bought today.

https://x.com/alexocheema/status/2054598636588122184

#macbook #rdma #cluster #hardware #aiinfrastructure

Alex Cheema (@alexocheema) on X

4 x M5 Max MacBooks clustered with RDMA: 512GB @ 2456GB/s, $20k, 560W, quiet. Find me a better deal, that I can buy today.

X (formerly Twitter)
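
The totals quoted in the post can be broken down per machine with simple division. The sketch below assumes the four machines are identical and that the quoted figures are plain sums across them; the per-node values are derived from the post, not taken from a spec sheet, and the names are hypothetical:

```python
# Totals as quoted in the post; everything else is derived from them.
NODES = 4
TOTALS = {
    "memory_gb": 512,
    "bandwidth_gb_s": 2456,
    "power_w": 560,
    "price_usd": 20_000,
}


def per_node(totals: dict, nodes: int) -> dict:
    """Split each cluster-wide total evenly across identical nodes."""
    return {name: value / nodes for name, value in totals.items()}


shares = per_node(TOTALS, NODES)
usd_per_gb = TOTALS["price_usd"] / TOTALS["memory_gb"]

for name, value in shares.items():
    print(f"{name} per node: {value:g}")
print(f"price per GB of memory: ${usd_per_gb:.2f}")
```

Note that the bandwidth figure is an aggregate across machines. Under this reading, each machine only reaches its own share of memory at local speed, and anything crossing between machines goes over the RDMA links.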

sayzard May 8

AlexK (@AlexKi1993)

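Shares practical tips for running MiniMax M2.7 or Deepseek V4 Flash: automate the cluster setup with Codex or Claude Code, and verify that RDMA is actually working. Stresses that when performance falls below benchmarks, the likely cause is the RDMA connection or the Docker/vLLM/SGLang configuration.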

https://x.com/AlexKi1993/status/2052603594541592906

#minimax #deepseek #rdma #vllm #sglang

AlexK (@AlexKi1993) on X

@SamJWasserman @NVIDIAAIDev @NVIDIAAI @ComfyUI @LTXStudio @Alibaba_Qwen MiniMax M2.7 or Deepseek V4 Flash. Tips for the start: have the cluster setup done by Codex / Claude Code, and make sure RDMA is working and the Docker container with vLLM / SGLang can access it. If performance is below benchmarks, it's most likely a bad configuration of the RDMA connection.

X (formerly Twitter)

ServeTheHome May 6

NVIDIA is talking about Spectrum-X MRC, a custom RDMA transport protocol already powering frontier gigascale AI deployments. #NVIDIA #RDMA #RoCEv2 #Spectrum-X
NVIDIA Spectrum-X Ethernet MRC is the Custom RDMA Transport Protocol for Gigascale AI

ServeTheHome

sayzard Apr 24

EXO Labs (@exolabs)

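EXO v1.0.71 has been released. The key changes in this patch are improved sampling defaults, bug fixes for M5-series Macs and RDMA, and added support for @Kimi_Moonshot's K2.6. A small update that brings practical improvements for developers working on macOS and distributed setups.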

https://x.com/exolabs/status/2047332301218951352

#exo #release #macos #rdma #kimi

EXO Labs (@exolabs) on X

EXO v1.0.71 is out. This is a small patch release, with better defaults for sampling and bug fixes for M5 series Macs and RDMA, along with support for @Kimi_Moonshot K2.6.

X (formerly Twitter)

sayzard Apr 22

Cheng (@zcbenz)

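MLX has released its implementation of RDMA (Remote Direct Memory Access) over Thunderbolt on macOS as a standalone library. The library is the core technology behind Mac clusters for local AI and is described as roughly 10x faster than TCP-based protocols.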

https://x.com/zcbenz/status/2046775757524082957

#mlx #rdma #macos #thunderbolt #localai

Cheng (@zcbenz) on X

MLX's implementation of RDMA (Remote Direct Memory Access) over Thunderbolt on macOS can now be used as an independent library by anyone: https://t.co/pjCuVM8vkH It is the gem that powers Mac clusters for local AI, and is an order of magnitude faster than protocols over TCP.

X (formerly Twitter)

sayzard Apr 21

Alex Cheema (@alexocheema)

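A demo of the Qwen3.6 35B vision model running on two M5 Max MacBook Pros with RDMA over Thunderbolt 5. It recognized Apple Park correctly but misidentified John Ternus as Jeff Williams. Thanks to prefix caching, the response came back almost instantly, showing what local multi-device inference can do.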

https://x.com/alexocheema/status/2046396845270700350

#qwen #visionmodel #macbookpro #rdma #prefixcaching

Alex Cheema (@alexocheema) on X

Running Qwen3.6 35B (vision) on 2 x M5 Max MacBook Pro with RDMA over Thunderbolt 5. It describes the image and identifies Apple Park correctly, but misidentifies John Ternus as Jeff Williams. Near instant response with prefix caching.

X (formerly Twitter)

sayzard Apr 12

EXO Labs (@exolabs)

exo announced day-0 support for MiniMax M2.7. With RDMA and tensor parallelism support, near-linear scaling can be expected on Mac clusters, and the model is presented as runnable on several combinations of M4/M5 Macs.

https://x.com/exolabs/status/2043200735265915123

#exo #minimax #rdma #tensorparallelism #mac

EXO Labs (@exolabs) on X

Excited to share that we have day-0 support for MiniMax M2.7 in exo. Supports RDMA / tensor parallelism for ~linear scaling with mac clusters. Some setups you can run it on:
- 4 x 64GB M4 Pro Mac Mini
- 2 x 128GB M5 Max MacBook Pro
- 2 x 128GB M4 Max Mac Studio

X (formerly Twitter)
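
The three setups listed in the exo post add up to the same total memory. A small sketch of that arithmetic, using only the machine counts and per-machine sizes from the post; the function and variable names are hypothetical:

```python
# (machine count, memory per machine in GB), as listed in the post.
SETUPS = {
    "4 x 64GB M4 Pro Mac Mini": (4, 64),
    "2 x 128GB M5 Max MacBook Pro": (2, 128),
    "2 x 128GB M4 Max Mac Studio": (2, 128),
}


def total_memory_gb(count: int, per_machine_gb: int) -> int:
    """Combined memory across all machines in one setup."""
    return count * per_machine_gb


totals = {name: total_memory_gb(*spec) for name, spec in SETUPS.items()}

for name, total in totals.items():
    print(f"{name}: {total} GB total")
```

Every listed configuration comes out to 256 GB combined. The post does not state the model's actual memory requirement, so this only shows what the examples have in common.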