Good News: Today I had the opportunity to talk with #amd staff:
1. Yes, the transformer support was a higher priority than convolutions.
2. My problem can be tracked down to the backward pass of Conv3D.
3. By using the MIOPEN_FIND_MODE=3 and MIOPEN_FIND_ENFORCE=3 env variables in #rocm 6.4 I got a huge performance boost, such that my code now runs faster on the #mi300a than on the #A100 🤩
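For anyone who wants to try the same two settings, here is a minimal sketch of one way to apply them to a training run. It assumes the variables should be present in the environment before the framework initializes MIOpen, so it sets them for a child process. The helper names `miopen_env` and `run_with_miopen_tuning` are made up for illustration and are not part of ROCm or MIOpen. The values are the ones quoted in the post, and their effect may differ between MIOpen versions.

```python
import os
import subprocess

# Values quoted in the post above; their exact effect depends on the MIOpen version.
MIOPEN_TUNING = {"MIOPEN_FIND_MODE": "3", "MIOPEN_FIND_ENFORCE": "3"}


def miopen_env(base=None):
    """Return a copy of an environment mapping with the two MIOpen variables set.

    `base` defaults to the current process environment (hypothetical helper).
    """
    env = dict(os.environ if base is None else base)
    env.update(MIOPEN_TUNING)
    return env


def run_with_miopen_tuning(cmd):
    """Run a command (for example a training script) with the tuned environment."""
    return subprocess.run(cmd, env=miopen_env(), check=True)
```

A call such as `run_with_miopen_tuning(["python", "train.py"])` would launch a (hypothetical) `train.py` with both variables set. The first run may take noticeably longer if MIOpen searches for kernels, so it is worth timing a later run as well.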

🚀 Ready to test the limits of performance?

Join the @EPCC Hackathon on AMD GPUs and explore the cutting-edge #MI300A and AMD’s Next Generation #Fortran Compiler with #OpenMP offload!

💻 Bring your code, ideas, and curiosity.
🔧 Optimize, accelerate, and innovate with us.
🏆 Let’s see what you can build!

🔗 https://www.archer2.ac.uk/training/courses/250527-amd-hackathon/

#AMDGPU #HPC #GPUComputing #Hackathon #OpenScience

AMD MI300 Series Hackathon

Today I tried out #AMD #Instinct #MI300a for my existing Deep Learning pipeline. Good news: It worked out of the box. Bad news: For some reason it could not beat my local #Nvidia #1080ti...
After trying all sorts of #ROCM installation methods via prebuilt wheels, #apptainer images etc., I tried #nanogpt by @karpathy and sure enough: The gpt code ran approx. 2x faster than on an #a100 ... I hope that this is due to my programming skills, not AMD preferring #transformers over #CNNs ...
Sizing up #MI300A’s #GPU
It’s well ahead of #Nvidia’s #H100 PCIe for just about every major category of 32- or 64-bit operations. MI300A can achieve 113.2 TFLOPS of #FP32 throughput, with each FMA counting as two floating point operations. For comparison, H100 PCIe achieved 49.3 TFLOPS in the same test.
#AMD cut down #MI300X’s GPU to create MI300A. 24 #Zen4 cores is a lot of #CPU power, and occupies one quadrant on the MI300 chip. But MI300’s main attraction is still the GPU.
https://chipsandcheese.com/p/sizing-up-mi300as-gpu
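As a quick check of the numbers quoted above, here is a small sketch of the FMA counting convention and the resulting ratio. The function name `fma_tflops` is made up for illustration; the two throughput figures are the ones from the post.

```python
def fma_tflops(fma_per_second):
    """Convert an FMA rate into TFLOPS, counting each FMA as two operations."""
    return 2 * fma_per_second / 1e12


# Figures quoted in the post (FP32, same test on both GPUs).
MI300A_TFLOPS = 113.2
H100_PCIE_TFLOPS = 49.3

# Implied FMA rate for MI300A under the two-operations-per-FMA convention.
mi300a_fma_per_second = MI300A_TFLOPS * 1e12 / 2

# How far ahead MI300A is in this one measurement.
ratio = MI300A_TFLOPS / H100_PCIE_TFLOPS

print(f"MI300A: {mi300a_fma_per_second:.3e} FMA/s, {ratio:.2f}x H100 PCIe")
```

So in this test MI300A lands at roughly 2.3 times the H100 PCIe figure.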
AMD’s Instinct MI300A is a giant APU, created by swapping out two GPU chiplets (XCDs) for three CPU chiplets (CCDs).

Chips and Cheese
#Germany unleashes #AMD-powered Hunter #supercomputer
€15 million system to serve as testbed for larger Herder supercomputer coming in 2027
Built by #HewlettPackardEnterprise (#HPE), Hunter is based on a #Cray EX4000 platform and powered by a combination of AMD Instinct #MI300A accelerated processing units (#APU) and #Epyc Genoa #CPU.
https://www.theregister.com/2025/01/17/hlrs_supercomputer_hunter/
The Register
#HPE #Cray #EX4000 'Hunter' #Supercomputer Now in Operation at #HLRS Powered by #AMD APUs
With a theoretical peak performance of 48.1 petaflops (48.1 quadrillion floating point operations per second), Hunter’s speed is nearly double that of HLRS’s previous flagship supercomputer, called Hawk. Hunter is based on the AMD Instinct #MI300A accelerated processing unit (#APU), which combines CPUs, GPU accelerators, and high bandwidth memory in a single package.
https://insidehpc.com/2025/01/hpe-hunter-supercomputer-now-in-operation-at-hlrs-powered-by-amd-apus/
On the heels of a Bloomberg report that HPE has won a $1 billion deal with Elon Musk's X (Twitter) social network for AI-optimized servers, a major European supercomputing center announced today the start of operations of an HPE-Cray supercomputer powered by AMD processors .....

High-Performance Computing News Analysis | insideHPC
HLRS Celebrates Inauguration of "Hunter" #Supercomputer
• Hunter is based on the @AMD Instinct #MI300A APUs
• ~doubles the performance of its predecessor, Hawk, while using 80% less energy
• HLRS’s next supercomputer, Herder, is planned for 2027
www.hlrs.de/news/detail/... #HPC #AI
Built by HPE and featuring AMD’s cutting-edge #MI300A APUs, #ElCapitan supports the NNSA’s mission of ensuring the safety, security and reliability of the U.S. nuclear stockpile www.youtube.com/watch?v=Q8wt... #HPC #AI #Exascale

El Capitan: The World’s Fastes...
#ElCapitan Towers Above the #Top500 in a Big #HPE Win using an #AMD Instinct #MI300A system. With sustained #HPL of 1.742EF and peak speed of 2.79EF of #FP64, this is a big jump over previous generation systems. El Capitan has a unique architecture, as it uses #APUs that combine #CPU and #GPU on one package with high-bandwidth memory. Over 44,000 of these MI300A APUs are then packed into the HPE #Cray Shasta liquid cooled platform, and connected via the Slingshot interconnect.
https://www.servethehome.com/el-capitan-towers-above-the-top500-in-a-big-hpe-and-amd-win/
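The two figures above also give the HPL efficiency, that is, how much of the theoretical peak the benchmark run sustained. A minimal sketch of that arithmetic, using only the numbers from the post (the variable names are made up for illustration):

```python
# Figures quoted in the post, in exaflops (FP64).
HPL_SUSTAINED_EF = 1.742
PEAK_EF = 2.79

# HPL efficiency: sustained benchmark result as a share of theoretical peak.
hpl_efficiency = HPL_SUSTAINED_EF / PEAK_EF

print(f"HPL efficiency: {hpl_efficiency:.1%}")
```

That works out to a bit over 62% of peak.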
El Capitan Towers Above the Top500 in a Big HPE and AMD Win

El Capitan takes the top spot in the November 2024 Top500 list ending Frontier's time as #1. El Capitan uses the AMD MI300A APU

ServeTheHome

All the tech and history that culminates in the MI300A accelerator!

#AMD’s Long And Winding Road To The Hybrid #CPU-#GPU Instinct #MI300A
https://www.nextplatform.com/2024/07/17/amds-long-and-winding-road-to-the-hybrid-cpu-gpu-instinct-mi300a/

Back in 2012, when AMD was in the process of backing out of the datacenter CPU business and did not really have its datacenter GPU act together at all,

The Next Platform