DGEMM without FP64 Arithmetic – using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme

#CUDA #Performance #DGEMM #FP64 #Package

https://hgpu.org/?p=30081

Since AI computations require low-precision matrix multiplications, processors with enhanced performance for these operations are increasing along with the growing demand for AI computations. Howev…

Hey friends!

I wrote a #swift program that calls into #accelerate to measure #DGEMM performance on various Apple machines!

It's MIT licensed and should report GFLOPS for every {2,3,10}^N matrix size that fits on your machine!
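
For anyone curious how a GFLOPS figure like that is usually derived, here is a minimal Python sketch. It is not the Swift program itself. It assumes the conventional count of 2·m·n·k floating-point operations for an (m×k)·(k×n) product, and it reads {2,3,10}^N as "powers of 2, 3, and 10", which is my interpretation. The names `dgemm_flops`, `gflops`, `sizes`, and `time_call` are made up for illustration.

```python
import time


def dgemm_flops(m, n, k):
    """Conventional operation count for C = A * B with A (m x k), B (k x n):
    one multiply and one add per inner-product step."""
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Convert a measured wall-clock time into GFLOPS."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def sizes(bases=(2, 3, 10), limit=1000):
    """Sorted matrix dimensions of the form base**p (p >= 1) up to `limit`.
    Assumption: this is what {2,3,10}^N means in the post."""
    found = set()
    for base in bases:
        value = base
        while value <= limit:
            found.add(value)
            value *= base
    return sorted(found)


def time_call(fn, repeats=3):
    """Best-of-`repeats` wall-clock time for calling fn() with no arguments."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

To get a number, pass `time_call` a closure that runs one square DGEMM of dimension `n` through whatever BLAS you have on hand, then feed the result to `gflops(n, n, n, seconds)`.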

Documentation should answer most questions, but please feel free to reach out for anything, including curiosity!

If you do run it, please share your results!
I'm especially interested in M3 Max in various memory/CPU configs as well as M2 Ultra!

#FOSS #HPC #BLAS