DGEMM without FP64 Arithmetic – using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme
Hey friends!
I wrote a #swift program that calls into #accelerate to measure #DGEMM performance on various Apple machines!
It's MIT-licensed and should give you GFLOPS for every {2,3,10}^N matrix size that fits on your machine!
Documentation should answer most questions, but please feel free to reach out for anything, including curiosity!
If you do run it, please share your results!
I'm especially interested in the M3 Max in various memory/CPU configs, as well as the M2 Ultra!
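
For anyone curious how a GFLOPS number like this comes together, here is a rough sketch, not the actual program. It assumes square n x n matrices, the usual 2·n³ flop count for a matrix multiply, and that "fits" means the three Double matrices A, B and C fit in a given byte budget. The names `candidateSizes` and `gflops` are made up for illustration, and the timing itself (the Accelerate call) is left out.

```swift
import Foundation

// Hypothetical helper: every size base^N (N >= 1) for the given bases whose
// three n x n Double matrices (A, B, C) fit within `memoryBytes`.
// The "three matrices" memory model is an assumption, not taken from the program.
func candidateSizes(bases: [Int], memoryBytes: Int) -> [Int] {
    var sizes: [Int] = []
    for base in bases where base > 1 {
        var n = base
        // 3 matrices * n * n elements * 8 bytes per Double
        while 3 * n * n * MemoryLayout<Double>.size <= memoryBytes {
            sizes.append(n)
            n *= base
        }
    }
    return sizes.sorted()
}

// Hypothetical helper: GFLOPS for one n x n x n multiply that took `seconds`,
// using the conventional 2 * n^3 floating-point operation count.
func gflops(n: Int, seconds: Double) -> Double {
    let flops = 2.0 * Double(n) * Double(n) * Double(n)
    return flops / seconds / 1e9
}
```

To use it, time one DGEMM call per size from `candidateSizes(bases: [2, 3, 10], memoryBytes: ...)` and pass the elapsed seconds to `gflops(n:seconds:)`.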