TIL: Even though #cuBLAS always assumes column-major order, the docs of #cudaMemcpy2D describe the copy in row-major terms (width in bytes, height in rows, pitch between rows)!
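
One way to reconcile the two conventions, as a minimal sketch: in a column-major matrix each column is contiguous, so a column can play the role of a "row" in the 2D copy (pitch = leading dimension in bytes, width = rows in bytes, height = number of columns). The helper `copy_2d` below is a hypothetical host-side stand-in that only mimics the pitch/width/height semantics of cudaMemcpy2D. It is not the CUDA call itself, and the names `copy_colmajor_block`, `lda` and `ldb` are assumptions for illustration.

```python
import struct

F32 = 4  # bytes per float32 element


def copy_2d(dst, dpitch, src, spitch, width, height):
    """Hypothetical stand-in for a pitched 2D copy: copies `height` rows of
    `width` bytes each; consecutive rows start `dpitch` / `spitch` bytes apart."""
    for r in range(height):
        dst[r * dpitch : r * dpitch + width] = src[r * spitch : r * spitch + width]


def copy_colmajor_block(dst, ldb, src, lda, m, n):
    """Copy an m x n block between column-major float32 buffers.
    Each column is contiguous, so one column is one "row" of the 2D copy."""
    copy_2d(dst, ldb * F32, src, lda * F32,
            width=m * F32,   # bytes per column of the block
            height=n)        # number of columns


# 4 x 3 column-major matrix A (lda = 4), with A(i, j) = 10*j + i
lda, rows, cols = 4, 4, 3
a_vals = [float(10 * j + i) for j in range(cols) for i in range(rows)]
a = bytearray(struct.pack(f"{rows * cols}f", *a_vals))

# Copy the top-left 2 x 3 block into a tightly packed 2 x 3 matrix B (ldb = 2)
m, n, ldb = 2, 3, 2
b = bytearray(m * n * F32)
copy_colmajor_block(b, ldb, a, lda, m, n)

b_vals = list(struct.unpack(f"{m * n}f", b))
print(b_vals)  # column-major: [A(0,0), A(1,0), A(0,1), A(1,1), A(0,2), A(1,2)]
```

With the real API the same mapping would apply to the pitch, width and height arguments, but check the argument order and units against the CUDA documentation before relying on it.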

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

#CUDA #SYCL #MKL #CUBLAS #MatrixMultiplication #LinearAlgebra #Performance #Package

https://hgpu.org/?p=29229

Matrix multiplication is fundamental in the backpropagation algorithm used to train deep neural network models. Libraries like Intel’s MKL or NVIDIA’s cuBLAS implemented new and optimiz…

Not sure who needs to know this, but if you get a #CUBLAS error 15 (CUBLAS_STATUS_NOT_SUPPORTED) with #llama.cpp and the .cu file mentions f16 near the line that fails, starting main with --memory-f32 may be a workaround. I ran into this on the #NVIDIA #Tesla #M40 24GB.
#AI #MachineLearning #CUDA #llama2 #Meta