Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

In modern deep learning frameworks, GPUs execute large numbers of small tensor kernels, and overall performance is frequently constrained by memory bandwidth and kernel launch overhead rather than by arithmetic throughput. Sys…