An Almost Pointless Exercise in GPU Optimization | Speechmatics

Experience of converting a multi-threaded C++ application to run faster on a GPU, and how to interpret Nsight Compute recommendations to improve an algorithm on the GPU.