I need to get better about announcing improvements to the OpenCL Intercept Layer, so: I merged two new features this week that I think are pretty neat. As a reminder, the OpenCL Intercept Layer is an open source tool for debugging and profiling OpenCL applications. It works with most OpenCL implementations and requires no application modifications. #OpenCL

https://github.com/intel/opencl-intercept-layer

The first adds conditional profiling. This is useful for restricting profiling to the specific regions of an application that you care about, which minimizes overhead and keeps profiling data for unimportant regions out of the results. Conditional profiling is controlled by an environment variable, so it is easy to use from many programming languages.
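As a rough illustration of the idea, here is a minimal Python sketch that sets an environment variable around a region of interest. The variable name `EXAMPLE_PROFILING_ENABLED` and the helper `profiled_region` are placeholders I made up for this example, not the layer's real control name, and the sketch assumes the variable is re-read while the application runs. Check the project's documentation for the actual name and semantics.

```python
import os
from contextlib import contextmanager

# Hypothetical variable name, used only for illustration.
# The real control name is in the project's documentation.
PROFILING_VAR = "EXAMPLE_PROFILING_ENABLED"


@contextmanager
def profiled_region(var=PROFILING_VAR):
    """Set the variable to "1" inside the block, then restore its old value."""
    previous = os.environ.get(var)
    os.environ[var] = "1"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(var, None)
        else:
            os.environ[var] = previous


# Usage: only the work inside the block runs with the variable set.
with profiled_region():
    pass  # enqueue the OpenCL work you want profiled here
```

Because the switch is just an environment variable, the same pattern works in any language that can set one for its own process.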
The second adds device timing histograms. This is useful to understand how a workload executes at a high level, across all kernels. It is especially useful to identify outliers if the execution time for some kernels varies based on kernel inputs or other factors.
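To show why a histogram makes outliers easy to spot, here is a small Python sketch that groups execution times into power-of-two buckets. This is not how the layer implements or formats its histograms; the function name `timing_histogram`, the bucket scheme, and the sample durations are all assumptions made up for illustration.

```python
from collections import Counter


def timing_histogram(durations_ns):
    """Count durations in power-of-two buckets, keyed by each bucket's lower bound in ns."""
    histogram = Counter()
    for ns in durations_ns:
        # Largest power of two that is <= the duration (values below 1 go in bucket 1).
        lower = 1 << (max(int(ns), 1).bit_length() - 1)
        histogram[lower] += 1
    return dict(sorted(histogram.items()))


# Made-up sample: four similar kernel times and one slow outlier.
samples = [900, 1000, 1100, 1050, 65000]
for lower, count in timing_histogram(samples).items():
    print(f"{lower:>8} ns to {2 * lower - 1:>8} ns: {'*' * count}")
```

In this made-up sample, four executions land in two adjacent buckets and the slow one sits alone several buckets away, which is the kind of outlier a histogram surfaces at a glance.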