Hi Mastodon folks, what do you use for performance profiling/analysis at the instruction level? VTune is painful and crashes. Do you use llvm-mca? I am tired of guessing my way through optimizations.
Please repost. I would really love to hear some opinions :)