Any ideas why perf_event_open() would often return 0 CPU instructions retired when I pin a thread to a particular core?
I guess sometimes the kernel can't fit every requested event into the limited number of counters the hardware provides, and then it either time-slices them (giving reduced data) or just... returns 0?
And kinda seems like whatever is using the counters is some other process, since I still have intermittent issues when I use a lock to limit this to one counter at a time in my code.
And you can actually read the time enabled vs. time running for the counter, and yeah, they don't match, so this is a problem.
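
For anyone wanting to check the same thing, here is a minimal sketch in C of how the enabled/running times can be read alongside the count and used to scale it. The helper names (`open_instructions_counter`, `read_counter`, `scale_reading`) and the `counter_reading` struct are made up for illustration, and the attribute choices (user-space only, current thread, any CPU) are assumptions rather than the setup from the question.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout read() returns for a single (non-group) event when read_format is
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct counter_reading {
    uint64_t value;        /* raw event count */
    uint64_t time_enabled; /* ns the event was enabled */
    uint64_t time_running; /* ns the event actually held a hardware counter */
};

/* Hypothetical helper: open an instructions-retired counter for the calling
 * thread. Returns a file descriptor, or -1 with errno set. */
int open_instructions_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start disabled; enable later via ioctl */
    attr.exclude_kernel = 1;  /* assumption: user-space instructions only */
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                       PERF_FORMAT_TOTAL_TIME_RUNNING;
    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: read count plus both times. Returns 0 on success. */
int read_counter(int fd, struct counter_reading *r)
{
    return read(fd, r, sizeof(*r)) == (ssize_t)sizeof(*r) ? 0 : -1;
}

/* Hypothetical helper: estimate the full count from a partial one.
 * Returns -1 when the event never got a counter, so a reported 0 in that
 * case means "no data" rather than "zero instructions". */
int scale_reading(const struct counter_reading *r, uint64_t *estimate)
{
    if (r->time_running == 0)
        return -1;
    *estimate = (uint64_t)((double)r->value * (double)r->time_enabled /
                           (double)r->time_running);
    return 0;
}
```

In use, the counter would be reset and enabled with `ioctl(fd, PERF_EVENT_IOC_RESET, 0)` and `ioctl(fd, PERF_EVENT_IOC_ENABLE, 0)`, disabled with `PERF_EVENT_IOC_DISABLE` after the measured region, and then read. If `time_running` comes back smaller than `time_enabled`, the event was sharing a counter for part of the interval and the scaled value is only an estimate. If `time_running` is 0, the event was never scheduled at all, which would explain a count of 0.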