Any ideas why perf_event_open() would often return 0 CPU instructions retired when I pin a thread to a particular core?

I guess sometimes the kernel can't fit all the requested events into the limited number of counters the hardware provides, and then it either samples (giving reduced data) or just... returns 0?

And it kinda seems like whatever is using the counters is some other process, since I still have intermittent issues when I use a lock to limit my code to one counter at a time.

And you can actually see the time enabled vs. time running for the event, and yeah, they don't match, so this is a problem.
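For reference, a minimal sketch of checking time enabled vs. time running from the read() buffer. It assumes a single (non-group) event opened with read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING, in which case read() returns three u64 values: value, time_enabled, time_running. The name `decode_counter` is made up for illustration, and opening the event itself is left out.

```python
import struct

def decode_counter(buf: bytes):
    """Return (scaled_count, fraction_running) for one perf event read.

    Assumes the buffer layout is value, time_enabled, time_running,
    each an unsigned 64-bit integer in native byte order.
    """
    value, time_enabled, time_running = struct.unpack("=QQQ", buf[:24])
    if time_running == 0:
        # The event never got onto a hardware counter, so a raw value
        # of 0 here means "not measured", not "no instructions retired".
        return None, 0.0
    fraction = time_running / time_enabled
    # Scale the raw count up to cover the whole enabled window.
    scaled = value * time_enabled // time_running
    return scaled, fraction
```

If this returns `None`, the 0 is probably a scheduling artifact rather than a real count. A fraction below 1.0 means the value is an estimate, not an exact count.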

@itamarst IIRC, since Sandy Bridge (i.e. a long time ago) you can have at least 4 hardware performance counters active at a time per core. I know it's 6 on newer Zens. But yeah, if you try to enable more counters than that, the software driver (perf in this case) has to time-multiplex the hardware counters and rely on statistical sampling to estimate the count. Not sure if that's the actual reason for what you're seeing, though.
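To make the time-multiplexing point concrete, here is a toy model (not the kernel's actual scheduling algorithm) of round-robin rotation when there are more events than counters. `running_ticks` and its parameters are hypothetical names for this sketch, and "tick" just means one rotation interval.

```python
def running_ticks(n_events: int, n_counters: int, n_ticks: int) -> list[int]:
    """Count how many ticks each event spends on a hardware counter.

    Toy round-robin: each tick, the next block of n_counters events
    (wrapping around) gets scheduled; everything else waits.
    """
    running = [0] * n_events
    slots = min(n_counters, n_events)
    for tick in range(n_ticks):
        start = (tick * n_counters) % n_events
        for k in range(slots):
            running[(start + k) % n_events] += 1
    return running

# 6 events on 4 counters, measured for only one tick:
# the last two events never run at all.
print(running_ticks(6, 4, 1))  # [1, 1, 1, 1, 0, 0]
```

In this model a short enough measurement window leaves some events with zero running time, which would look like a count of 0. Over a longer window every event gets a share, and the counts have to be scaled up by enabled/running.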

@pervognsen @itamarst Intel had eight per physical core, halved to four per logical core when booted with HT enabled.

AMD always has six per logical core, in contrast.