| Website | https://ivan.computer |
| GitHub | https://github.com/bobrik |
Looking at devmapper ioctl failures shows the limited utility of just 16 frames of LBR: if you attach to high-level functions, you mostly capture cleanup.
Having bpf accounting enabled doesn't help either.
Sneak peek: LBR based stacks in tracing spans for failed syscalls through ebpf_exporter.
Think "retsnoop, but more structured". It can be attached to existing traces too.
Also pictured: SRSO sadness.
If you have a lot of ebpf programs and any sort of monitoring that looks at /proc/kallsyms regularly, you might want to apply my patch: https://lore.kernel.org/bpf/20260129-i[email protected]/T/#u
The upstream kernel does a silly thing with quadratic pointer chasing under RCU, which is not great.
Testing the patch in production on v6.12 series (with ~10k bpf ksyms):
* On AMD EPYC 9684X (Zen4): ~870ms -> ~100ms
* On Ampere Altra Max M128-30: ~4650ms -> ~70ms
... and Rust symbols are in fact shorter demangled, if you drop the hashes.
I'm seeing 2.65x longer demangled symbols for ClickHouse (C++), while our production Rust symbols come out at ~0.7-0.9x their mangled length.
627k bytes for a ptr::drop_in_place 🫣
It is 753k mangled! Yes, it is longer mangled.
Take that, C++, which tops out at a measly 167k.
Apparently even that is not the limit, because llvm-cxxfilt choked on the biggest ones.
Here's a symbol of 64,039 bytes! Well, a part of it, because it does not fit in the screenshot. It's a part of a single stack trace that is 184,838 bytes long.
Perhaps it's a good idea to aggregate first and demangle second to deal with this nonsense.