@fasterthanlime @dysfun Polar Signals has done a bunch of work to make stack traces fast (kinda a necessity for an enable-in-production profiling tool).
If I remember correctly, they implement¹ just enough of a DWARF interpreter to traverse the call stack - at the cost of sometimes failing to produce a stack trace. backtrace-rs could probably do something similar? It doesn't need to unwind the stack - it doesn't need to find all the locals and run their destructors - it just needs to find the return addresses. Once you've saved all the return addresses you can defer symbol lookup to later (possibly even on another machine, or in a step that has to hit the network for debuginfod symbols).
¹: in eBPF, obviously, because they're running from kernel context
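To make the "grab return addresses now, look up symbols later" split concrete, here is a minimal Rust sketch. It is not what Polar Signals or backtrace-rs actually do: to stay short it walks a frame-pointer chain rather than interpreting DWARF CFI, and it runs over a simulated stack stored in a map instead of real memory. The names `capture_return_addresses` and `symbolize`, and the frame layout, are assumptions made up for illustration.

```rust
use std::collections::BTreeMap;

/// Phase 1 (hot path): record return addresses only, no symbol work.
/// `memory` is a hypothetical stand-in for readable stack memory,
/// mapping an address to the 8-byte word stored there.
/// Assumed frame layout: [fp] = caller's fp, [fp + 8] = return address.
fn capture_return_addresses(
    memory: &BTreeMap<u64, u64>,
    mut fp: u64,
    max_depth: usize,
) -> Vec<u64> {
    let mut addrs = Vec::new();
    while fp != 0 && addrs.len() < max_depth {
        let (Some(&next_fp), Some(&ret)) =
            (memory.get(&fp), memory.get(&fp.wrapping_add(8)))
        else {
            break; // unreadable frame: return a truncated trace rather than guess
        };
        addrs.push(ret);
        // Caller frames sit at higher addresses on a downward-growing stack;
        // a non-increasing frame pointer means a corrupt chain or a loop.
        if next_fp != 0 && next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    addrs
}

/// Phase 2 (deferred, possibly on another machine): map an address to the
/// symbol with the greatest start address that is <= `addr`.
/// A real symbolizer would also check the symbol's size.
fn symbolize(symbols: &BTreeMap<u64, &'static str>, addr: u64) -> Option<&'static str> {
    symbols.range(..=addr).next_back().map(|(_, name)| *name)
}
```

The point of the split is that the first function only ever produces a `Vec<u64>`, which is cheap to store or ship elsewhere; everything expensive lives in the second one and can run whenever the symbol data is available.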
@RAOF @fasterthanlime @dysfun yes you only have to do some of the work when doing backtraces instead of proper unwinding
there are a lot of tradeoffs here in terms of correctness/speed/detail. if i wanted to dig into optimizing "oops all backtraces" in rust i would be looking into what samply does, which is specifically designed to cache and optimize the relevant data, and is all rust code
@Gankra @RAOF @fasterthanlime @dysfun The unwinding part of samply is done by framehop: https://github.com/mstange/framehop
The tricky part about using it from backtrace-rs would probably be the detection of when libraries are loaded into / unloaded from the process. Or in other words, the tricky part about caching unwind rules is knowing when to invalidate the cache. I don't know how libunwind does that part.
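One possible shape for that invalidation, as a rough Rust sketch: key the cache on a "generation" number that changes whenever the set of loaded libraries changes, and throw the cached rules away when it moves. This is not framehop's or libunwind's API; `RuleCache`, `Module`, `UnwindRule`, and the generation source are all hypothetical. On glibc, the `dlpi_adds` / `dlpi_subs` counters reported by `dl_iterate_phdr` look like one candidate for such a signal, but that is an assumption about how you would wire it up, not something this sketch does.

```rust
use std::collections::HashMap;

/// Hypothetical description of one loaded module (executable or shared library).
#[derive(Clone, Debug, PartialEq)]
struct Module {
    base: u64,
    len: u64,
}

/// Hypothetical placeholder for whatever per-address unwind rule gets cached.
#[derive(Clone, Copy, Debug, PartialEq)]
struct UnwindRule {
    cfa_offset: i64,
}

struct RuleCache {
    generation: Option<u64>,
    modules: Vec<Module>,
    rules: HashMap<u64, UnwindRule>,
}

impl RuleCache {
    fn new() -> Self {
        RuleCache { generation: None, modules: Vec::new(), rules: HashMap::new() }
    }

    /// Call before unwinding. If the load/unload generation changed,
    /// re-list the modules and drop every cached rule.
    fn sync(&mut self, current_generation: u64, list_modules: impl FnOnce() -> Vec<Module>) {
        if self.generation != Some(current_generation) {
            self.generation = Some(current_generation);
            self.modules = list_modules();
            self.rules.clear();
        }
    }

    /// Return the cached rule for `pc`, computing and caching it on a miss.
    /// Addresses outside every known module produce `None`.
    fn lookup(
        &mut self,
        pc: u64,
        compute: impl FnOnce(&Module, u64) -> Option<UnwindRule>,
    ) -> Option<UnwindRule> {
        if let Some(rule) = self.rules.get(&pc) {
            return Some(*rule);
        }
        let module = self
            .modules
            .iter()
            .find(|m| pc >= m.base && pc - m.base < m.len)?;
        let rule = compute(module, pc)?;
        self.rules.insert(pc, rule);
        Some(rule)
    }
}
```

Clearing everything on any change is the blunt version; a finer-grained cache would drop only the rules belonging to the modules that went away. The hard part is still getting a trustworthy generation signal in the first place.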