) I remember reading about major Linux distributions switching to enabling them by default. I'm surprised that it's not the default in Rust yet!!
@fasterthanlime @dysfun Polar Signals has done a bunch of work to make stack traces fast (kinda a necessity for an enable-in-production profiling tool).
If I remember correctly, they implement¹ just enough DWARF interpreter to traverse the call stack - at the cost of sometimes failing to produce a stack trace. backtrace-rs could probably do something similar? It doesn't need to unwind the stack - it doesn't need to find all the locals and run their destructors - it just needs to find the return address. Once you've saved all the return addresses you can defer symbol lookup to later (possibly even on another machine, or requiring hitting the network for debuginfod symbols).
¹: in eBPF, obviously, because they're running from kernel context
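To make the "just find the return addresses, symbolicate later" idea concrete, here is a minimal sketch over simulated stack memory. It is not how Polar Signals or backtrace-rs do it: `UnwindRule`, `Stack`, `collect_return_addresses` and `symbolize` are all hypothetical names, and the only rule shape assumed is the simplest x86-64-style one ("caller's stack pointer is `sp + N`, return address is the word just below it").

```rust
/// Hypothetical, heavily simplified unwind rule for one code range:
/// the caller's stack pointer (CFA) is `sp + cfa_offset`, and the
/// return address is stored in the 8 bytes just below the CFA.
#[derive(Clone, Copy)]
struct UnwindRule {
    start: u64,      // first code address covered by the rule
    end: u64,        // one past the last covered address
    cfa_offset: u64, // bytes from the current sp to the caller's sp
}

/// Simulated stack memory: `words[i]` lives at address `base + 8 * i`.
struct Stack {
    base: u64,
    words: Vec<u64>,
}

impl Stack {
    fn read(&self, addr: u64) -> Option<u64> {
        let off = addr.checked_sub(self.base)?;
        if off % 8 != 0 {
            return None;
        }
        self.words.get((off / 8) as usize).copied()
    }
}

/// Collects raw code addresses only: no locals, no destructors, no symbols.
/// Returns the addresses found plus `true` if the walk reached the outermost
/// frame (marked here by a zero return address), `false` if it gave up early.
fn collect_return_addresses(
    rules: &[UnwindRule],
    stack: &Stack,
    mut pc: u64,
    mut sp: u64,
    max_frames: usize,
) -> (Vec<u64>, bool) {
    let mut addrs = vec![pc];
    for _ in 0..max_frames {
        // A real unwinder would look up `return_address - 1` for caller frames.
        let Some(rule) = rules.iter().find(|r| r.start <= pc && pc < r.end) else {
            return (addrs, false); // no rule for this pc: partial trace
        };
        let Some(cfa) = sp.checked_add(rule.cfa_offset) else {
            return (addrs, false);
        };
        let Some(ra) = cfa.checked_sub(8).and_then(|a| stack.read(a)) else {
            return (addrs, false); // walked off the known stack
        };
        if ra == 0 {
            return (addrs, true);
        }
        addrs.push(ra);
        pc = ra;
        sp = cfa;
    }
    (addrs, false)
}

/// Deferred step: map saved addresses to names whenever (and wherever) the
/// symbol table is available. Entries are `(start, end, name)`.
fn symbolize<'a>(symbols: &[(u64, u64, &'a str)], addr: u64) -> Option<&'a str> {
    symbols
        .iter()
        .find(|(s, e, _)| *s <= addr && addr < *e)
        .map(|(_, _, name)| *name)
}
```

The point of the split is that `collect_return_addresses` only touches a small rule table and the stack, while `symbolize` can run much later with whatever debug info turns up.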
@RAOF @fasterthanlime @dysfun yes you only have to do some of the work when doing backtraces instead of proper unwinding
there's a lot of tradeoffs here in terms of correctness/speed/detail. if i wanted to dig into optimizing "oops all backtraces" in rust i would be looking into what samply does, which is specifically designed to cache and optimize the relevant data, and is all rust code
you can also "avoid" the work of backtracing by just saving the entire stack (usually a couple MB) and general purpose cpu registers
and then do the actual analysis only if you really want to print it (this is a more extreme version of the "defer the symbol lookup" trick RAOF mentioned)
this is how minidumps work, saving all the work for later. it was also apparently a classic Trick that sampling profilers did, since dumping the entire stack on every sample and processing it in the background was faster than doing a backtrace on the spot.
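A rough sketch of the dump-now, analyze-later trick, under loud assumptions: `StackSnapshot` and `capture` are hypothetical names, the "stack" is just a byte slice handed in by the caller, and the later analysis walks a frame-pointer chain (saved frame pointer at `fp`, return address at `fp + 8`) purely to keep the example short. Real minidump processors use much richer unwind information.

```rust
/// Hypothetical snapshot taken at sample time: a few registers plus a raw
/// copy of the stack. Nothing is interpreted yet.
struct StackSnapshot {
    pc: u64,
    fp: u64,        // frame pointer register at capture time
    sp: u64,        // address that `bytes[0]` was copied from
    bytes: Vec<u8>, // stack memory from `sp` upward
}

/// The cheap part, done on the spot: one copy, no unwinding.
fn capture(pc: u64, fp: u64, sp: u64, live_stack: &[u8]) -> StackSnapshot {
    StackSnapshot { pc, fp, sp, bytes: live_stack.to_vec() }
}

impl StackSnapshot {
    /// Reads a little-endian u64 from the copied stack, if it is in range.
    fn read_u64(&self, addr: u64) -> Option<u64> {
        let off = addr.checked_sub(self.sp)? as usize;
        let chunk = self.bytes.get(off..off.checked_add(8)?)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        Some(u64::from_le_bytes(buf))
    }

    /// The expensive part, done later (or never): walk the frame-pointer
    /// chain inside the copy and collect the return addresses.
    fn backtrace(&self) -> Vec<u64> {
        let mut addrs = vec![self.pc];
        let mut fp = self.fp;
        while let (Some(next_fp), Some(ra)) =
            (self.read_u64(fp), self.read_u64(fp.wrapping_add(8)))
        {
            addrs.push(ra);
            if next_fp <= fp {
                break; // chain must move up the stack; 0 marks the outermost frame
            }
            fp = next_fp;
        }
        addrs
    }
}
```

Because the snapshot owns its bytes, `backtrace` can run on a background thread or a different machine long after the sampled thread has moved on.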
samply is however optimized enough that it's faster to unwind than do the dump-the-stack hack
@Gankra @RAOF @fasterthanlime @dysfun The unwinding part of samply is done by framehop: https://github.com/mstange/framehop
The tricky part about using it from backtrace-rs would probably be the detection of when libraries are loaded into / unloaded from the process. Or in other words, the tricky part about caching unwind rules is knowing when to invalidate the cache. I don't know how libunwind does that part.
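For illustration only, one possible shape for the invalidation half of that problem: a cache that tracks which address ranges are mapped and evicts rules when a range goes away. This is not framehop's or libunwind's API; `RuleCache`, `CachedRule`, `add_module` and `remove_module` are hypothetical, and the sketch simply assumes something else (a loader hook, for instance) reports loads and unloads, which is exactly the hard part mentioned above.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical cached unwind rule; reduced to a single frame size here.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CachedRule {
    cfa_offset: u64,
}

/// Unwind-rule cache that knows which address ranges are currently mapped.
#[derive(Default)]
struct RuleCache {
    modules: Vec<Range<u64>>,        // address ranges of loaded libraries
    rules: HashMap<u64, CachedRule>, // code address -> cached rule
}

impl RuleCache {
    /// Called when a library is loaded at `range`.
    fn add_module(&mut self, range: Range<u64>) {
        self.modules.push(range);
    }

    /// Called when a library is unloaded: forget the module and every rule
    /// that was cached for an address inside it.
    fn remove_module(&mut self, range: &Range<u64>) {
        self.modules.retain(|m| m != range);
        self.rules.retain(|pc, _| !range.contains(pc));
    }

    /// Returns the rule for `pc`, computing and caching it on a miss.
    /// Addresses outside every loaded module get no rule at all.
    fn rule_for(
        &mut self,
        pc: u64,
        compute: impl FnOnce(u64) -> CachedRule,
    ) -> Option<CachedRule> {
        if !self.modules.iter().any(|m| m.contains(&pc)) {
            return None;
        }
        Some(*self.rules.entry(pc).or_insert_with(|| compute(pc)))
    }
}
```

Evicting by address range means a different library later mapped at the same addresses never sees the old library's rules.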
@fasterthanlime Stuff like this is exactly what I meant here: https://github.com/rust-lang/rfcs/pull/2154#issuecomment-1753333675
Debuginfo has a chicken-egg problem where nobody has an incentive to really optimize it (meaning the entire ecosystem including unwinder, symbolication, managing symbol files, CI, etc) because nobody wants to use it in prod because it's slow. But maybe stuff like framehop (used by samply) is going to change this one day?