@dysfun right but couldn't you do the cheap thing when _capturing_ the backtrace and keep the expensive stuff for later? I guess not?

@fasterthanlime @dysfun Polar Signals has done a bunch of work to make stack traces fast (kind of a necessity for an enable-in-production profiling tool).

If I remember correctly, they implement¹ just enough of a DWARF interpreter to traverse the call stack - at the cost of sometimes failing to produce a stack trace. backtrace-rs could probably do something similar? It doesn't need to unwind the stack - it doesn't need to find all the locals and run their destructors - it just needs to find the return addresses. Once you've saved all the return addresses you can defer symbol lookup to later (possibly even to another machine, or to somewhere it's acceptable to hit the network for debuginfod symbols).

¹: in eBPF, obviously, because they're running from kernel context
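(If I'm reading the std docs right, Rust's own `std::backtrace::Backtrace` already splits the two phases this way: capturing records frame addresses, and symbolization is deferred until the value is formatted. A minimal sketch timing the two phases - actual timings will vary wildly with platform and available debuginfo:)

```rust
use std::backtrace::Backtrace;
use std::time::Instant;

fn main() {
    // Capture: walk the stack and record the return addresses.
    let t0 = Instant::now();
    let bt = Backtrace::force_capture();
    let capture = t0.elapsed();

    // Symbolization is deferred: it happens when the backtrace is formatted.
    let t1 = Instant::now();
    let rendered = bt.to_string();
    let resolve = t1.elapsed();

    println!("capture: {capture:?}, symbolize+format: {resolve:?}");
    assert!(!rendered.is_empty());
}
```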

Debug Daily. Optimize Always | Polar Signals

Polar Signals Cloud is an always-on, zero-instrumentation continuous profiling product that helps improve performance, understand incidents, and lower infrastructure costs.

@RAOF @fasterthanlime @dysfun yes you only have to do some of the work when doing backtraces instead of proper unwinding

there's a lot of tradeoffs here in terms of correctness/speed/detail. if i wanted to dig into optimizing "oops all backtraces" in rust i would be looking into what samply does, which is specifically designed to cache and optimize the relevant data, and is all rust code

https://github.com/mstange/samply/

GitHub - mstange/samply: Command-line sampling profiler for macOS, Linux, and Windows

@RAOF @fasterthanlime @dysfun

you can also "avoid" the work of backtracing by just saving the entire stack (usually a couple MB) and general purpose cpu registers

and then do the actual analysis only if you really want to print it (this is a more extreme version of the "defer the symbol lookup" trick RAOF mentioned)

this is how minidumps work, saving all the work for later. it was also apparently a classic Trick that sampling profilers did, since dumping the entire stack on every sample and processing it in the background was faster than doing a backtrace on the spot.

samply is however optimized enough that it's faster to unwind than do the dump-the-stack hack
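(A toy version of the dump-the-stack trick, just to make the idea concrete: snapshot the raw bytes between a marker in an inner frame and a marker in an outer frame, and defer any analysis. This is purely illustrative and assumes a downward-growing stack, as on x86-64 and aarch64 - a real minidump also records registers, module lists, etc.)

```rust
// Sketch of "save the raw stack now, analyze later". Assumes the stack
// grows downward, so a local in a deeper frame sits at a lower address.

#[inline(never)]
fn snapshot_between(outer_mark: *const u8) -> Vec<u8> {
    let inner_mark = 0u8; // a local in this (deeper) frame
    let lo = &inner_mark as *const u8 as usize;
    let hi = outer_mark as usize;
    assert!(lo < hi, "expected a downward-growing stack");
    // Copy the raw bytes; unwinding and symbolization can happen later,
    // on another thread or another machine, minidump-style.
    unsafe { std::slice::from_raw_parts(lo as *const u8, hi - lo).to_vec() }
}

fn main() {
    let outer_mark = 0u8;
    let snapshot = snapshot_between(&outer_mark);
    println!("captured {} bytes of raw stack", snapshot.len());
    assert!(!snapshot.is_empty());
}
```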

@Gankra @RAOF @dysfun fascinating!

that would do great as an eyre handler (with the required adaptation work)

@Gankra @RAOF @fasterthanlime @dysfun The unwinding part of samply is done by framehop: https://github.com/mstange/framehop

The tricky part about using it from backtrace-rs would probably be the detection of when libraries are loaded into / unloaded from the process. Or in other words, the tricky part about caching unwind rules is knowing when to invalidate the cache. I don't know how libunwind does that part.
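(One crude way to notice the module list changing - to be clear, an assumption of mine, not what framehop or libunwind actually do - is to enumerate mapped objects with `dl_iterate_phdr` on Linux/glibc and compare the result between samples, invalidating the cache on any difference. This declares only the handful of struct fields it reads, to stay dependency-free:)

```rust
// Linux-only sketch: list loaded modules via dl_iterate_phdr. Comparing
// this list (or a hash of it) between samples is one way to detect
// dlopen/dlclose and know when to invalidate cached unwind rules.

use std::ffi::{c_char, c_int, c_void, CStr};

#[repr(C)]
struct DlPhdrInfo {
    addr: usize,
    name: *const c_char,
    phdr: *const c_void,
    phnum: u16,
    // glibc appends further fields; we only read the ones above.
}

extern "C" {
    fn dl_iterate_phdr(
        cb: extern "C" fn(*mut DlPhdrInfo, usize, *mut c_void) -> c_int,
        data: *mut c_void,
    ) -> c_int;
}

extern "C" fn collect(info: *mut DlPhdrInfo, _size: usize, data: *mut c_void) -> c_int {
    let names = unsafe { &mut *(data as *mut Vec<String>) };
    let name = unsafe { CStr::from_ptr((*info).name) };
    names.push(name.to_string_lossy().into_owned());
    0 // returning nonzero would stop the iteration
}

fn main() {
    let mut names: Vec<String> = Vec::new();
    unsafe {
        dl_iterate_phdr(collect, &mut names as *mut _ as *mut c_void);
    }
    println!("{} modules mapped", names.len());
    assert!(!names.is_empty());
}
```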

GitHub - mstange/framehop: Stack unwinding library in Rust

@Gankra @RAOF @fasterthanlime @dysfun Frame-pointer-based unwinding has the advantage that you don't need to know anything about which libraries are loaded or where to find their unwind info. So it's much easier to manage, in addition to being faster to unwind.
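(The reason it needs no per-module data: with frame pointers, each frame stores the saved caller frame pointer and the return address at a fixed offset, so unwinding is just chasing a linked list. A safe sketch over a simulated stack of `usize` slots - the layout is loosely modeled on x86-64, where `[rbp]` holds the saved rbp and `[rbp+8]` the return address:)

```rust
/// Walk a simulated stack: slot `fp` holds the saved caller frame pointer,
/// slot `fp + 1` holds the return address. A saved frame pointer of 0
/// marks the outermost frame.
fn walk_frame_pointers(stack: &[usize], mut fp: usize) -> Vec<usize> {
    let mut return_addrs = Vec::new();
    while fp != 0 {
        return_addrs.push(stack[fp + 1]);
        fp = stack[fp]; // follow the linked list to the caller's frame
    }
    return_addrs
}

fn main() {
    // Three frames: outermost at slot 1 (saved fp 0 = stop), then 3, then 5.
    let stack = [0, 0, 0x1000, 1, 0x2000, 3, 0x3000];
    let trace = walk_frame_pointers(&stack, 5);
    assert_eq!(trace, vec![0x3000, 0x2000, 0x1000]);
    println!("{trace:#x?}");
}
```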