I've been wanting to start a blog for a while, and finally decided to bite the bullet.

The first article of hopefully many more to come is about, you guessed it, profiling & optimization.

Boosts appreciated!

https://rovarma.com/articles/optimizing-libdwarf-eh-frame-enumeration/

Optimizing libdwarf .eh_frame enumeration | Ritesh Oedayrajsingh Varma

For the Linux version of Superluminal we rely on unwind information stored in the .eh_frame section in a binary to perform stack unwinding. We’ll go over optimizations we made to libdwarf that greatly improve the performance of retrieving this information.

@rovarma eeek, I seem to be missing some of the basics I'd need to read through it 😅

If you don't have a next topic already planned, you could write a crash course for people like me!
I'll read the next post anyway tho, so great that you've started this! 💜

@iralmeida what did you struggle the most with? I think I know, but would be good to know for sure :-)

@rovarma I'm not too sure actually!

Most of the things (DWARF, rax, etc.) I had heard about before, but not .eh_frame, so I learned sth! :D
I got some more understanding from the linked article with "more details", so I think I may be too fuzzy about stack frames, the structure of binaries, and how debug symbols really work. So I end up decoding the articles rather than reading them, because I'm trying to piece together the context/domain of the optimization.
Does that match your guess?

@rovarma I spent the day revisiting how computers work, it was too fuzzy! 🙈

Thankfully I had @brendangregg's Systems Performance book to the rescue (thx!), plus an Intel processor manual, so now I've got +1 wisdom for stack frames and traces 🥳

But I've got to say, I find it really complicated: the differences between architectures, the frame pointer being optional, and the different methods for stack walking.

@rovarma I didn't get into DWARF and have never worked with C++ exceptions, so I haven't put much thought into the runtime machinery they need. 🤷‍♀️

That said, I tried reading the article again. And success!

There is no need to understand a stack frame or why the bytecode is the way it is, since you take it as a given that Superluminal's sampling relies on the data as is. It's about doing it faster, not differently, and the optimization story and thought process are super nice to follow!

@rovarma Awesome work with the PRs and awesome wizardry in knowing how to optimize this kind of stuff! 🧙✨