I've been wanting to start a blog for a while, and finally decided to bite the bullet.

The first article of hopefully many more to come is about, you guessed it, profiling & optimization.

Boosts appreciated!

https://rovarma.com/articles/optimizing-libdwarf-eh-frame-enumeration/

Optimizing libdwarf .eh_frame enumeration | Ritesh Oedayrajsingh Varma

For the Linux version of Superluminal we rely on unwind information stored in the .eh_frame section in a binary to perform stack unwinding. We’ll go over optimizations we made to libdwarf that greatly improve the performance of retrieving this information.

@rovarma Very nice! Yay more blogs on performance optimization stories. And good to see not only the fixes, but also upstream PRs, ❤️
@aras Partially inspired by our conversation over lunch a while back, so, thanks! :-)
@rovarma What website framework are you using, if any? Every time I started looking into it, I never found one that would click with me and usually gave up.

@Themikina I’m using https://astro.build/ with some customization. I like that it allows you to keep articles in markdown, but it’s not the only static site generator that can do that. I mainly ended up with it because there was a template available for it that I liked.

In previous years I’ve gone through the typical “I’ll build it myself” phase multiple times, but never got to the writing phase.

My advice to my past self would be: don’t overthink it, it just spits out html in the end :P

Astro

Astro builds fast content sites, powerful web applications, dynamic server APIs, and everything in-between.

@rovarma Looks nice, indeed! The embedded svg icons are also part of the template? And those page transitions really look neat!
@rovarma nice! It's almost like there is a gap in the market for a visual profiler for Linux that makes this stuff really obvious. Someone should probably make that; I'm sure your blogpost would be very useful to them! ;)
@rovarma this is fantastic! Also added to my RSS reader 😇
@rovarma Is this eh-frame info also available when exceptions are disabled? And you wrote that the info is ~3MB in your case; how big is the file in total, i.e. what is the ratio between code and this info? Great article about low-level stuff, looking forward to the next one ;)

@questor It's not 3MB of info; there are 3,373,919 DWARF bytecode *instructions*. The actual size of the binary in this case is ~91MiB.

The .eh_frame info is emitted by default, yes, even if the language doesn't support exceptions/exceptions are disabled. This is because the exception unwinding machinery needs to support unwinding *through* code that doesn't support exceptions (consider, for example, C code calling into C++ code via a callback).
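
A minimal sketch of that callback scenario (not taken from the article, and all names here are hypothetical): `run_callback` stands in for a C library routine that knows nothing about exceptions, and the exception thrown inside the callback still has to travel through its frame to reach the `catch`. In a real project `run_callback` would live in a `.c` file, and whether unwinding through it works depends on that file having been compiled with unwind tables, which is the default on common x86-64 Linux toolchains.

```cpp
#include <stdexcept>

// Stand-in for a C library routine that takes a callback. It is kept in this
// file only so the sketch is self-contained; the point is that this frame has
// no try/catch and no knowledge of exceptions.
void run_callback(void (*cb)(void*), void* user_data) {
    cb(user_data);
}

// C++ callback that counts its invocations and then throws.
static void throwing_callback(void* user_data) {
    int* calls = static_cast<int*>(user_data);
    ++*calls;
    throw std::runtime_error("thrown inside callback");
}

// Returns true if the exception thrown in the callback reached this frame,
// i.e. the unwinder managed to walk through run_callback's frame.
bool exception_crosses_c_frame(int* calls) {
    try {
        run_callback(throwing_callback, calls);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

The unwinder needs a way to find the caller of `run_callback` even though that function never mentions exceptions, and that is the kind of information .eh_frame provides.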

@questor There *are* flags to prevent the .eh_frame info from being emitted, but you have to go out of your way to do so. By default they'll be there in both debug & release builds.

@rovarma eeek, I seem to miss some basis to be able to read through 😅

If you don't have a next topic already planned, you could write a crash course for people like me!
I'll read the next post anyway tho, so great that you've started this! 💜

@iralmeida what did you struggle the most with? I think I know, but would be good to know for sure :-)

@rovarma I'm not too sure actually!

Most of the things (dwarf, rax etc) I have heard about before, the eh_frame I hadn't, so I learned sth! :D
I got some more understanding from the linked article with "more details", so I think I may be too fuzzy about stack frames, the structure of binaries and how debug symbols really work. Then I'm trying to decode rather than read the articles because I'm trying to piece together the context/domain of the optimization.
Does it match your guess?

@rovarma I spent the day revisiting how computers work, it was too fuzzy! 🙈

Thankfully I got @brendangregg's Systems Performance book to the rescue (thx!) plus some Intel processor manual, so now I got +1 wisdom for stack frames and traces 🥳

But I got to say, I find it really complicated regarding differences between architectures and the frame pointer being optional and the different methods for stack walking.

@rovarma I didn't get into DWARF and never worked with C++ exceptions, so I haven't put much thought to the runtime machinery it needs. 🤷‍♀️

That said, I tried reading the article again. And success!

There is no need to understand a stack frame and why the bytecode is like that, since you take it as a given that Superluminal's sampling relies on the data as is. It's about doing it faster, not differently, and the optimization story and thought process are super nice to follow!

@rovarma Awesome work with the PRs and awesome wizardry in knowing how to optimize this kind of stuff! 🧙✨

@iralmeida

> Does it match your guess?

Yep! It’s one of the things I was worried about.

I originally had a section at the start explaining the basics of stack frames, frame pointers etc. But a single section can’t really do it justice, it needs a full article by itself, so I scrapped it. The end result is indeed that some basic knowledge about stacks is needed :-(

Awesome that you read up on it and were able to follow it afterwards! Glad you liked it :-)

@rovarma post that section as a separate article! :D

and well... thank you for getting me to go read up on stuff I should know 💜