Mastodawn

RevK

2d ago

So today I have to figure out why @kicad v10 is instantly crashing… I’ll work on it a bit before reporting.

But I also have some of my C code randomly crashing, so it is going to be a valgrind sort of day.

And #ESPIDF v6 refuses to work either, with some python crash. I hate Python.

Fun day ahead.

RevK

2d ago

@kicad My code had a use after free, FFS. Long-standing random crashes finally tracked down: I had to run a long valgrind session checking Apache logs to find it. #valgrind is awesome, isn't it.

You have to kick yourself over these a bit. A simple free(x) followed by using x is easy to spot and avoid.

A "get me this string from this XML object", followed later by a "set this value in this XML object" (which frees, reallocs and stores inside the function), leaves the previously fetched value pointing at freed memory. Harder.
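
A minimal sketch of that pattern and one way round it. The `attr_t` type and the `attr_get`, `attr_set` and `attr_get_copy` functions are hypothetical stand-ins, not the actual XML library from the post: the getter hands back a pointer into the object, the setter frees and replaces it, and the copying getter is the safer alternative.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an XML attribute: the object owns its value string. */
typedef struct { char *value; } attr_t;

/* Returns a pointer INTO the object. It is only valid until the next attr_set(). */
static const char *attr_get(const attr_t *a)
{
    assert(a != NULL);
    return a->value;
}

/* Frees the old value and stores a fresh copy of the new one.
   Any pointer previously returned by attr_get() dangles after this call. */
static void attr_set(attr_t *a, const char *v)
{
    assert(a != NULL && v != NULL);
    char *copy = strdup(v); /* copy first, so attr_set(a, attr_get(a)) is still safe */
    if (copy == NULL)
        abort();
    free(a->value);
    a->value = copy;
}

/* Safer getter: the caller gets a private copy that survives later attr_set()
   calls, and must free() it when done. */
static char *attr_get_copy(const attr_t *a)
{
    assert(a != NULL && a->value != NULL);
    char *copy = strdup(a->value);
    if (copy == NULL)
        abort();
    return copy;
}
```

The bug is the sequence `was = attr_get(&a); attr_set(&a, "new"); use(was);`. Swapping the first call for `attr_get_copy(&a)` (and freeing the result afterwards) keeps `was` valid whatever happens to the object in between.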

Martin Uecker 2d ago

@revk @kicad Have you tried GCC's analyzer? It may catch some of those.

RevK

2d ago

@uecker @kicad Interesting, no, I have not. Though gcc is damn good at spotting stuff (with all warnings turned on) anyway.

Link?

Martin Uecker 2d ago

@revk @kicad
It is not perfect and also increases compilation time, but it catches some stuff during compilation.
https://gcc.gnu.org/onlinedocs/gcc-15.2.0/gcc/Static-Analyzer-Options.html

Static Analyzer Options (Using the GNU Compiler Collection (GCC))

RevK

2d ago

@uecker @kicad I will try it, and maybe add as default...

Wow, pretty

warning: use of possibly-NULL ‘FRAME.63.i’ where non-null expected [CWE-690] [-Wanalyzer-possible-null-argument]

OK that is impressive, it is based on a very long path after open_memstream returns NULL. Not going to happen in practice, but should have tested.

I appreciate the tip, this is something I should definitely be doing.

How did I miss that this existed? One error check for a NULL open_memstream and it is now happy.
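
For illustration, a sketch of the kind of check that quiets that warning. The `make_greeting` helper is hypothetical, not the code from the post; the point is only that the `FILE *` from `open_memstream` is tested before use. Building with something like `gcc -fanalyzer -Wall -c file.c` is how the analyzer gets a look at it.

```c
#define _POSIX_C_SOURCE 200809L /* for open_memstream() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "Hello, <name>!" in a malloc'd buffer.
   Returns NULL on failure; the caller free()s the result. */
static char *make_greeting(const char *name)
{
    assert(name != NULL);
    char *buf = NULL;
    size_t len = 0;

    FILE *f = open_memstream(&buf, &len);
    if (f == NULL)
        return NULL; /* the missing check: open_memstream can fail */

    fprintf(f, "Hello, %s!", name);

    /* fclose() flushes and finalises buf and len */
    if (fclose(f) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Without the `f == NULL` test, the possibly-NULL stream flows into `fprintf` and `fclose`, which is the sort of path the analyzer traces.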

RevK

2d ago

@uecker @kicad The logic trace was 264 lines long for this one error.

Tried it on another program, and it is pretty much down to failing to check that open_memstream can return NULL. But there were some others, so very useful.

FFS and yes, strdup could, in theory, return NULL, when computers did not have shit tons of RAM...

Could do with an "assume we won't run out of RAM" option 🙂

RevK

2d ago

@uecker @kicad I suspect we need to use this to double check stuff, but so far the code I have checked only has extremely unlikely memory exhaustion related errors. Worth checking even if only to fatal abort, but not serious.

TBH we could do with a compiler option to say "make all malloc/alloc failures fatal abort" these days. A lot of my code has a malloc wrapper that fatal aborts, but I think the compiler makes use of some allocation internally, as do many system functions like open_memstream.
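
A sketch of what such a wrapper can look like. The names `xmalloc` and `xstrdup` are a common convention, assumed here rather than taken from the post. The `returns_nonnull` attribute is a GCC/Clang extension that declares the result is never NULL, so callers need no check of their own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* returns_nonnull is a GCC/Clang extension; compile it away elsewhere. */
#if defined(__GNUC__)
#define RETURNS_NONNULL __attribute__((returns_nonnull))
#else
#define RETURNS_NONNULL
#endif

/* Hypothetical wrapper: allocate or die. Never returns NULL. */
RETURNS_NONNULL static void *xmalloc(size_t n)
{
    void *p = malloc(n ? n : 1); /* avoid the implementation-defined malloc(0) */
    if (p == NULL) {
        fputs("fatal: out of memory\n", stderr);
        abort();
    }
    return p;
}

/* Hypothetical wrapper: strdup that aborts instead of returning NULL. */
RETURNS_NONNULL static char *xstrdup(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1; /* include the terminating NUL */
    return memcpy(xmalloc(n), s, n);
}
```

This only covers allocations the program makes itself. Library calls that allocate internally, such as `open_memstream`, still report failure through their own return values.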

RevK

2d ago

@uecker @kicad I am pretty sure this is an option on the ESP-IDF, and I probably have it on by default.

However, I appreciate the tip, and will be using it.

Martin Uecker 2d ago

@revk @kicad I agree, I also use such wrappers. I run the analyzer as part of CI; occasionally it catches something, but it is also often a false positive. I would be interested to know whether it would have detected the use-after-free.

Martin Uecker 2d ago

@revk @kicad You may also want to try with LTO.

RevK

@uecker @kicad ? LTO?

Martin Uecker 2d ago

@revk @kicad Link-time optimization: -flto

RevK

2d ago

@uecker @kicad Ah, now, I think we have been working on that. Less useful maybe for generic linux code? I hope.

But for embedded code...

We have found that some link-time optimisation related to placement of code on cache line boundaries and offsets can make an amazing difference in runtime performance. It has also improved build-to-build consistency.

To get close to understanding this we had to already have a huge test platform to performance test the code in many ways.

Yes, fun stuff.

RevK

2d ago

@uecker @kicad So like the FB2900 can manage around 850Mb/s traffic, or more, but get the link wrong and it is 600Mb/s or worse.

Mental, and we found the default build could be very random between the two. Some releases were oddly slower.

So my engineers have been optimising LTO against test systems so we can consistently deliver towards the higher end.

Nice we have the option, to be honest.