I want to see if the thoughts of others align with mine.

An FPGA toolchain should optimise for:
highest worst-case Fmax: 61.4%
highest average Fmax: 27.3%
highest best-case Fmax: 11.4%
@lofty Depends how many parallel jobs I am allowed to submit to get that golden seed 😅
@wren6991 I'm computing distributions of 100 runs, so

@lofty If I'm comparing RTL changes for fmax then I'll usually do ~100 runs and take the median. Seems reasonable because 8 runs aiming for median frequency should have only a 0.4% chance of all failing. Most client machines have 8 parallel threads of execution available these days.

I don't particularly care about the mean because if I get a worst-case result then you can bet I'm re-rolling, so the tail values don't matter.

I don't think it's *good* that seed sweeping is the optimal way of using the tools, but the fact is it reduces variance as well as increasing expected fmax.

@lofty Also bear in mind my comparison point here is Vivado which has a >> 0.4% chance of just segfaulting for no apparent reason, so I find that kind of probability acceptable
@wren6991 @lofty one trick i've seen for dealing with fpga toolchains that i really like is to LD_PRELOAD this bad boy. makes them much less flakey.

@mei @lofty @wren6991 I'm not sure whether to laugh or run screaming in the other direction.

This is one of the most horrifying software bodges I've seen in a long time.

@azonenberg @mei @lofty @wren6991 It’s horrifying but only because they’re having to do this to released software. Otherwise, yeah I have done stuff like this, if rarely this elegantly. This same thing would be a tool for finding these errors if you add code to fill freed blocks with provocatively bad data. Dynamic linking games are always fun.

@acsawdey @azonenberg @mei @lofty > This same thing would be a tool for finding these errors if you add code to fill freed blocks with provocatively bad data

Isn't this what asan does with quarantining?

@wren6991 @acsawdey @mei @lofty yeah, or visual studio's debug allocator that fills newly allocated uninitialized values with 0xcc etc

@wren6991 @azonenberg @mei @lofty yeah there are lots of things that do this, sometimes you have to roll your own .. like for instance if you can’t rebuild with asan.

I think it’s axiomatic that useful programming tools get reimplemented all the time.

@acsawdey @wren6991 @mei @lofty yeah it's more the horror of "let's just double all allocations and add a fudge factor in the expectation that the thing is full of buffer overflows" and "delay all frees in the expectation that it's full of UaFs".

Vivado leaks memory bad enough as is. I'd want another 128GB of RAM if I was using this...

@azonenberg @acsawdey @mei @lofty It's fine, it's just virtual memory that's being allocated. The padding has low cost if there are no overruns :) :) :)
@wren6991 @azonenberg @mei @lofty … and if the blocks being allocated are >> pagesize

@wren6991 @azonenberg @acsawdey @mei @lofty I have in fact fed 128GB chunk after 128GB chunk of swap to a tool that ate all 64GB of ram and then lots more. (I soon had to switch to 256GB chunks) I don't remember how many I carved out of my SSD in the end, but I know it was only a 1TB SSD and I don't think I pulled any from spinning rust. It did eventually finish though. :P

And this wasn't even a Vendor tool, it was "research grade" tooling written in python.

@azonenberg @mei @lofty @wren6991 I haven't seen this horrifying bodge in a long time, because it was deployed to catch a naughty Windows device driver which wrote back to user space memory after the driver handle was closed. The driver was eventually fixed.
@azonenberg @mei @lofty @wren6991 this sort of software bodge is literally built into the Windows compatibility shim and applied for a metric crapload of programs. we certainly live in times
@astraleureka @mei @lofty @wren6991 this exact "work around horrid memory safety issues" bodge?
@azonenberg @mei @lofty @wren6991 yep, oversized allocations and time-delayed free. the name "enterprise-malloc" is so incredibly accurate it hurts. the windows compat shim often gets tuned to limit the over-allocation size since it's a per-program profile, so it's not as noticeable
@astraleureka @mei @lofty @wren6991 the only thing worse than this shitpost is it not being a shitpost
@azonenberg @mei @lofty @wren6991 fpga can have a little use after free, as a treat
@mei @lofty This is a work of art. They should teach this in all systems programming courses
@mei @lofty this is both cursed and beautiful
@nullenvk @mei @lofty
So this is the legendary preload library that mwk told us students about 6 years ago?
@wolf480pl @mei @lofty legendary preload library?

@nullenvk @lofty
when I was taking the Advanced Operating Systems class by @mwk, she once told a story about how ISE had use-after-free bugs, and in order to make it work she had to use an LD_PRELOAD library that remembered what pointers ISE tried to free and freed them a few calls later.

Which is part of what the code @mei posted does

@mei @lofty @wren6991 Woah... That is... Something special.

So it deals with use-after-free, miscalculated memory sizes as well as buffer overruns?

@loke @mei @lofty I bet that serialising all the free/malloc calls also hides some races
@wren6991 @loke @mei @lofty about that, I believe the orig_malloc should happen before the pthread_mutex_unlock... Or did I miss something?
@f4grx @loke @mei @lofty Oh yeah, you're right. It doesn't serialise the calls
@mei @wren6991 Isn’t there also UB in this code? The queue has no capacity check, so it could end up freeing memory that was never allocated, which is technically UB. Also the position isn’t initialized, so it could be out of bounds on the first free call.
@TheAlgorythm @mei C guarantees implicit zero initialisation of variables with static storage duration, though it's bad style to rely on it because the same does not apply to automatic storage duration. So the free of empty queue is avoided by the null check, and the position is initialised to zero.
@mei @wren6991 Oh, ok. Didn’t know the exceptions of static.
@TheAlgorythm @mei They're static so 0 on first call. Calling free on NULL is defined as no-op.
@mei @lofty @wren6991 I love how the best part of the joke is at the very end. Thank you for this, I really enjoyed it! And of course I am very very scared to know that this makes software run better. It's an approach I've never considered before and it's ... dreadful. 🤣

@mei @lofty @wren6991 Simpler version:

void free(void *p) {}

@dalias @mei @lofty @wren6991 you laugh, but we've legitimately used this implementation before

underrated memory management technique for "small" batch-processing programs imo

@r @dalias @mei @lofty @wren6991 iirc, cvstrac by Richard Hipp (sqlite creator) did this, so that each forked web request handler process would have one big memory freeing event at exit time. (To be fair, I'm not sure if he overrode "free", or just didn't call it.)
@mei @wren6991 @lofty impressive tho I fear for the quality of the code that this makes more reliable. LD_PRELOAD is a great power when you need it. I used it once to patch up code looking for 32-bit inodes on a system with 64-bit inodes, and to wrap ETL pipelines on one giant NFS so I could run them in a read-only mode to build the run-time call graph, to allow splitting the giant NFS into one container per scheduled job.

@mei @lofty @wren6991

As somebody who once wrote a malloc implementation seeing that makes me want to farm carrots instead...

@mei @lofty @wren6991 on one end I'm like "niiiice, but well, this can be improved, the queue can be made lock free, and we can drop the mutex from the initialization of the forwarded symbols through careful use of atomics; also, we may improve resilience by adjusting the return value to have some extra slack even at the start"; on the other hand I'm like NO THIS IS BAD LET IT BE SLOW
@cvtsi2sd serialising all the calls to free might also be helping with race conditions
@mei @cvtsi2sd I don't think libc free() needs an extra protection against race conditions.
@mtrojnar @cvtsi2sd no, but making free() always take a mutex has some potential of hiding race conditions in the callers of free()
@mei @cvtsi2sd there should be a comment describing this

[MaeIsBad requested changes]
@tubetime @wren6991 @lofty @mei Alternately (or perhaps as a second level of bumper bowling), use __attribute__((constructor)) to install a SIGSEGV handler to just quietly "make the problem go away" a la https://people.csail.mit.edu/rinard/paper/osdi04.pdf (might also need to shim signal(2) and sigaction(2) to make sure it stays in place in case the Serious Enterprise Software tries to install its own crash handlers).
@mei Disgusting, I love it!

@mei @lofty @wren6991

Does valgrind complain about memory leaks on exit or does it undercut valgrind?

@resuna @mei @lofty @wren6991 you could add an atexit to free outstanding chunks. Otherwise valgrind will call them leaks

@cliffordheath @mei @lofty @wren6991

Yeah that's what I was thinking. Just loop over the area and free everything that isn't null.

I was just wondering if there was some interaction between valgrind and LD_PRELOAD that obviated it.

@resuna @mei @lofty @wren6991 Pretty sure this is used in environments where none of the original application developers has heard of valgrind, let alone used it.

@mei

I like how this is called "enterprise malloc"

@mei @lofty @wren6991 It took me a moment to realize how bad this is. You get a boost because of the pure horrors this implies.
@mei @lofty @wren6991 sweet mercy, are UAFs in production toolchains *that common?*
@mei @lofty @wren6991 oh that is delightful and horrific
@mei @lofty @wren6991
I have Lattice Diamond + Questasim running on Debian 12 (as in right now I have them running). I had to delete the multiple copies of bundled libstdc++ to make them work rather than crashing with complaints about incompatible library versions.
@mei @lofty @wren6991 😂 so horrible and smart at the same time !