I want to see if the thoughts of others align with mine.

An FPGA toolchain should optimise for:
highest worst-case Fmax: 61.4%
highest average Fmax: 27.3%
highest best-case Fmax: 11.4%
@lofty Depends how many parallel jobs I am allowed to submit to get that golden seed 😅
@wren6991 I'm computing distributions of 100 runs, so

@lofty If I'm comparing RTL changes for fmax then I'll usually do ~100 runs and take the median. Seems reasonable because 8 runs aiming for median frequency should have only a 0.4% chance of all failing. Most client machines have 8 parallel threads of execution available these days.

I don't particularly care about the mean because if I get a worst-case result then you can bet I'm re-rolling, so the tail values don't matter.

I don't think it's *good* that seed sweeping is the optimal way of using the tools, but the fact is it reduces variance as well as increasing expected fmax.

@lofty Also bear in mind my comparison point here is Vivado which has a >> 0.4% chance of just segfaulting for no apparent reason, so I find that kind of probability acceptable
@wren6991 @lofty one trick i've seen for dealing with fpga toolchains that i really like is to LD_PRELOAD this bad boy. makes them much less flakey.

@mei @lofty @wren6991 I'm not sure whether to laugh or run screaming in the other direction.

This is one of the most horrifying software bodges I've seen in a long time.

@azonenberg @mei @lofty @wren6991 It’s horrifying but only because they’re having to do this to released software. Otherwise, yeah I have done stuff like this, if rarely this elegantly. This same thing would be a tool for finding these errors if you add code to fill freed blocks with provocatively bad data. Dynamic linking games are always fun.

@acsawdey @azonenberg @mei @lofty > This same thing would be a tool for finding these errors if you add code to fill freed blocks with provocatively bad data

Isn't this what asan does with quarantining?

@wren6991 @acsawdey @mei @lofty yeah, or visual studio's debug allocator that fills newly allocated uninitialized values with 0xcc etc

@wren6991 @azonenberg @mei @lofty yeah there are lots of things that do this, sometimes you have to roll your own .. like for instance if you can’t rebuild with asan.

I think it’s axiomatic that useful programming tools get reimplemented all the time.

@acsawdey @wren6991 @mei @lofty yeah it's more the horror of "let's just double all allocations and add a fudge factor in the expectation that the thing is full of buffer overflows" and "delay all frees in the expectation that it's full of UaFs".

Vivado leaks memory bad enough as is. I'd want another 128GB of RAM if I was using this...

@azonenberg @acsawdey @mei @lofty It's fine, it's just virtual memory that's being allocated. The padding has low cost if there are no overruns :) :) :)
@wren6991 @azonenberg @mei @lofty … and if the blocks being allocated are >> pagesize

@wren6991 @azonenberg @acsawdey @mei @lofty I have in fact fed 128GB chunk after 128GB chunk of swap to a tool that ate all 64GB of ram and then lots more. (I soon had to switch to 256GB chunks) I don't remember how many I carved out of my SSD in the end, but I know it was only a 1TB SSD and I don't think I pulled any from spinning rust. It did eventually finish though. :P

And this wasn't even a Vendor tool, it was "research grade" tooling written in python.

@azonenberg @mei @lofty @wren6991 I haven't seen this horrifying bodge in a long time, because it was deployed to catch a naughty Windows device driver which wrote back to user space memory after the driver handle was closed. The driver was eventually fixed.
@azonenberg @mei @lofty @wren6991 this sort of software bodge is literally built into the Windows compatibility shim and applied for a metric crapload of programs. we certainly live in times
@astraleureka @mei @lofty @wren6991 this exact "work around horrid memory safety issues" bodge?
@azonenberg @mei @lofty @wren6991 yep, oversized allocations and time-delayed free. the name "enterprise-malloc" is so incredibly accurate it hurts. the windows compat shim often gets tuned to limit the over-allocation size since it's a per-program profile, so it's not as noticeable
@astraleureka @mei @lofty @wren6991 the only thing worse than this shitpost is it not being a shitpost
@azonenberg @mei @lofty @wren6991 fpga can have a little use after free, as a treat
@mei @lofty This is a work of art. They should teach this in all systems programming courses
@mei @lofty this is both cursed and beautiful
@nullenvk @mei @lofty
So this is the legendary preload library that mwk told us students about 6 years ago?
@wolf480pl @mei @lofty legendary preload library?

@nullenvk @lofty
when I was taking the Advanced Operating Systems class by @mwk, she once told a story about how ISE had use-after-free bugs, and in order to make it work she had to use an LD_PRELOAD library that remembered what pointers ISE tried to free and freed them a few calls later.

Which is part of what the code @mei posted does

@mei @lofty @wren6991 Woah... That is... Something special.

So it deals with use-after-free, miscalculated memory sizes as well as buffer overruns?

@loke @mei @lofty I bet that serialising all the free/malloc calls also hides some races
@wren6991 @loke @mei @lofty about that, I believe the orig_malloc should happen before the pthread_mutex_unlock... Or did I miss something?
@f4grx @loke @mei @lofty Oh yeah, you're right. It doesn't serialise the calls
@mei @wren6991 Isn’t there also UB in this code? The queue has no capacity check, so it could end up freeing memory that was never allocated, which is technically UB. Also the position isn’t initialized, so it could be out of bounds on the first free call.
@TheAlgorythm @mei C guarantees implicit zero initialisation of variables with static storage duration, though it's bad style to rely on it because the same does not apply to automatic storage duration. So the free of empty queue is avoided by the null check, and the position is initialised to zero.
@mei @wren6991 Oh, ok. Didn’t know the exceptions of static.
@TheAlgorythm @mei They're static so 0 on first call. Calling free on NULL is defined as no-op.
@mei @lofty @wren6991 I love how the best part of the joke is at the very end. Thank you for this, I really enjoyed it! And of course I am very very scared to know that this makes software run better. It's an approach I've never considered before and it's ... dreadful. 🤣

@mei @lofty @wren6991 Simpler version:

void free(void *p) {}

@dalias @mei @lofty @wren6991 you laugh, but we've legitimately used this implementation before

underrated memory management technique for "small" batch-processing programs imo

@r @dalias @mei @lofty @wren6991 iirc, cvstrac by Richard Hipp (sqlite creator) did this, so that each forked web request handler process would have one big memory freeing event at exit time. (To be fair, I'm not sure if he overrode "free", or just didn't call it.)
@mei @wren6991 @lofty impressive tho I fear for the quality of the code that this makes more reliable. LD_PRELOAD is a great power when you need it. I used it once to patch up code looking for 32-bit inodes on a system with 64-bit inodes, and to wrap ETL pipelines on one giant NFS so I could run them in a read-only mode to build the run-time call graph, to allow splitting the giant NFS into one container per scheduled job.

@mei @lofty @wren6991

As somebody who once wrote a malloc implementation seeing that makes me want to farm carrots instead...

@mei @lofty @wren6991 on one end I'm like "niiiice, but well, this can be improved, the queue can be made lock free, and we can drop the mutex from the initialization of the forwarded symbols through careful use of atomics; also, we may improve resilience by adjusting the return value to have some extra slack even at the start"; on the other hand I'm like NO THIS IS BAD LET IT BE SLOW
@cvtsi2sd serialising all the calls to free might also be helping with race conditions
@mei @cvtsi2sd I don't think libc free() needs an extra protection against race conditions.
@mtrojnar @cvtsi2sd no, but making free() always take a mutex has some potential of hiding race conditions in the callers of free()
@mei @cvtsi2sd there should be a comment describing this

[MaeIsBad requested changes]
@tubetime @wren6991 @lofty @mei Alternately (or perhaps as a second level of bumper bowling), use __attribute__((constructor)) to install a SIGSEGV handler to just quietly "make the problem go away" a la https://people.csail.mit.edu/rinard/paper/osdi04.pdf (might also need to shim signal(2) and sigaction(2) to make sure it stays in place in case the Serious Enterprise Software tries to install its own crash handlers).
@mei Disgusting, I love it!

@mei @lofty @wren6991

Does valgrind complain about memory leaks on exit or does it undercut valgrind?

@resuna @mei @lofty @wren6991 you could add an atexit to free outstanding chunks. Otherwise valgrind will call them leaks

@cliffordheath @mei @lofty @wren6991

Yeah that's what I was thinking. Just loop over the area and free everything that isn't null.

I was just wondering if there was some interaction between valgrind and LD_PRELOAD that obviated it.

@resuna @mei @lofty @wren6991 Pretty sure this is used in environments where none of the original application developers has heard of valgrind, let alone used it.

@mei

I like how this is called "enterprise malloc"

@mei @lofty @wren6991 It took me a moment to realize how bad this is. You get a boost because of the pure horrors this implies.
@mei @lofty @wren6991 sweet mercy, are UAFs in production toolchains *that common?*
@mei @lofty @wren6991 oh that is delightful and horrific
@mei @lofty @wren6991
I have Lattice Diamond + Questasim running on Debian 12 (as in right now I have them running). I had to delete the multiple copies of bundled libstdc++ to make them work rather than crashing with complaints about incompatible library versions.
@mei @lofty @wren6991 😂 so horrible and smart at the same time !