This redesign of #poser (for #swad) to offer a "multi-reactor" (with multiple #threads running each their own event loop) starts to give me severe headaches.

There is *still* a very rare data #race in the #lockfree #queue. I *think* I can spot it in the pseudo code from the paper I used[1], see screenshot. Have a look at lines E7 and E8. Suppose the thread executing this is suspended after E7 for a "very long time". Now, some dequeue operation from some other thread will eventually dequeue whatever "Q->Tail" was pointing to, and then free it after consumption. Our poor thread resumes, checks the pointer already read in E6 for NULL successfully, and then tries a CAS on tail->next in E9, which is unfortunately inside an object that doesn't exist any more .... If the CAS succeeds because at this memory location happens to be "zero" bytes, we corrupted some random other object that might now reside there. ๐Ÿคฏ

Please tell me whether I have an error in my thinking here. Can it be ....? ๐Ÿค”

Meanwhile, after fixing and improving lots of things, I checked the alternative implementation using #mutexes again, and surprise: Although it's still a bit slower, the difference is now very very small. And it has the clear advantage that it never crashes. ๐Ÿ™ˆ I'm seriously considering to drop all the lock-free #atomics stuff again and just go with mutexes.

[1] https://dl.acm.org/doi/10.1145/248052.248106

Okay, found confirmation that actually calling free leads to races in this algorithm:

https://stackoverflow.com/questions/40818465/explain-michael-scott-lock-free-queue-alorigthm

The #lockfree command #queue in #poser (for #swad) is finally fixed!

The original algorithm from [MS96] works fine *only* if the "free" function has some "magic" in place to defer freeing the object until no thread holds a reference any more ... and that magic is, well, left as an exercise to the reader. ๐Ÿ™ˆ

Doing more research, I found a few suggestions how to do that "magic", including for example #hazardpointers ... but they're known to cause quite some runtime overhead, so not really an option. I decided to implement some "shared object manager" based on the ideas from [WICBS18], which is kind of a "manually triggered garbage collector" in the end. And hey, it works! ๐Ÿฅณ
https://github.com/Zirias/poser/blob/master/src/lib/core/sharedobj.c

[MS96] https://dl.acm.org/doi/10.1145/248052.248106
[WICBS18] https://www.cs.rochester.edu/u/scott/papers/2018_PPoPP_IBR.pdf

#coding #c #c11 #atomics

poser/src/lib/core/sharedobj.c at master ยท Zirias/poser

POsix SERvices framework for C. Contribute to Zirias/poser development by creating an account on GitHub.

GitHub