I've now implemented a per-thread #pool to reuse #timer objects in #poser (the lib I use for #swad).

The great news is: This improved performance, which was an unintended side effect (my goal was to reduce RAM usage 🙈😆). I tested with the #kqueue backend on #FreeBSD, and sure, this makes sense: Until now, I had to keep a list of destroyed timers that was checked on every event, to work around an interesting issue: By the time I cancel a timer with #kevent, its expiry event might already be queued but not yet read by my event loop, and firing events from a timer that no longer exists would of course segfault. That's not necessary any more with the pool approach: the timer WILL still exist, and I can just check whether it's "alive".
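Roughly, the idea looks like this. Just a simplified sketch, not poser's actual API (all names here are made up for illustration):

```c
/* Sketch of the idea: timer objects live in a per-thread pool and are
 * only recycled, never freed, so a stale kevent expiry can always be
 * checked against the timer's "alive" flag. */
#include <stdbool.h>
#include <stddef.h>

#define TIMERPOOLSIZE 64

typedef struct Timer {
    void (*callback)(void *arg);
    void *arg;
    bool alive;     /* set on acquire, cleared on cancel */
} Timer;

static _Thread_local Timer timerpool[TIMERPOOLSIZE];

static Timer *Timer_acquire(void (*cb)(void *), void *arg)
{
    for (size_t i = 0; i < TIMERPOOLSIZE; ++i)
    {
        if (!timerpool[i].alive)
        {
            timerpool[i] = (Timer){ .callback = cb, .arg = arg,
                .alive = true };
            return timerpool + i;
        }
    }
    return 0;   /* pool exhausted; a real pool would grow here */
}

static void Timer_cancel(Timer *t)
{
    /* The matching expiry kevent might already sit in the kernel
     * queue, but the Timer object stays valid, so no extra list of
     * destroyed timers is needed. (A real pool must also avoid
     * recycling the slot before any stale event was drained.) */
    t->alive = false;
}

static void Timer_expired(Timer *t)
{
    /* called from the event loop when a timer kevent is read */
    if (t->alive) t->callback(t->arg);
}
```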

The result? Same hardware as always, and now swad reaches a throughput of 26000 requests per second with (almost) perfect response times. 🥳

I'm still not happy with memory usage. It's better, but I have no explanation for what I observed now:

I ran the same test 3 times: 1000 #jmeter threads, each simulating a distinct client looping 2000 times over one GET and one POST, for a total of 4 million requests. After the first run, the resident set was at 178 MiB. After the second run, 245 MiB. And after the third run, well, still 245 MiB. How ...? 🤯

Also, there's another weird observation I have no explanation for. My main thread delegates accepted connections to worker threads in a simple "round robin" fashion. And each time I run the jmeter test, all these worker threads show CPU usage increasing at a similar rate, until suddenly a single thread seems to do "more work", stabilizing once it uses almost double the CPU of every other worker thread. And when I run the jmeter test again (NOT restarting swad), the same thing happens, but this time it's a *different* thread that "works" a lot more than all others.
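Conceptually, the delegation is something like this. A simplified, self-contained sketch, not swad's actual code; here a pipe per worker stands in for the real queue:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

#define NWORKERS 8

/* one pipe per worker as a stand-in for the real per-worker queue;
 * each worker would read accepted fds from workerpipes[i][0] */
static int workerpipes[NWORKERS][2];

static void initPipes(void)
{
    for (unsigned i = 0; i < NWORKERS; ++i)
    {
        if (pipe(workerpipes[i]) < 0) perror("pipe");
    }
}

void acceptLoop(int listenfd)
{
    unsigned next = 0;
    initPipes();
    for (;;)
    {
        int fd = accept(listenfd, 0, 0);
        if (fd < 0) continue;
        /* strict round robin: every worker gets the same *number* of
         * connections, but not necessarily the same amount of work */
        if (write(workerpipes[next][1], &fd, sizeof fd) < 0)
            perror("delegate");
        next = (next + 1) % NWORKERS;
    }
}
```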

I wonder whether I should just accept that scheduling, memory management and so on are all "black magic", and that swad is probably "good enough" as it is right now. 😆

I guess this funny-looking graph showing response time percentiles is exactly the result of one of the 8 service worker threads having a lot more to do than all the others. I wonder whether this could be a behavioral artifact of the #lockfree #queue used to distribute the accepted connections. 🤔
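To illustrate the kind of structure I mean: a bounded single-producer/single-consumer ring is one common shape for such a queue. Again only a simplified sketch, not the exact code in poser; with the main thread as the only producer and each worker as the only consumer of its ring, no locks are needed at all:

```c
#include <stdatomic.h>

#define QUEUESIZE 1024   /* must be a power of two */

typedef struct ConnQueue {
    int fds[QUEUESIZE];
    _Atomic unsigned head;   /* only written by the consumer (worker) */
    _Atomic unsigned tail;   /* only written by the producer (main thread) */
} ConnQueue;

/* producer side: returns 0 when the ring is full */
int ConnQueue_push(ConnQueue *q, int fd)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUESIZE) return 0;     /* full */
    q->fds[tail & (QUEUESIZE - 1)] = fd;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* consumer side: returns 0 when the ring is empty */
int ConnQueue_pop(ConnQueue *q, int *fd)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return 0;                 /* empty */
    *fd = q->fds[head & (QUEUESIZE - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}
```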