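🌘 I/O multiplexing: the evolution and performance of select, poll, and epoll/kqueue
➤ From the limits of select to the high-performance, event-driven architecture of epoll/kqueue
https://nima101.github.io/io_multiplexing
This article takes an in-depth look at the evolution of I/O multiplexing on Unix-like systems, from the early select and poll to the modern, high-performance epoll (Linux) and kqueue (macOS). The author explains select's O(n) time complexity and its potential stack overflow risk, and how poll removes the limit on the number of fds but still has the O(n) performance bottleneck. The article then focuses on the design of epoll and kqueue, showing how event registration and callback mechanisms greatly improve scalability and efficiency, which makes them especially well suited to handling large numbers of concurrent connections.
+ The stack problem with select is really scary! Good thing poll and epoll/kqueue came along to solve it.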
+ This article ...
#C #select #poll #epoll #kqueue #networking #learning #Linux #kernel #IO multiplexing
I/O Multiplexing (select vs. poll vs. epoll/kqueue)

Problems and Algorithms
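
To make the select/poll comparison above concrete, here is a minimal sketch in C. select's `fd_set` is a fixed-size bitmap of `FD_SETSIZE` bits, so `FD_SET` on a larger descriptor writes past the end of the set, and that lands on the stack when the set is a local variable. poll takes an array of `struct pollfd` instead, so the descriptor value itself is not bounded, but the caller still scans every entry after the call. The helper names `first_readable` and `demo_poll` and the cap `MAX_FDS` are made up for this illustration and are not from the linked article.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of the nfds descriptors
 * to become readable. Returns the index of the first readable one, or -1 on
 * timeout or error. */
int first_readable(const int *fds, size_t nfds, int timeout_ms)
{
    enum { MAX_FDS = 64 };              /* arbitrary cap for this sketch */
    struct pollfd pfds[MAX_FDS];

    assert(fds != NULL && nfds <= MAX_FDS);
    for (size_t i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nfds, timeout_ms) <= 0)
        return -1;

    /* The O(n) part: every entry is scanned to find out which one fired. */
    for (size_t i = 0; i < nfds; i++)
        if (pfds[i].revents & POLLIN)
            return (int)i;
    return -1;
}

/* Demo: two pipes, only the second one has data, so index 1 is reported.
 * Returns -2 if the setup itself fails. */
int demo_poll(void)
{
    int a[2], b[2];

    if (pipe(a) != 0)
        return -2;
    if (pipe(b) != 0) {
        close(a[0]); close(a[1]);
        return -2;
    }

    int idx = -2;
    if (write(b[1], "x", 1) == 1) {
        int fds[2] = { a[0], b[0] };
        idx = first_readable(fds, 2, 1000);
    }
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return idx;
}
```

Calling `demo_poll()` should report index 1, the pipe that was written to.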

An example of typical systems-level application development: implementing your own "message queue" (event loop). It uses #epoll instead of the "traditional" select & poll for asynchronous work via polling (and multiplexing).

Tracking file descriptors with #epoll looks more modern and copies less memory between user space and kernel space. When the expected data arrives, you can go straight to the object or data structure, which matters when you are watching several files and/or connections. There is no search through index arrays for the file descriptor that "fired": dispatching each ready event is O(1) no matter how many descriptors are being watched. You can immediately work with the object instance that wraps the file, UDP stream, or TCP or QUIC connection where the new data appeared.
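
A minimal sketch of that "go straight to the object" idea, assuming Linux. The kernel hands back whatever was stored in `epoll_event.data` at registration time, so the loop can store a pointer to its own object there and never look anything up by descriptor. `struct conn` and the function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection object; name and fields are illustrative. */
struct conn {
    int fd;
    int events_seen;
};

/* Register c->fd for readability and store the object pointer in the event. */
int watch_conn(int epfd, struct conn *c)
{
    assert(c != NULL);
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

/* One loop iteration. Returns the number of events handled, or -1 on error. */
int dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct conn *c = ready[i].data.ptr;   /* no lookup by fd needed */
        c->events_seen++;
    }
    return n;
}

/* Demo: two watched pipes, only the second receives data.
 * Returns 0 if exactly that object was dispatched, 1 otherwise, and -1 if
 * the setup fails (error paths skip cleanup for brevity). */
int demo_epoll(void)
{
    int pa[2], pb[2];
    int epfd = epoll_create1(0);

    if (epfd < 0 || pipe(pa) != 0 || pipe(pb) != 0)
        return -1;

    struct conn a = { pa[0], 0 }, b = { pb[0], 0 };
    if (watch_conn(epfd, &a) != 0 || watch_conn(epfd, &b) != 0)
        return -1;
    if (write(pb[1], "x", 1) != 1)
        return -1;

    int n = dispatch_once(epfd, 1000);
    close(pa[0]); close(pa[1]); close(pb[0]); close(pb[1]); close(epfd);
    return (n == 1 && a.events_seen == 0 && b.events_seen == 1) ? 0 : 1;
}
```

Only the ready entries are visited, so the cost of one iteration depends on the number of events returned and not on the number of descriptors registered.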

There are several ready-to-use "message queues" (event loops): #libev, #libuv, #libevent. For some agent-server solutions built around the #RabbitMQ broker, these are a good fit. In some cases, however, AMQP libraries cannot be combined with the ready-made event loops. The reason is that the agent side may rely heavily on asynchronous, reactive programming with good, proven "horizontal scalability". In other words, the agent side does a lot of work and is implemented with shared-nothing multithreading. This is a paradigm that not only achieves horizontal scalability through lock-free/wait-free techniques, but also eliminates a lot of harmful effects such as cache ping-pong or false sharing. Agents manage their own threads and memory allocation internally. That covers not only "dynamic memory" (heap, allocators like #jemalloc from #Facebook), but also tricks around page pinning, #NUMA awareness and even huge pages (fewer #TLB misses).

Why not just use epoll?
A library is not obliged to read all the data from a stream (socket); it may take data only until its finite-state machine is satisfied. For example, it parses AMQP protocol entities which, as they accumulate, are passed to the handlers supplied by the library's client.
This fits poorly with the fact that with #epoll you have to choose which notification mode to use:
• level-triggered,
• edge-triggered.

The behavior of a particular library may rule out edge-triggered mode, because the library does not read all the data from its file descriptors.
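
The mismatch can be reproduced with a pipe. The sketch below is Linux-only and the function name is invented for the illustration. It writes 8 bytes once and then reads only 4 bytes per notification, the way a parser that stops when its state machine is satisfied might. In level-triggered mode the leftover bytes are reported again; with `EPOLLET` no further notification arrives until new data is written, so the remainder is stranded.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: count how many readiness reports epoll delivers
 * for a single 8-byte write when the reader takes only 4 bytes per report.
 * extra_flags is 0 (level-triggered) or EPOLLET (edge-triggered).
 * Returns the number of reports, or -1 if the setup fails. */
int reports_for_partial_reads(uint32_t extra_flags)
{
    assert((extra_flags & ~(uint32_t)EPOLLET) == 0);

    int p[2], reports = 0;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (pipe(p) != 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event out;
    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    char buf[4];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "12345678", 8) == 8) {
        /* Timeout 0: only report what is ready right now, never block. */
        while (epoll_wait(epfd, &out, 1, 0) == 1) {
            reports++;
            if (read(p[0], buf, sizeof buf) <= 0)   /* deliberately partial */
                break;
        }
    } else {
        reports = -1;
    }

    close(p[0]); close(p[1]); close(epfd);
    return reports;
}
```

On Linux, `reports_for_partial_reads(0)` should return 2 and `reports_for_partial_reads(EPOLLET)` should return 1. An edge-triggered consumer therefore has to keep reading until `read` fails with `EAGAIN` on a non-blocking descriptor, which is exactly what a library that stops early does not do.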

You can be a technical lead five times over and know all of this perfectly well, but remember: as soon as the EPOLLET flag shows up in the code, you need to audit how the data streams are handled. That spares the team a lot of trouble with testing and with poking at code that behaves in completely baffling ways.

On "Edge Triggered Vs Level Triggered interrupts"

#programming #linux #softdev #workdays
Edge Triggered Vs Level Triggered interrupts

Discussion: Level triggered: as long as the IRQ line is asserted, you get an interrupt request. When you serve the interrupt and return, ...

Please help me spread the link to #swad 😎

https://github.com/Zirias/swad

I really need some users by now, for those two reasons:

* I'm at a point where I fully covered my own needs (the reasons I started coding this), and getting some users is the only way to learn about what other people might need
* The complexity "exploded" after supporting so many OS-specific APIs (like #kqueue, #epoll, #eventfd, #signalfd, #timerfd, #eventports) and several #lockfree implementations based on #atomics while still providing fallbacks for everything that *should* work on any #POSIX system ... I'm definitely unable at this point to think of every possible edge case and test it. If there are #bugs left (which is somewhat likely), I really need people reporting these to me

Thanks! 🙃

GitHub - Zirias/swad: Simple Web Authentication Daemon


Today, it's exactly one month since I released #swad 0.11. And I'm slowly closing in on releasing 0.12.

The change to a "multi #reactor" design was massive. It pays off though. On hardware that previously reached a throughput of roughly 1000 requests per second, I can now support over 3000 r/s, and when disabling #TLS, 10 times as much. I spent most of the time on "detective work" to find the causes of a variety of crashes, and now I'm quite confident I found them all, at least on #FreeBSD with default options. As 0.11 still has a bug affecting for example the #epoll backend on #Linux, expect to see swad 0.12 released very very soon.

I'm still not perfectly happy with RAM consumption (although that could also be improved by explicitly NOT releasing some objects and reusing them instead), and there are other things that could be improved in the future, e.g. experiment with how to distribute incoming connections to the worker threads, so there's not one "loser" that always gets slowed down massively by all the others. Or design and implement alternative #JWT #signature algorithms besides #HS256 which could enable horizontal scaling via load balancing. Etc. But I think the improvements for now are enough for a release. 😉

Getting somewhat closer to releasing a new version of #swad. I have now improved the functionality for executing something on a different worker thread: it uses an in-memory queue, with a #lockfree version provided. This gives me a consistent, reliable throughput of 3000 requests/s (with outliers up to 4500 r/s) at an average response time of 350 to 400 ms (with TLS enabled). For waking up worker threads, I implemented different backends as well: kqueue, eventfd and event-ports; the fallback is still a self-pipe.
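
For readers who haven't met the self-pipe fallback: a minimal sketch of the general technique in C, using only POSIX calls. This is not the code from swad or poser; `struct waker` and the function names are made up for the illustration. Any thread writes one byte to the pipe, and the worker includes the read end in whatever it polls, then drains it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wakeup channel, not taken from poser or swad. */
struct waker { int rd, wr; };

static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

int waker_init(struct waker *w)
{
    int p[2];

    assert(w != NULL);
    if (pipe(p) != 0)
        return -1;
    if (set_nonblock(p[0]) != 0 || set_nonblock(p[1]) != 0) {
        close(p[0]); close(p[1]);
        return -1;
    }
    w->rd = p[0];
    w->wr = p[1];
    return 0;
}

/* Called from any thread: one byte is enough to make the read end readable.
 * EAGAIN means the pipe is full, so a wakeup is already pending. */
int waker_wake(struct waker *w)
{
    ssize_t n;

    assert(w != NULL);
    do {
        n = write(w->wr, "", 1);
    } while (n < 0 && errno == EINTR);
    return (n == 1 || (n < 0 && errno == EAGAIN)) ? 0 : -1;
}

/* Called by the worker: wait for a wakeup, then drain all pending bytes so
 * that several wake calls collapse into one. Returns 1 if woken, 0 on
 * timeout, -1 on error (including interruption by a signal). */
int waker_wait(struct waker *w, int timeout_ms)
{
    assert(w != NULL);
    struct pollfd pfd = { .fd = w->rd, .events = POLLIN };
    char buf[64];

    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;
    while (read(w->rd, buf, sizeof buf) > 0)
        ;   /* the non-blocking read ends the loop once the pipe is empty */
    return 1;
}
```

eventfd on Linux serves the same purpose with a single descriptor and a counter instead of a byte stream, which is presumably why it is preferred where available.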

So, #portability here really means implement lots of different flavors of the same thing.

Looking at these startup logs, you can see that #kqueue (#FreeBSD and other BSDs) is really a "jack of all trades", being used for "everything" if available (and that's pretty awesome, it means one single #syscall per event loop iteration in the generic case). #illumos' (#Solaris) #eventports come somewhat close (but need a lot more syscalls as there's no "batch registering" and certain event types need to be re-registered every time they fire), they just can't do signals, but illumos offers Linux-compatible signalfd. Looking at #Linux, there's a "special case fd" for everything. 🙈 Plus #epoll also needs one syscall for each event to be registered. The "generic #POSIX" case without any of these interfaces is just added for completeness 😆

I just fixed a "horrible" bug in #swad:

https://github.com/Zirias/poser/commit/fcd8f4eb44d9676dde2546042b5fe3165aecc52c

In case you don't understand C: This potentially dereferenced "wild" and null pointers before the (copy-and-pasted 🙈) typo was fixed, which means it's "undefined behavior", so might do surprising things, but more likely crash.

It affects the #epoll (on #Linux) and #eventports (on #Solaris / #illumos) backends. A quick smoke test on these platforms was done in swad 0.11 and didn't show any unexpected behavior. Only after preparing for the next release (that hopefully has multiple parallel event loops) by moving some static service data to thread-local storage, it suddenly failed on illumos, that's how I tracked down that embarrassing crap. 😞

I hope to complete a new version soon enough, so I don't have to do a "bugfix release" for it.

Service: Fix bug finding watch parent · Zirias/poser@fcd8f4e

POsix SERvices framework for C. Contribute to Zirias/poser development by creating an account on GitHub.


@jhx Regarding that, at least in theory, it's indeed "truly portable" as it works fine using only #POSIX compliant APIs.

In practice, there can be issues with platforms that don't implement the *full* POSIX feature-set (which is in fact most platforms nowadays). There can also be nasty issues with how feature-test macros are handled (set by the compiler, interpreted by the system's headers) and sometimes with which libraries are needed (unfortunately, POSIX doesn't specify that, e.g. on illumos, you have to link a libsocket for any socket functionality 🙄).
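
A small illustration of the feature-test macro point, as a sketch only. The macro has to be defined before the first system header, because the headers read it when they are included. `dup_len` is a made-up helper; `strdup` is used simply because it comes from POSIX and was not part of ISO C before C23.

```c
/* Request the POSIX.1-2008 API. This must appear before the first system
 * header is included. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string with strdup() and return its
 * length, or 0 if the allocation fails. With a strict -std=c11 and no
 * feature-test macro, <string.h> may not declare strdup() at all. */
size_t dup_len(const char *s)
{
    assert(s != NULL);

    char *copy = strdup(s);
    if (copy == NULL)
        return 0;

    size_t n = strlen(copy);
    free(copy);
    return n;
}
```

The same source can therefore build on one platform and fail on another, depending on which macros the compiler predefines and how the system headers interpret them.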

Once I started to add optional support for the platform-specific mechanisms #epoll on #Linux and #kqueue on #BSD (because the POSIX standard select and poll have severe scalability issues), I also wanted to add support for /dev/poll as used on Solaris. That's why I installed #OpenIndiana (illumos-based) in a VM to do tests, and I quickly learned /dev/poll was superseded by "event ports", so that's what I added instead.

@jhx Hopefully 😎 I just verified it still builds and works on #illumos (#solaris descendant) as well.

A linux build without specific options should show using #epoll, #signalfd and #timerfd here 😉

Now that #swad 0.7 is released, it's time to prepare a new release of #poser, my own lib supporting #services on #POSIX systems, following a #reactor with #threadpool design.

During development of swad, I moved poser from using strictly only POSIX APIs (with the scalability limits of e.g. #select) to auto-detected support for #kqueue, #epoll, #eventports, #signalfd and #timerfd (so now it could, in theory(!), "compete" with e.g. libevent). I also fixed quite a few hidden bugs, and added more base functionality, like a #dictionary using nested hashtables internally, or #async tasks mimicking the async/await pattern known from e.g. #csharp. I also deprecated two features, the periodic and global "service tick" (superseded by individual timers) and the "resolve hosts" property of a "connection" (superseded by a separate resolve class).

I'll have to decide on a few things, e.g. whether I'll remove the deprecated stuff immediately and bump the major version of the "posercore" lib. I guess I'll do just that. I'd also like to add all the web-specific stuff (http 1.0/1.1 server) that's currently part of the swad code as a "poserweb" lib. This would get a major version of 0, indicating a generally unstable API/ABI as of now....

And then, I'd have to decide where certain utility classes belong. The rate limiter is probably useful for things other than web, so it should probably go to core. What about url encoding/decoding, for example? 🤔

Stay tuned, something will come here, maybe helping you to write a nice service in plain #C 😎:

https://github.com/Zirias/poser

GitHub - Zirias/poser: POsix SERvices framework for C
