Beyond OpenMP in C++ & Rust: Taskflow, Rayon, Fork Union 🍴

TL;DR: Most C++ and Rust thread-pool libraries leave significant performance on the table, often running 10× slower than OpenMP on classic fork-join workloads and micro-benchmarks. So I've drafted a minimal ~300-line library called Fork Union that lands within 20% of OpenMP. It does not use advanced NUMA tricks; it uses only the C++ and Rust standard libraries and has no other dependencies.

OpenMP has been the industry workhorse for coarse-grain parallelism in C and C++ for decades. I lean on it heavily in projects like USearch, yet I avoid it in larger systems because:

Ash's Blog