Easy vectorized classification with z3 https://lemire.me/blog/2025/06/01/easy-vectorized-classification-with-z3/
We often need to quickly classify characters. For example, consider how the binary data th
“ZLinq”, a Zero-Allocation LINQ Library for .NET
https://neuecc.medium.com/zlinq-a-zero-allocation-linq-library-for-net-1bb0a3e5c749
TIL you can pass `-###` to `clang` to make it print all the effective arguments and environment garbage, rather than trying to play guessing games with your build system debug tools (lol)
from @AaronBallman
Iterating through keys and values in C++ (with C++20 code) https://lemire.me/blog/2025/04/20/iterating-through-keys-and-values-in-c-with-c20-code/
In software, we often use key-value data structures, where each key is unique and maps to
Detect control characters, quotes and backslashes efficiently using ‘SWAR’ https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/
When trying to write fast functions operating over many bytes, we sometimes use 'SWAR'. SW
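As a rough illustration of the idea named in the title, here is a SWAR sketch that checks 8 bytes at once for control characters (below 0x20), quotes, and backslashes. The preview above is truncated, so treat this as one plausible formulation and not the post's exact code; `has_escapable_byte` is a hypothetical name, and the caller is assumed to supply at least 8 readable bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if any of the 8 bytes at p is
// below 0x20, a double quote (0x22), or a backslash (0x5C).
bool has_escapable_byte(const char* p) {
    uint64_t x;
    std::memcpy(&x, p, sizeof x);  // safe unaligned load of 8 bytes

    constexpr uint64_t ones = 0x0101010101010101ULL;

    // High bit set in each lane whose input byte is ASCII (< 0x80).
    uint64_t is_ascii = ~x & (ones * 0x80);
    // For ASCII bytes, subtracting 0x20 sets the lane's high bit only if byte < 0x20.
    uint64_t lt32 = x - ones * 0x20;
    // XOR zeroes the lane on a match; subtracting 1 then sets its high bit.
    uint64_t eq34 = (x ^ (ones * 0x22)) - ones;
    uint64_t eq92 = (x ^ (ones * 0x5C)) - ones;

    // Borrows can spill into a neighbouring lane, but only when a lower
    // lane already matched, so the any-byte answer stays correct.
    return ((lt32 | eq34 | eq92) & is_ascii) != 0;
}
```

The result only says whether some byte matched. Locating the exact byte would need a formulation that keeps borrows from crossing lanes.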
How can really smart people appear totally incompetent? https://lemire.me/blog/2025/04/11/how-can-really-smart-people-appear-totally-incompetent/
It is often puzzling to encounter organizations run by highly capable and ambitious people
So I have spent the past 3-4 weeks playing with Vulkan/GLSL compute shaders and evaluating how much I could optimize a large FP32 matrix multiplication kernel. 🚀
I tested it on both an AMD RX 7600 XT and an RTX 4070. 💻
Results have been enlightening and a bit disappointing at the same time. Let's dive in.
Overall the performance is **okish** compared to the state of the art:
- 🔴 AMD: 11.4 TFLOPS vs rocBLAS 2.7 TFLOPS (I believe they have a bug there)
- 🟢 Nvidia: 16.1 TFLOPS vs CUDA 20 TFLOPS
1/6
`make` and `cmake` have `ache` in common.
Coincidence?