zahirtezcan

@zahirtezcan@mastodon.gamedev.place

Easy vectorized classification with z3 https://lemire.me/blog/2025/06/01/easy-vectorized-classification-with-z3/

We often need to quickly classify characters. For example, consider how the binary data th

Fast character classification with z3 – Daniel Lemire's blog

Microsoft’s Windows Subsystem for Linux is now open-source https://www.theverge.com/news/669286/microsoft-windows-subsystem-for-linux-open-source

Microsoft is open-sourcing its Windows Subsystem for Linux. Developers will be able to download the code, build it from source, and add fixes or features.

The Verge
“ZLinq”, a Zero-Allocation LINQ Library for .NET - Yoshifumi Kawai - Medium

I released ZLinq v1 last month! By building on structs and generics, it achieves zero allocations. It includes extensions like LINQ to Span, LINQ to SIMD, LINQ to Tree (FileSystem, JSON…

Medium

TIL you can pass `-###` to `clang` to make it print all the effective arguments and environment garbage, rather than trying to play guessing games with your build system debug tools (lol)

from @AaronBallman

Iterating through keys and values in C++ (with C++20 code) https://lemire.me/blog/2025/04/20/iterating-through-keys-and-values-in-c-with-c20-code/

In software, we often use key-value data structures, where each key is unique and maps to

Streamlined iteration: exploring keys and values in C++20 – Daniel Lemire's blog

Detect control characters, quotes and backslashes efficiently using ‘SWAR’ https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/

When trying to write fast functions operating over many bytes, we sometimes use 'SWAR'. SW

Detect control characters, quotes and backslashes efficiently using ‘SWAR’ – Daniel Lemire's blog

How can really smart people appear totally incompetent? https://lemire.me/blog/2025/04/11/how-can-really-smart-people-appear-totally-incompetent/

It is often puzzling to encounter organizations run by highly capable and ambitious people

How can really smart people appear totally incompetent? – Daniel Lemire's blog

Faster shuffling in Go with batching – Daniel Lemire's blog

So I have spent the past 3-4 weeks playing with Vulkan/GLSL compute shaders and evaluating how much I could optimize a large FP32 matrix multiplication kernel. 🚀

I tested it on both an AMD RX 7600 XT and an Nvidia RTX 4070. 💻

Results have been enlightening and a bit disappointing at the same time. Let's dive in.

Overall, the performance is **okish** compared to the state of the art:

- 🔴 AMD: 11.4 TFLOPS vs rocBLAS 2.7 TFLOPS (I believe they have a bug there)
- 🟢 Nvidia: 16.1 TFLOPS vs CUDA 20 TFLOPS

1/6
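
For anyone comparing the figures above: matrix multiplication throughput is usually computed by counting 2·M·N·K floating-point operations for an (M×K) by (K×N) product and dividing by the run time. Here is a minimal Python sketch of that arithmetic. The function name `gemm_tflops`, the matrix size, and the timing are hypothetical and not taken from the thread. The two ratios use only the TFLOPS numbers quoted in the post.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """TFLOPS for an (m x k) @ (k x n) multiply, counting 2*m*n*k FLOPs."""
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e12


# Hypothetical measurement (not from the post): a 4096^3 multiply in 10 ms.
example_tflops = gemm_tflops(4096, 4096, 4096, 0.010)

# Ratios computed from the figures quoted in the post.
nvidia_fraction = 16.1 / 20.0  # Vulkan kernel relative to the CUDA figure
amd_speedup = 11.4 / 2.7       # Vulkan kernel relative to the rocBLAS figure

print(f"example: {example_tflops:.2f} TFLOPS")
print(f"Nvidia: {nvidia_fraction:.1%} of the CUDA figure")
print(f"AMD: {amd_speedup:.1f}x the rocBLAS figure")
```

On the quoted numbers, the Vulkan kernel reaches roughly 80% of the CUDA figure on the RTX 4070 and a bit over 4x the rocBLAS figure on the RX 7600 XT.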

`make` and `cmake` have `ache` in common.

Coincidence?