🎉 Oh joy, another thrilling journey into the depths of #CUDA #intrinsics, as our brave author heroically tackles the burning issue of sorting faster! 🚀 Because who wouldn't want a detailed dissertation on an #algorithm called "bitonic sort" that promises to revolutionize... absolutely nothing in your daily life. 😅
https://winwang.blog/posts/bitonic-sort/ #BitonicSort #TechJourney #SortingAlgorithms #HackerNews #ngated
Faster sorting with SIMD CUDA intrinsics

Full code on Github: https://github.com/wiwa/blog-code/

Hi

Recently, I finished a batch at the Recurse Center… is what I would have said if this post were written when I intended to write it (i.e. 3 months ago). My project there focused on a questionable application of CUDA (mostly irrelevant to this post), but it got me thinking more about other GPU-friendly algorithms. Instead of my Recurse project (which I hope to write about in a later post), I want to simply begin writing about technical stuff I’ve played around with.

Since .NET is a high-level language, it's not always possible to be sure which underlying instructions the JIT will generate.

Hardware intrinsics in .NET let us access hardware-specific instructions when we want to trade the portability of our code for the speed of specific hardware-backed instructions. In .NET, those instructions and types are accessed from System.Runtime.Intrinsics and its child namespaces.

Hardware intrinsics support a variety of instructions. With a single instruction, if our processor supports it, we can count the number of 1 bits in a value with POPCNT or perform one round of AES decryption with AESDEC (the instruction behind the C intrinsic _mm_aesdec_si128), all backed by the hardware. And the list doesn't end there. Check the link for more instructions.
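To make the "if our processor supports it" part concrete, here is a minimal sketch of the usual pattern: check for the feature at run time, use the hardware instruction if it is there, and fall back to portable code otherwise. It is written in Rust rather than C#, so it is only an analogue of the .NET approach (where the check would be Popcnt.IsSupported and the call Popcnt.PopCount). The function names popcount and popcount_hw are hypothetical, chosen for illustration.

```rust
/// Counts the 1 bits in `x`.
/// Uses the hardware POPCNT instruction when the CPU reports support for it,
/// and a portable fallback otherwise. (`popcount` is a hypothetical name.)
pub fn popcount(x: u32) -> u32 {
    // This block only exists when compiling for x86_64.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("popcnt") {
            // SAFETY: we just verified at run time that POPCNT is available.
            return unsafe { popcount_hw(x) };
        }
    }
    // Portable path: the compiler picks whatever instructions the target allows.
    x.count_ones()
}

/// Hardware-backed path, compiled with the POPCNT feature enabled.
/// Must only be called after a successful feature check.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "popcnt")]
unsafe fn popcount_hw(x: u32) -> u32 {
    // The intrinsic works on i32; the casts only reinterpret the bits.
    std::arch::x86_64::_popcnt32(x as i32) as u32
}
```

For example, popcount(0b1011) returns 3 on either path. The run-time check is what keeps the code portable: the specialised instruction is only executed on hardware that has it.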

Docs 📑: https://learn.microsoft.com/en-us/dotnet/api/system.runtime.intrinsics?view=net-8.0

#dotnet #intrinsics #cpu #performance
---
If you find this useful, consider giving a like & share ❤.

System.Runtime.Intrinsics Namespace

Contains types used to create and convey register states in various sizes and formats for use with instruction-set extensions. For the instructions to manipulate these registers, see System.Runtime.Intrinsics.X86 and System.Runtime.Intrinsics.Arm.

Alongside stabilizing intrinsic functions for the wasm32 platform, the release brings additions for macros.
Programming language: Rust 1.54 expands its WebAssembly support