You'd think there are a million ready-to-use "sort some things, on the GPU" implementations out there, but outside of CUDA there isn't all that much. Not many that actually work, anyway.

So I spent way too much time debugging the sorting of a bunch of points! As a side effect, though, the Unity HDRP GPU sorting utility got a fix pull request :) https://github.com/Unity-Technologies/Graphics/pull/7954

Improve robustness of GPUSort wrt out of bounds buffer reads/writes by aras-p · Pull Request #7954 · Unity-Technologies/Graphics

Purpose of this PR: GPUSort utility (which seems to be used for the "software" Line Rendering system in HDRP) in some cases produces incorrect results, at least on Metal (most notably, when the size...

@aras Honestly, good compute code is also scarce in the wild because it's hard to write GPU compute without a giant infrastructure framework. Unity is not lean, Unreal is crazy, the frameworks are all C++, and CUDA narrows your options.
This is why I wrote coalpy, so people can write HLSL and tie it together with simple Python bindings. It also includes integrated imgui support, live editing, and window management, and it can use a Vulkan backend. The goal is to democratize real HLSL compute.
https://coalpy.org
CoalPy: Compute Abstraction Layer for Python.