You'd think there are a million ready-to-use "sort some things on the GPU" implementations out there, but outside of CUDA there isn't all that much. The ones that work, that is.

So I spent way too much time debugging sorting of a bunch of points! As a side effect though, the Unity HDRP GPU sorting utility got a pull request with a fix :) https://github.com/Unity-Technologies/Graphics/pull/7954

Improve robustness of GPUSort wrt out of bounds buffer reads/writes by aras-p · Pull Request #7954 · Unity-Technologies/Graphics

Purpose of this PR: GPUSort utility (which seems to be used for the "software" Line Rendering system in HDRP) in some cases produces incorrect results, at least on Metal (most notably, when the size...

@aras hey, I recently reimplemented bitonic sort on GPU (Shadertoy); what kind of sort were you after?
@webanck "any", for starters. As can be seen from the GitHub PR link, right now I'm using the (bitonic) sort from Unity's HDRP. I just had to fix it first, since it was producing incorrect results in some cases lol :)
@aras I see, thanks for the link to the blog post (https://poniesandlight.co.uk/reflect/bitonic_merge_sort/) on your PR; I didn't find that when implementing from the descriptions in GPU Gems and on Wikipedia. Could be cleaner, but this was my result on Shadertoy: https://www.shadertoy.com/view/DtjyWz
Implementing Bitonic Merge Sort in Vulkan Compute

In which I describe how to implement bitonic sorting networks in compute shaders
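
For anyone following along, here is a rough CPU-side sketch of the bitonic sorting network the thread keeps mentioning. It is not the HDRP code, the PR's fix, or the Shadertoy version. The function name `bitonic_sort` and the choice to pad the input up to a power of two with `inf` sentinels are assumptions made for illustration. Each iteration of the innermost loop is the work one GPU thread would do in a single pass.

```python
def bitonic_sort(values):
    """Sort numbers ascending with a bitonic network (CPU reference sketch).

    Assumption: inputs are numeric, so float("inf") can serve as padding.
    """
    n = len(values)
    if n == 0:
        return []

    # The classic network needs a power-of-two length, so pad with sentinels
    # that sort to the end (an illustrative choice, not taken from the PR).
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [float("inf")] * (size - n)

    k = 2  # length of the bitonic sequences merged in this stage
    while k <= size:
        j = k // 2  # distance between compared elements in this pass
        while j >= 1:
            # On a GPU, every i below would be handled by its own thread.
            for i in range(size):
                partner = i ^ j
                if partner > i:  # handle each pair exactly once
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2

    return data[:n]  # drop the padding again
```

Calling `bitonic_sort([5, 2, 9, 1, 7])` pads to 8 elements internally and returns the 5 sorted values. On a GPU the padding would typically be replaced by bounds checks on `i` and `partner`, and getting those checks right is where things tend to go wrong.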