Rewrote my acceleration structure for neighbourhood search based on some insight from @vassvik and got significant speedups on large fluid simulations :) Here’s ~half a million particles ✨
Here’s the original thread on pre-loading the 3D tile into LDS: https://bsky.app/profile/vassvik.bsky.social/post/3lb6j2wnmtk2k
Morten Vassvik (@vassvik.bsky.social)

The core idea is to take the 10x10x10 footprint accessed by all threads in the workgroup and split it into chunks, then fetch these chunks cooperatively so that the loads align with subgroup boundaries and don't introduce divergence, and finally store them to shared memory.
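
To make the cooperative fetch a bit more concrete, here is a rough CPU-side sketch of the index math in Python. It is not the chunking scheme from the linked thread: it uses a plain strided loop, which is the simplest way to spread the 1000 cells over a workgroup. The 8x8x8 workgroup size and the 1-cell halo are assumptions (they are one way to arrive at a 10x10x10 footprint), and the names `cells_for_thread` and `write_counts` are made up for illustration.

```python
from collections import Counter

# Assumed, not stated in the post: 8x8x8 workgroup plus a 1-cell halo per side.
WG = 8
HALO = 1
TILE = WG + 2 * HALO      # 10 cells per axis
THREADS = WG ** 3         # 512 threads in the workgroup
CELLS = TILE ** 3         # 1000 cells in the shared-memory tile


def cells_for_thread(tid):
    """Tile-local (x, y, z) cells that thread `tid` would copy into shared memory."""
    cells = []
    # Strided loop: pass 0 covers cells 0..511, pass 1 covers cells 512..999.
    for i in range(tid, CELLS, THREADS):
        cells.append((i % TILE, (i // TILE) % TILE, i // (TILE * TILE)))
    return cells


def write_counts():
    """Count how many times each tile cell is written across the whole workgroup."""
    counts = Counter()
    for tid in range(THREADS):
        counts.update(cells_for_thread(tid))
    return counts


if __name__ == "__main__":
    counts = write_counts()
    idle_second_pass = sum(1 for t in range(THREADS) if len(cells_for_thread(t)) == 1)
    print(len(counts), "cells covered,", idle_second_pass, "threads idle in the second pass")
```

With these assumed sizes, threads 488 to 511 have nothing to load in the second pass, so the active range ends partway through a 32-wide subgroup. That leftover is the kind of partial-subgroup work that splitting the footprint into chunks aligned with subgroup boundaries is meant to avoid.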
