Mastodawn

Alexandre Mutel Jul 9, 2023

Hey, finally! I just published a new - and real - blog post "10x Performance with SIMD Vectorized Code in C#/.NET" https://xoofx.com/blog/2023/07/09/10x-performance-with-simd-in-csharp-dotnet/ 🎉

That was a quick write-up, so my apologies for the poor phrasing. After 3 years without writing a blog post, I feel rusty. But it feels good to share again! 🤗 #dotnet


nietras 👾 Jul 9, 2023

@xoofx great post, this is basically Sep 👍 Is there a reason for doing the permute *before* the pack unsigned saturate?


Alexandre Mutel Jul 10, 2023

@nietras oh, good catch! 🙂 No reason, I think I missed the fact that I could use _mm256_permute4x64_epi64 after the pack instead of performing the swap before it. It saves 3 permutes in the end, not bad! Thanks for the suggestion.

I have updated the blog post and added a link to Sep at the end of the article. That's actually a good example of real-world usage of intrinsics for performance benefits! 😉
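
To illustrate the pack-then-permute ordering discussed above, here is a minimal C# sketch. It is not the code from the blog post or from Sep: the class and method names (`PackThenPermute`, `NarrowInOrder`) are hypothetical, and it shows a single 16-bit to 8-bit pack step only. The idea is that the AVX2 pack works per 128-bit lane, so its output comes out interleaved, and one `Permute4x64` applied afterwards restores source order.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class PackThenPermute
{
    // Hypothetical helper: narrows 2 x 16 shorts to 32 bytes with unsigned
    // saturation, keeping the elements in source order (a first, then b).
    public static Vector256<byte> NarrowInOrder(Vector256<short> a, Vector256<short> b)
    {
        if (!Avx2.IsSupported)
            throw new PlatformNotSupportedException("AVX2 is required for this sketch.");

        // The pack operates on each 128-bit lane separately, so the result is
        // laid out as: a[0..7], b[0..7], a[8..15], b[8..15].
        Vector256<byte> packed = Avx2.PackUnsignedSaturate(a, b);

        // Viewed as four 64-bit blocks [q0, q1, q2, q3], select [q0, q2, q1, q3]
        // to get a[0..15] followed by b[0..15].
        return Avx2.Permute4x64(packed.AsInt64(), 0b_11_01_10_00).AsByte();
    }
}
```

With this ordering the fix-up is one permute on the packed result, instead of one permute on each input before packing.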


nietras 👾

@xoofx thanks ☺️


nietras 👾 Jul 11, 2023

@xoofx of course the generic versions can be improved too, given there is ExtractMostSignificantBits (a generic move mask). I'm also not sure pack saturated is needed, you can just Narrow. So it should actually be possible to make this fully generic. I think 😅
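
As a rough sketch of what that suggestion could look like, the following uses only the cross-platform `Vector128` APIs (.NET 7 or later) to find the first matching `int` in a span. It is an assumption-laden illustration, not the blog post's or Sep's code: `GenericFind` and `IndexOf` are hypothetical names, and the 8-element step is chosen just to show `Narrow` combining two comparison masks before `ExtractMostSignificantBits`.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class GenericFind
{
    // Hypothetical helper: index of the first element equal to value, or -1.
    public static int IndexOf(ReadOnlySpan<int> data, int value)
    {
        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            Vector128<int> needle = Vector128.Create(value);
            for (; i <= data.Length - 8; i += 8)
            {
                // Each comparison yields 0 or -1 (all bits set) per element.
                Vector128<int> lo = Vector128.Equals(Vector128.Create(data.Slice(i, 4)), needle);
                Vector128<int> hi = Vector128.Equals(Vector128.Create(data.Slice(i + 4, 4)), needle);

                // Narrow truncates rather than saturates, which is fine here
                // because 0 and -1 survive truncation unchanged.
                Vector128<short> mask = Vector128.Narrow(lo, hi);

                // One bit per 16-bit element: the generic "move mask".
                uint bits = mask.ExtractMostSignificantBits();
                if (bits != 0)
                    return i + BitOperations.TrailingZeroCount(bits);
            }
        }

        // Scalar tail (and fallback when there is no hardware acceleration).
        for (; i < data.Length; i++)
            if (data[i] == value)
                return i;
        return -1;
    }
}
```

Note that truncating with `Narrow` is only safe on the comparison masks; narrowing the raw data before comparing could turn out-of-range values into false matches.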


Alexandre Mutel Jul 11, 2023

@nietras yeah, it definitely could be. I took the original code without digging further. Anyway, I'm back on holiday, so I won't check that until next week! 😎🏖️