I finally understand what it means to "discover" an algorithm and it's beautiful.

Was working on re-implementing MPI collectives (reduce/broadcast, scatter/gather) for class and I couldn't figure out why this was being so difficult. I had already done the broadcast algorithm and I knew that scatter was similar, except every node ends up with just a segment of the data, not the entire set. 🧵

Long story short: in broadcast, every node ends up with an identical copy, so the sending order doesn't matter. But in scatter, each node needs one specific segment, so sending order really matters. I tried going nearest neighbor first (like in broadcast), but that led to a sliding window and a LOT of bookkeeping. So instead I asked, "What happens if we invert the order of sends?"

So instead of `{(0->1), (0->2, 1->3)}`, we do `{(0->2), (0->1, 2->3)}`. This meant we could always send the second half of our buffer, simply shrink the buffer by half at the end of each round, and we'd reliably be left with our own segment of the data at the end of it all with ZERO BOOKKEEPING. The elegance fell in my lap. Stuff like this makes me love programming again. Maybe I should do a full write-up to better explain this...
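The schedule above can be sketched as a quick simulation (not actual MPI code; `scatter_schedule` and its shape are my own illustration, assuming a power-of-two node count and farthest-neighbor-first sends):

```python
def scatter_schedule(data, p):
    """Simulate the halving scatter: in each round, every node that
    currently holds data sends the second half of its buffer to the
    rank `dist` away, then keeps only the first half."""
    buf = {0: list(data)}  # only the root starts with the full buffer
    dist = p // 2          # farthest partner first: p/2, p/4, ..., 1
    while dist >= 1:
        for src in list(buf):
            half = len(buf[src]) // 2
            buf[src + dist] = buf[src][half:]  # send second half
            buf[src] = buf[src][:half]         # keep first half
        dist //= 2
    return buf

# With 4 nodes, round 1 is {(0->2)}, round 2 is {(0->1), (2->3)},
# and each rank is left holding exactly its own segment.
print(scatter_schedule([10, 20, 30, 40], 4))
```

No per-round index tracking is needed: each node's remaining buffer is always contiguous, which is exactly the "zero bookkeeping" payoff.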

#OMSCS #HPC