I finally understand what it means to "discover" an algorithm and its beautiful.
Was working on re-implementing MPI Collectives (reduce/broadcast, scatter/gather) for class and I couldn't figure out why this was being so difficult. I had already done the broadcast algorithm and I knew that gather was similar, but every node ended up with a segment of the data, not the entire set. 🧵
