Here's a more clearly visible demonstration of the problem I described previously: https://sigmoid.social/@chrisoffner3d/111591367887994819
On the left is the progression of cross-attention maps extracted when running on the CPU; on the right are the same cross-attention maps extracted when running on the GPU.
This is using the #Keras implementation of #StableDiffusion on an M3 Max.
#TensorFlow #StableDiffusion #Diffusion #Python #MLEngineering #MachineLearning #DeepLearning #GPU #M3Max
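One way to put numbers on the CPU/GPU difference, as a minimal sketch rather than the code behind the image: it assumes the cross-attention maps from each run have already been exported as one flattened, non-empty list of floats per time step. The names `device_gap` and `step_change` are hypothetical helpers, not part of Keras or TensorFlow.

```python
from typing import List, Sequence

# Assumed layout: one flattened attention map (a sequence of floats) per time step.
Maps = Sequence[Sequence[float]]


def device_gap(cpu_maps: Maps, gpu_maps: Maps) -> List[float]:
    """Largest absolute element-wise difference between the two runs, per time step."""
    if len(cpu_maps) != len(gpu_maps):
        raise ValueError("both runs must cover the same time steps")
    gaps = []
    for cpu_map, gpu_map in zip(cpu_maps, gpu_maps):
        if len(cpu_map) != len(gpu_map):
            raise ValueError("maps at the same step must have the same size")
        gaps.append(max(abs(c - g) for c, g in zip(cpu_map, gpu_map)))
    return gaps


def step_change(maps: Maps) -> List[float]:
    """Mean absolute change between consecutive time steps within a single run."""
    changes = []
    for prev, curr in zip(maps, maps[1:]):
        total = sum(abs(p - c) for p, c in zip(prev, curr))
        changes.append(total / len(prev))
    return changes
```

If the CPU run really is a gradual refinement, `step_change` on it should stay small from step to step, while an erratic GPU run would show large jumps; `device_gap` shows at which time steps the two devices disagree most.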
Chris Offner (@chrisoffner3d@sigmoid.social)
I'm running into some unexpected and significant non-determinism when running a #Keras diffusion model on my Apple GPU. On the left we see the progression of cross-attention maps for time steps from t = 0 to t = 900 when running the model via the CPU. We see that each cross-attention map undergoes some "refinement" progression as we go from t = 0 to t = 900. On the right we see the same but on the GPU. It's a much more erratic and discontinuous progression. #MLEngineering #DeepLearning #GPU