Riffusion

Stable diffusion for real-time music generation

@rcarmo

"An audio spectrogram is a visual way to represent the frequency content of a sound clip. The x-axis represents time, and the y-axis represents frequency. The color of each pixel gives the amplitude of the audio at the frequency and time given by its row and column."
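To make the quoted description concrete, here is a minimal numpy-only sketch of that mapping: slice a clip into frames, FFT each frame, and stack the magnitudes into a 2-D array whose rows are frequency bins (y-axis) and columns are time frames (x-axis). The 440 Hz test tone, frame size, and window choice are illustrative assumptions, not anything from Riffusion itself.

```python
import numpy as np

sr = 22050                           # sample rate in Hz (assumed)
t = np.arange(sr) / sr               # one second of sample times
clip = np.sin(2 * np.pi * 440 * t)   # synthetic 440 Hz tone (A4)

# Slice the clip into 1024-sample frames and FFT each one.
n = 1024
frames = clip[: len(clip) // n * n].reshape(-1, n)
mags = np.abs(np.fft.rfft(frames * np.hanning(n), axis=1))

# Transpose so rows are frequency bins and columns are time frames --
# exactly the image layout described above: each "pixel" is the
# amplitude at that frequency (row) and time (column).
spectrogram = mags.T
freqs = np.fft.rfftfreq(n, d=1 / sr)

# The brightest row should sit near 440 Hz.
peak = freqs[spectrogram.sum(axis=1).argmax()]
```

With a pure tone as input, the spectrogram is a single bright horizontal line near the tone's frequency; real music fills the image with harmonics and transients, which is what the model learns to generate.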

@Gerhard_Schroeder yeah, I get how it works. What I’m amazed with is that the model can generate useful ones.
@rcarmo
If you are intrigued by this stuff you should check out a piece of (old skool) software called Melodyne, made by a company called Celemony.
I saw a demo, must've been 15 years ago(?), where they sampled Chet Baker's typical trumpet (monophonic) *tone*(!) and then used it to play chords or other monophonic melodic lines, which sounded as if they had been played by him, because they had accurately sampled the instrument's spectrum.
An “audio Lensa” if you wish.