Today I tested MusicGen, a music generation AI developed by Meta.
The model can create short music clips directly from text prompts such as "heavy metal with distorted guitars" or "lofi hip hop with piano and rain". Everything in this demo runs completely locally on my Linux system using my RX 6700 XT.
Setting it up is tricky because MusicGen's requirements pin old PyTorch versions. You need to install most dependencies manually and then install MusicGen itself with "--no-deps" so pip doesn't overwrite your (ROCm-)PyTorch setup.
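For reference, a rough sketch of the install order I mean. The ROCm index URL and the dependency list are examples, not an exhaustive recipe; exact versions vary by AudioCraft release:

```shell
# 1. Install the ROCm build of PyTorch first
#    (rocm6.1 is an example; pick the index matching your ROCm version)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.1

# 2. Manually install the dependencies MusicGen needs
#    (einops, omegaconf, transformers, among others)
pip install einops omegaconf transformers

# 3. Install AudioCraft itself without touching the deps,
#    so pip cannot replace the ROCm PyTorch with a CUDA/CPU wheel
pip install audiocraft --no-deps
```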
MusicGen is part of Meta’s AudioCraft project and generates audio by predicting compressed audio tokens instead of raw waveforms. The generated tracks often sound quite generic, which is a known limitation of current AI music models. Still, it can be useful for quickly prototyping ideas, generating background music, creating sound textures, or experimenting with AI-generated audio.
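To make the "compressed audio tokens" point concrete, here is a back-of-the-envelope comparison. The numbers assume MusicGen's usual EnCodec settings (32 kHz audio, 50 token frames per second, 4 parallel codebooks) and should be treated as assumptions:

```python
# Rough size comparison for an 8-second clip: how many values the model
# would have to predict as raw waveform samples vs. as compressed tokens.
SAMPLE_RATE = 32_000   # audio samples per second (assumption)
FRAME_RATE = 50        # token frames per second (assumption)
CODEBOOKS = 4          # parallel codebooks per frame (assumption)
SECONDS = 8

samples = SAMPLE_RATE * SECONDS            # raw waveform values
tokens = FRAME_RATE * SECONDS * CODEBOOKS  # compressed token values

print(samples)            # 256000
print(tokens)             # 1600
print(samples // tokens)  # 160 -> ~160x fewer values to predict
```

That compression is what makes text-to-music generation tractable on a single consumer GPU.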
The results can also be improved by generating continuations of the audio multiple times. By extending the output step by step, you can build pieces of two to three minutes that develop more structure than the short clips shown here.
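The extension loop works roughly like the sketch below: keep the tail of what you have as a prompt, continue it, and append only the newly generated part. AudioCraft's real entry point is `MusicGen.generate_continuation`; the model call here is a stand-in stub so the loop logic is self-contained, and the window lengths are arbitrary examples:

```python
SAMPLE_RATE = 32_000  # MusicGen outputs 32 kHz audio (assumption)

def generate_continuation(prompt, new_seconds):
    # Stand-in for the real model call (AudioCraft exposes
    # MusicGen.generate_continuation); returns prompt + new audio samples.
    # Silence here, actual music in the real thing.
    return prompt + [0.0] * int(new_seconds * SAMPLE_RATE)

def extend(track, steps, prompt_seconds=10.0, new_seconds=20.0):
    """Grow a track by repeatedly continuing its last few seconds."""
    for _ in range(steps):
        prompt = track[-int(prompt_seconds * SAMPLE_RATE):]  # tail as context
        continued = generate_continuation(prompt, new_seconds)
        track = track + continued[len(prompt):]  # append only the new part
    return track

clip = [0.0] * (30 * SAMPLE_RATE)  # a 30-second starting clip
longer = extend(clip, steps=5)     # each step adds 20 s: 30 + 5*20 = 130 s
print(len(longer) / SAMPLE_RATE)   # 130.0
```

Because each continuation only ever sees the last few seconds, long-range structure still has to come from how you steer the prompts between steps.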
Video workflow:
- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)
No cloud, real hardware.
Everything runs on Linux, so anyone can set this up.
No GPU? No problem: you can also run it on PyTorch's CPU backend, just much slower.
#AI #MachineLearning #MusicGen #GenerativeAI #AIaudio #OpenSource #Linux #LocalAI #Fediverse #Tech