Just ran Demucs completely locally on my system (RX 6700 XT / 16 GB RAM).
Demucs is an open-source AI model for music source separation, developed by Meta. It splits a full song into individual stems (the default model outputs vocals, drums, bass, and "other" for everything else), which makes it useful for remixing, transcription, and audio analysis.
Test track: Fear of the Dark by Iron Maiden (https://www.youtube.com/watch?v=bePCRKGUwAY)
Setup:
- Demucs installed via pip
- Model: htdemucs (default)
- Input converted to WAV using ffmpeg
- GPU acceleration via ROCm
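The setup steps above can be sketched as a small Python script that shells out to ffmpeg and then to the Demucs CLI. This is a minimal sketch, not the exact commands used for this test. The 44.1 kHz stereo conversion, the wav/ and separated/ folder names, and the helper names (ffmpeg_to_wav_cmd, demucs_cmd, expected_stems) are assumptions. It also relies on ROCm builds of PyTorch exposing the GPU under the "cuda" device name.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg command converting any input to a 44.1 kHz stereo WAV (assumed target format)."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "44100", "-ac", "2", dst]


def demucs_cmd(wav: str, model: str = "htdemucs", device: str = "cuda",
               out_dir: str = "separated") -> list[str]:
    """Demucs CLI call; ROCm builds of PyTorch address the GPU as "cuda"."""
    return [sys.executable, "-m", "demucs", "-n", model, "-d", device,
            "-o", out_dir, wav]


def expected_stems(wav: str, model: str = "htdemucs",
                   out_dir: str = "separated") -> list[Path]:
    """Where Demucs writes the four htdemucs stems with its default naming."""
    track = Path(wav).stem
    return [Path(out_dir) / model / track / f"{stem}.wav"
            for stem in ("drums", "bass", "other", "vocals")]


if __name__ == "__main__" and len(sys.argv) > 1:
    src = Path(sys.argv[1])
    wav_dir = Path("wav")  # hypothetical folder for converted inputs
    wav_dir.mkdir(exist_ok=True)
    wav = str(wav_dir / f"{src.stem}.wav")
    subprocess.run(ffmpeg_to_wav_cmd(str(src), wav), check=True)
    subprocess.run(demucs_cmd(wav), check=True)
    for stem in expected_stems(wav):
        print(stem, "exists" if stem.exists() else "missing")
```

Usage would be something like python separate.py "Fear of the Dark.mp3" (the script name is hypothetical). Building the argument lists in separate functions keeps them easy to inspect before anything is executed.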
Setting it up is tricky: Demucs pins older PyTorch versions, so a plain pip install can replace your ROCm build of PyTorch. To avoid that, install Demucs with "--no-deps" and add its remaining dependencies manually.
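A minimal sketch of the "--no-deps" route, assuming Python 3.10+: after pip install --no-deps demucs, it reads the requirements Demucs itself declares (through importlib.metadata) and prints the ones that are not installed yet, so they can be added by hand. The helper names are hypothetical, and any torch/torchaudio entries should be covered by your existing ROCm PyTorch install rather than resolved by pip.

```python
from __future__ import annotations

import re
from importlib import metadata


def requirement_name(req: str) -> str:
    """Bare distribution name of a requirement, e.g. 'julius>=0.2.3' -> 'julius'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return match.group(0)


def missing_dependencies(dist: str = "demucs") -> list[str]:
    """Declared requirements of `dist` that are not installed at all.

    Only presence is checked, not version pins: the point of --no-deps is
    to keep the ROCm builds of torch/torchaudio that are already installed.
    """
    missing = []
    for req in metadata.requires(dist) or []:
        if "extra ==" in req:  # skip optional extras such as dev tooling
            continue
        try:
            metadata.version(requirement_name(req))
        except metadata.PackageNotFoundError:
            missing.append(req)
    return missing


if __name__ == "__main__":
    try:
        for req in missing_dependencies():
            print("missing:", req)
    except metadata.PackageNotFoundError:
        print("demucs not found; run: pip install --no-deps demucs")
```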
Result:
Very clean vocal separation in most parts. Some artifacts appear during very loud or distorted sections (e.g. emotional peaks or shouting).
Next steps / possibilities:
- Normalize and filter audio before separation
- Extract vocals for transcription or remixing
- Create karaoke / instrumental versions
- Combine with Whisper for lyrics
- Batch processing for datasets
- Try htdemucs_ft (fine-tuned version of htdemucs; may be slightly better, but about 4x slower)
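Several of these next steps (normalizing, extracting vocals, karaoke versions, batch processing, htdemucs_ft) fit into one loop. This is a sketch under assumptions: the highpass=f=40 cutoff, the normalized/ folder, the accepted file extensions and the helper names are illustrative. ffmpeg's loudnorm filter resamples internally, so -ar 44100 is added to restore the usual rate. With --two-stems=vocals, Demucs writes vocals.wav plus no_vocals.wav (the instrumental/karaoke version) for each track.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

AUDIO_EXTS = {".wav", ".flac", ".mp3"}  # assumed input formats


def normalize_cmd(src: Path, dst: Path) -> list[str]:
    """Optional pre-pass: high-pass filter plus EBU R128 loudness normalization."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-af", "highpass=f=40,loudnorm", "-ar", "44100", str(dst)]


def two_stem_cmd(tracks: list[Path], model: str = "htdemucs_ft",
                 out_dir: str = "separated") -> list[str]:
    """One Demucs call for many tracks, so the model is loaded only once."""
    return [sys.executable, "-m", "demucs", "-n", model,
            "--two-stems=vocals", "-o", out_dir, *map(str, tracks)]


def find_tracks(folder: Path) -> list[Path]:
    """All audio files directly inside `folder`, in a stable order."""
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in AUDIO_EXTS)


if __name__ == "__main__" and len(sys.argv) > 1:
    in_dir, norm_dir = Path(sys.argv[1]), Path("normalized")  # hypothetical layout
    norm_dir.mkdir(exist_ok=True)
    prepared = []
    for track in find_tracks(in_dir):
        dst = norm_dir / f"{track.stem}.wav"
        subprocess.run(normalize_cmd(track, dst), check=True)
        prepared.append(dst)
    if prepared:
        subprocess.run(two_stem_cmd(prepared), check=True)
```

The resulting vocals.wav files are what you would hand to Whisper for lyrics; that step is not shown here.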
Video workflow:
- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)
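For the video side, an equivalent VAAPI H.264 transcode with plain ffmpeg might look like this. It is a sketch, not the Kdenlive render profile actually used: the render node /dev/dri/renderD128, the QP value of 23, the AAC audio settings and the helper name are assumptions.

```python
from __future__ import annotations

import subprocess
import sys


def vaapi_h264_cmd(src: str, dst: str, device: str = "/dev/dri/renderD128",
                   qp: int = 23) -> list[str]:
    """ffmpeg command for hardware H.264 encoding through VAAPI.

    Frames are decoded in software, converted to NV12 and uploaded to the GPU
    (format=nv12,hwupload) before h264_vaapi encodes them at a constant QP.
    """
    return ["ffmpeg", "-y", "-vaapi_device", device, "-i", src,
            "-vf", "format=nv12,hwupload", "-c:v", "h264_vaapi",
            "-qp", str(qp), "-c:a", "aac", "-b:a", "192k", dst]


if __name__ == "__main__" and len(sys.argv) > 2:
    subprocess.run(vaapi_h264_cmd(sys.argv[1], sys.argv[2]), check=True)
```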
No cloud, just local hardware.
Everything runs on Linux with open-source tools, so anyone can reproduce this on their own machine.
Works on CPU as well, but much slower.
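Since the same Demucs command runs on GPU or CPU, a minimal sketch for checking which one you will get might look like this. It assumes a recent PyTorch release, where torch.version.hip is set on ROCm builds and is None otherwise. Demucs already defaults to the CPU when no GPU is available, so this is mostly a way to confirm that the ROCm build is the one actually in use. The helper names and song.wav are hypothetical.

```python
from __future__ import annotations

import sys


def describe_backend() -> str:
    """Report which PyTorch build is installed (ROCm, CUDA or CPU-only)."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if getattr(torch.version, "hip", None):
        return f"ROCm/HIP {torch.version.hip}"
    if getattr(torch.version, "cuda", None):
        return f"CUDA {torch.version.cuda}"
    return "CPU-only build"


def pick_device() -> str:
    """'cuda' if a GPU is usable (ROCm GPUs also appear as 'cuda'), else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(describe_backend(), "->", device)
    # The result maps onto Demucs' -d flag, e.g. for a hypothetical song.wav:
    print(" ".join([sys.executable, "-m", "demucs", "-d", device, "song.wav"]))
```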
#Demucs #AI #MachineLearning #AudioSeparation #MusicAI #OpenSource #Linux #ROCm #AMD #DeepLearning #AudioProcessing #Vocals #Karaoke #StemSeparation #SelfHosted #NoCloud #FOSS #Tech #LocalAI #MetaAI