On day 3 of #LAC26, we saw a lot of clever approaches and genuine systems for #LivePerformance applications and advanced #AudioProcessing. I presented Zero 2 Wi-Fi IEM: #Prototyping a #LowLatency #WiFi #InEarMonitoring System Using @RaspberryPi and #JackTrip.

Though it uses 2.4 GHz Wi-Fi—which is typically crowded by other networks and technologies such as #Bluetooth and #LoRa—I was surprised to experience only a few dropouts during the demo. I could almost walk out of the room without completely losing the connection. 😃

You know the agingtv GStreamer element, the one that makes the colours of a picture duller and adds noise so that it looks like it's coming from an old TV?

I wrote agingradio during the hackfest - instead of ruining your video, it ruins your audio! 🎉 Imagine listening to music on a very old radio, or calling a customer service hotline and suffering through the hold music whose quality is insufferable.

It supports five different types of distortion:

1) White noise (of configurable amplitude)
2) Clicks (of configurable probability)
3) Low-pass filter (of configurable frequency)
4) Quantisation noise (of a configurable number of bits)
5) Cubic curve distortion / odd harmonics (of configurable amount and number of passes)
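
For a rough idea of what each of the five distortions does to a signal, here is a minimal sketch in plain Python. It is not the element's Rust code: the function names, the default values, the order of the stages, and the specific one-pole filter and `x - amount * x**3` curve are all assumptions chosen for illustration. Samples are assumed to be floats in [-1, 1].

```python
import math
import random


def clamp(x):
    """Limit a sample to the range [-1, 1]."""
    return max(-1.0, min(1.0, x))


def add_white_noise(samples, amplitude, rng):
    """Add uniform noise in [-amplitude, amplitude] to every sample."""
    return [s + rng.uniform(-amplitude, amplitude) for s in samples]


def add_clicks(samples, probability, rng):
    """Replace a sample with a full-scale spike with the given probability."""
    return [rng.choice((-1.0, 1.0)) if rng.random() < probability else s
            for s in samples]


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole (RC-style) low-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for s in samples:
        prev += alpha * (s - prev)
        out.append(prev)
    return out


def cubic_distort(samples, amount, passes):
    """Apply the odd-symmetric curve x - amount * x**3, `passes` times.

    An odd curve adds odd harmonics; keep amount <= 1/3 so the curve
    stays monotonic on [-1, 1].
    """
    out = [clamp(s) for s in samples]
    for _ in range(passes):
        out = [x - amount * x ** 3 for x in out]
    return out


def quantise(samples, bits):
    """Round each sample to one of 2**bits evenly spaced levels in [-1, 1]."""
    steps = 2 ** bits - 1
    return [round((clamp(s) + 1.0) / 2.0 * steps) / steps * 2.0 - 1.0
            for s in samples]


def age_audio(samples, sample_rate, rng, noise=0.02, click_prob=0.001,
              cutoff_hz=3000.0, bits=8, amount=0.3, passes=2):
    """Chain all five stages (hypothetical defaults and ordering)."""
    out = add_white_noise(samples, noise, rng)
    out = add_clicks(out, click_prob, rng)
    out = low_pass(out, cutoff_hz, sample_rate)
    out = cubic_distort(out, amount, passes)
    return quantise(out, bits)
```

Calling `age_audio(tone, 8000, random.Random(0))` on a list of sine samples returns a list of the same length, still within [-1, 1].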

It is part of rsaudiofx and merged as of a couple of hours ago: https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/3087

I used the Chilli hold music from https://github.com/Hypfer/HotlineValetudo/tree/master/HotlineValetudo/Audio/holdmusic as a test case (Chilli.mp3 by @janhenrik , CC-BY-NC-SA). The result, using the element's default values, is attached to this post.

#GStreamer #RustLang #Hackfest #AudioProcessing

Stable Audio 3

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoiding the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2 s on an H200 GPU and less than a few seconds on a MacBook Pro M4. We release the weights of the small and medium models, which can run on consumer-grade hardware, together with their training and inference pipeline.

arXiv.org

Not interested in reading? Just want to play with TransmuSynth? Check out the demo below instead:

https://transmusynth.fly.dev/

#Python #Cryptography #ImageProcessing #AudioProcessing #MIDI #Music #TransmuSynth

I wrote up a blog post that describes how I combined my image2sound and promp2pixel tools to create the web-based TransmuSynth!

https://johnbeers.xyz/behold-the-transmusynth.html

#Python #Cryptography #ImageProcessing #AudioProcessing #MIDI #Music #TransmuSynth


Python Templates for Neural Image Classification and Spectral Audio Processing – Part 2
https://www.youtube.com/watch?v=TNY2UGQ5kAc
#AudioProcessing #coding #programming #Python

Evil Otto by Audio Damage 🎛️
OTT-style multiband comp: 3 bands, up/down comp, sidechain, A/B, visuals

💻 Win/Mac/Linux/iOS (CLAP/VST3/AAX/AU)
🎁 FREE
🔗 https://www.audiodamage.com/pages/evil-otto

#freeplugin #multibandcomp #audioprocessing #audiodamage #musicproduction #mixingmastering #vstplugin

Just ran Demucs completely locally on my system (RX 6700 XT / 16 GB RAM).

Demucs is an open source AI model for music source separation, developed by Meta. It can split a full song into individual stems like vocals, drums, bass, and other instruments, making it useful for remixing, transcription, and audio analysis.

Test track: Fear of the Dark by Iron Maiden
(https://www.youtube.com/watch?v=bePCRKGUwAY)

Setup:

- Demucs installed via pip
- Model: htdemucs (default)
- Input converted to WAV using ffmpeg
- GPU acceleration via ROCm

Setting it up is tricky because Demucs is tightly pinned to older PyTorch versions, so you have to install dependencies manually and use "--no-deps" to avoid breaking your (ROCm-)PyTorch setup.
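
The setup above can be sketched as a few command lines built in Python. The helper names and file names are hypothetical, the list of dependencies to install by hand is left out because it depends on your environment, and the output path follows the layout Demucs normally uses (`<out_dir>/<model>/<track name>/`), so treat it as an assumption to verify locally.

```python
from pathlib import Path


def install_cmd():
    """Install Demucs without letting pip touch the existing PyTorch build."""
    return ["pip", "install", "--no-deps", "demucs"]


def to_wav_cmd(src, dst):
    """Convert any ffmpeg-readable input to WAV (-y overwrites dst)."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def separate_cmd(wav, model="htdemucs", out_dir="separated", vocals_only=True):
    """Run Demucs on one WAV file, optionally splitting vocals vs. the rest."""
    cmd = ["demucs", "-n", model, "-o", str(out_dir)]
    if vocals_only:
        cmd += ["--two-stems", "vocals"]
    cmd.append(str(wav))
    return cmd


def expected_vocals_path(wav, model="htdemucs", out_dir="separated"):
    """Where the vocal stem is expected to land (assumed default layout)."""
    return Path(out_dir) / model / Path(wav).stem / "vocals.wav"
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order: install, convert, separate.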

Result:
Very clean vocal separation in most parts. Some artifacts appear during very loud or distorted sections (e.g. emotional peaks or shouting).

Next steps / possibilities:

- Normalize and filter audio before separation
- Extract vocals for transcription or remixing
- Create karaoke / instrumental versions
- Combine with Whisper for lyrics
- Batch processing for datasets
- Try the htdemucs_ft model (higher quality)
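
A possible shape for the normalize, filter, and batch steps, again as command lists built in Python. This is an untested idea rather than a measured improvement: the 40 Hz high-pass cutoff, the 44.1 kHz sample rate, the `prepared` directory, and the helper names are all placeholder assumptions. It uses ffmpeg's `highpass` and `loudnorm` filters and passes all prepared files to one Demucs call.

```python
from pathlib import Path


def preprocess_cmd(src, dst, highpass_hz=40, sample_rate=44100):
    """High-pass filter and loudness-normalize a track into a WAV file."""
    filters = f"highpass=f={highpass_hz},loudnorm"
    return ["ffmpeg", "-y", "-i", str(src), "-af", filters,
            "-ar", str(sample_rate), str(dst)]


def batch_cmds(tracks, work_dir="prepared", model="htdemucs_ft",
               out_dir="separated"):
    """One ffmpeg command per track, then a single Demucs call for all of them.

    work_dir must already exist before the ffmpeg commands are run.
    """
    work = Path(work_dir)
    cleaned = [work / (Path(t).stem + ".wav") for t in tracks]
    cmds = [preprocess_cmd(t, c) for t, c in zip(tracks, cleaned)]
    cmds.append(["demucs", "-n", model, "-o", str(out_dir)]
                + [str(c) for c in cleaned])
    return cmds
```

Running the returned lists in order with `subprocess.run(cmd, check=True)` would prepare every track and then separate them in one go.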

Video workflow:

- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)

No cloud, real hardware.
Everything runs on Linux, so anyone can set this up.
Works on CPU as well, but much slower.

#Demucs #AI #MachineLearning #AudioSeparation #MusicAI #OpenSource #Linux #ROCm #AMD #DeepLearning #AudioProcessing #Vocals #Karaoke #StemSeparation #SelfHosted #NoCloud #FOSS #Tech #LocalAI #MetaAI