Sound source localization is an important part of processing audio. We use it to pay attention to someone talking to us in a noisy environment. Smart home speakers use it to identify when someone is speaking, to focus on their voice and reject background noise.
We've built a new system for sound source localization, based on spiking neural networks (SNNs), that sets a new state-of-the-art for SNN implementations, is extremely power efficient, and even matches the accuracy of standard DSP-based approaches! [1] https://arxiv.org/abs/2402.11748
Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme

Sound source localisation is used in many consumer devices, to isolate audio from individual speakers and reject noise. Localization is frequently accomplished by ``beamforming'', which combines phase-shifted audio streams to increase power from chosen source directions, under a known microphone array geometry. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. These approaches achieve high accuracy, but narrowband beamforming is computationally demanding, and not ideal for low-power IoT devices. We demonstrate a novel method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). We use a Hilbert transform to avoid dense band-pass filters, and introduce a new event-based encoding method that captures the phase of the complex analytic signal. Our approach achieves state-of-the-art accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming. We deploy our method to low-power SNN inference hardware, with much lower power consumption than super-resolution methods. We demonstrate that signal processing approaches co-designed with spiking neural network implementations can achieve much improved power efficiency. Our new Hilbert-transform-based method for beamforming can also improve the efficiency of traditional DSP-based signal processing.

Mammals exploit the fact that sound arriving from different directions produces precise differences in arrival time between the two ears, known as interaural time differences (ITDs). ITDs are encoded by the differences in neuronal spike times produced by the two cochleas. [2]
Most SNN implementations of sound source localization take this approach, using the precise differences in spike times generated by a single-frequency sound at two microphones to estimate the location of an audio source.
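As a rough illustration of the classical two-microphone idea (not the SNN pipeline itself), the arrival-time difference can be recovered by cross-correlating the two microphone signals and converting the best lag into an angle. The sample rate, microphone spacing, and source angle below are made-up values for the sketch:

```python
import numpy as np

# Hypothetical two-microphone setup (spacing, sample rate are assumptions).
fs = 16_000          # sample rate (Hz)
mic_spacing = 0.1    # metres between the two microphones
c = 343.0            # speed of sound (m/s)

# Simulate a source at 30 degrees: mic 2 receives a delayed copy of mic 1.
true_angle = np.deg2rad(30.0)
delay_n = int(round(mic_spacing * np.sin(true_angle) / c * fs))

rng = np.random.default_rng(0)
sig = rng.standard_normal(4096)           # wideband "source"
mic1 = sig
mic2 = np.roll(sig, delay_n)              # delayed arrival at the second mic

# Cross-correlate to find the lag (in samples) that best aligns the signals.
lags = np.arange(-63, 64)
xcorr = [np.dot(mic1[64:-64], mic2[64 + k:len(mic2) - 64 + k]) for k in lags]
best_lag = lags[int(np.argmax(xcorr))]

# Convert the winning lag back into an estimated arrival angle.
est_angle = np.rad2deg(np.arcsin(best_lag / fs * c / mic_spacing))
```

Note that at 16 kHz the lag is quantised to whole samples, so the recovered angle is only approximate; this coarseness is one reason sub-sample (spike-timing or super-resolution) methods are attractive.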
We took a different approach, designed for arrays with many microphones (>2). We use a construct called the Hilbert Transform to estimate the phase of each microphone signal, and encode those phases as spikes. We then use a beamforming method to estimate the source direction.
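A minimal sketch of that first step: the analytic signal can be built with an FFT (this construction is equivalent to `scipy.signal.hilbert`), and its angle gives the instantaneous phase. The phase-crossing spike rule below is an illustrative assumption, not the exact event encoding from the paper:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT, equivalent to scipy.signal.hilbert."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)                # one-sided spectral weighting
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(spec * h)

fs = 16_000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440.0 * t)         # toy single-tone input

z = analytic_signal(x)
phase = np.angle(z)                       # instantaneous phase in (-pi, pi]

# Illustrative event rule (an assumption, not the paper's exact scheme):
# emit one spike each time the unwrapped phase passes a multiple of 2*pi,
# so the spike train tracks the signal's cycles.
unwrapped = np.unwrap(phase)
spikes = np.flatnonzero(np.diff(np.floor(unwrapped / (2 * np.pi))) > 0)
```

The real part of `z` is the original signal, and the spike count tracks the number of cycles in the window, so the timing of the spikes carries the phase information the beamformer needs.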

The major benefit of our approach is that it works for *any* signal, e.g. wideband speech, and not just narrowband sine waves.

Beamforming works by "steering" the microphone array towards a chosen direction, by combining the audio signal from each microphone. Usually this is done by assuming a particular frequency for the source signal (the "narrowband" regime).

By using the Hilbert Transform we developed a single beamforming approach that works well in the narrowband case, and can use all frequencies of a wideband signal to work well in the wideband case!
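To make the "steering" idea concrete, here is a toy narrowband delay-and-sum scan over candidate directions for a uniform linear array. This is the standard textbook beamformer described above, not our Hilbert-based method, and the geometry and tone frequency are assumptions for illustration:

```python
import numpy as np

# Toy narrowband delay-and-sum scan (standard beamforming, assumed geometry).
fs, f0, c = 16_000, 1_000.0, 343.0
n_mics, spacing = 4, 0.05                      # uniform linear array, 5 cm pitch
true_angle = np.deg2rad(20.0)

t = np.arange(2048) / fs
# Plane wave from true_angle: each mic sees the tone with its own delay.
delays = np.arange(n_mics) * spacing * np.sin(true_angle) / c
mics = np.exp(2j * np.pi * f0 * (t[None, :] - delays[:, None]))

# Scan candidate directions: undo each hypothesised per-mic delay with a
# phase shift, sum across mics, and record the output power.
angles = np.deg2rad(np.arange(-90.0, 91.0))
powers = []
for a in angles:
    steer = np.exp(2j * np.pi * f0 * np.arange(n_mics) * spacing * np.sin(a) / c)
    beam = (steer[:, None] * mics).sum(axis=0)
    powers.append(np.mean(np.abs(beam) ** 2))

est_deg = np.rad2deg(angles[int(np.argmax(powers))])   # peaks at the true angle
```

The scan peaks when the hypothesised direction matches the true one. The narrowband assumption is baked into the single steering frequency `f0`; a wideband source would need this repeated per frequency band, which is exactly the cost our Hilbert-based approach avoids.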
As a result, we use *far* fewer implementation resources for beamforming than standard approaches. Using an SNN also makes us very power efficient, while still achieving state-of-the-art accuracy for SNNs, comparable with standard super-resolution methods such as MUSIC [3]!

If you're interested, you can read more details in our preprint on arXiv: https://arxiv.org/abs/2402.11748

And of course, our code is available open source: https://github.com/synsense/HaghighatshoarMuir2024
