#GRETSI2025 and #EUSIPCO2025 are over, the #ICASSP submission deadline has passed, the weather is grey. Subscribe to the #GRETSI YouTube channel @gretsi6095 and watch, for example, Lenka Zdeborová (Professor, EPFL), "Apprentissage (profond) et Physique Statistique" ((Deep) Learning and Statistical Physics)

https://youtu.be/yLFQUuVcC_0?si=DxdILCeA8Z3bnEEL

GRETSI 2022 - Apprentissage (profond) et Physique Statistique

We have 2 research papers accepted for presentation at #ICASSP in Hyderabad! You can read the preprints:
"LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging" led by Shubhr Singh https://arxiv.org/abs/2501.03464
"Acoustic identification of individual animals with hierarchical contrastive learning" led by Ines Nolasco https://arxiv.org/abs/2409.08673 #machinelearning #machinelistening #bioacoustics
LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
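The abstract's core idea, combining a node's local neighbourhood with soft cluster-level ("higher-order") information, can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: `lhgnn_like_layer`, the k-NN/centroid mixing, and all parameter choices are illustrative assumptions, shown only to make the local-plus-fuzzy-cluster aggregation concrete.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=50, seed=0):
    # Standard fuzzy c-means: soft memberships U (N x c) and centroids V (c x d).
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]           # weighted centroids
        d2 = ((X[:, None, :] - V[None]) ** 2).sum(-1) + 1e-9
        U = 1.0 / (d2 ** (1.0 / (m - 1)))                # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

def lhgnn_like_layer(X, k=3, c=2):
    # Hypothetical aggregation in the spirit of the abstract: mix each node's
    # k-nearest-neighbour mean (local) with its membership-weighted fuzzy
    # c-means centroid (higher-order), then average with the node itself.
    d2 = ((X[:, None, :] - X[None]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]              # k nearest, excluding self
    local = X[nn].mean(axis=1)                           # local neighbourhood summary
    U, V = fuzzy_cmeans(X, c)
    higher = U @ V                                       # soft cluster summary
    return (X + local + higher) / 3.0
```

In the actual model these components would be learned layers inside a GNN; the sketch only shows how the two information sources can be fused per node.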

they can’t keep getting away with it #icassp #interspeech
you do in fact love to see it (and that this is now, in fact, the standard) #icassp #icassp2024
this #icassp industry talk could have been a Well There’s Your Problem podcast episode
anyone here going to #IEEE #ICASSP ?

Our pick of the week by Dennis Fucci: "Explanations for Automatic Speech Recognition" (Wu et al., 2023 #ICASSP).

https://ieeexplore.ieee.org/document/10094635

#NLProc #NLP #Speech #Recognition #ASR #SpeechRecognition #explanation #explainableAI #AI

Explanations for Automatic Speech Recognition

We address quality assessment for neural network based ASR by providing explanations that help increase our understanding of the system and ultimately help build trust in the system. Compared to simple classification labels, explaining transcriptions is more challenging, as judging their correctness is not straightforward and transcriptions, being variable-length sequences, are not handled by existing interpretable machine learning models. We provide an explanation for an ASR transcription as a subset of audio frames that is both a minimal and sufficient cause of the transcription. To do this, we adapt existing explainable AI (XAI) techniques from image classification: (1) Statistical Fault Localisation (SFL) [1] and (2) Causal [2]. Additionally, we use an adapted version of Local Interpretable Model-Agnostic Explanations (LIME) [3] for ASR as a baseline in our experiments. We evaluate the quality of the explanations generated by the proposed techniques over three different ASR systems – Google API [4], the baseline model of Sphinx [5], Deepspeech [6] – and 100 audio samples from the Commonvoice dataset [7].
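The notion of a "minimal and sufficient" subset of audio frames can be sketched with a naive greedy deletion loop. This is not the paper's SFL or causal algorithm; `minimal_sufficient_frames`, the toy `predict` black box, and the zero-masking scheme are all illustrative assumptions.

```python
import numpy as np

def minimal_sufficient_frames(frames, predict, target):
    # Greedy sketch: starting from all frames, try zeroing out each frame in
    # turn; keep it zeroed if the black-box prediction on the masked input
    # still equals the target output. What remains is sufficient for the
    # prediction and locally minimal under single-frame deletion.
    keep = np.ones(len(frames), dtype=bool)
    for i in range(len(frames)):
        trial = keep.copy()
        trial[i] = False
        if predict(frames * trial[:, None]) == target:
            keep = trial                                 # frame i was not needed
    return keep
```

A real ASR explainer would compare full transcriptions rather than labels and use the smarter search strategies the paper adapts, but the sufficiency check is the same shape.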

The paper from my internship at #meta (https://arxiv.org/abs/2210.11981) was accepted at the 2023 IEEE ICASSP (@ieee_bot #icassp). See you there!
Named Entity Detection and Injection for Direct Speech Translation

In a sentence, certain words are critical to its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly on locations and, notably, person names, whose spelling is challenging unless known in advance. In this work, we explore how to leverage dictionaries of NEs likely to appear in a given context to improve S2T model outputs. Our experiments show that we can reliably detect NEs likely present in an utterance starting from S2T encoder outputs. Indeed, we demonstrate that the current detection quality is sufficient to improve NE accuracy in the translation, with a 31% reduction in person name errors.
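The detection step described in the abstract, scoring a contextual dictionary of named entities against the S2T encoder output, can be sketched as a similarity threshold. This is an assumed simplification, not the paper's method: `detect_named_entities`, the mean pooling, the cosine scoring, and the example names are all hypothetical.

```python
import numpy as np

def detect_named_entities(encoder_states, ne_embeddings, names, threshold=0.5):
    # Hypothetical sketch: pool the utterance's encoder states into one vector,
    # then keep every dictionary entity whose embedding is cosine-similar to it.
    utt = encoder_states.mean(axis=0)
    utt = utt / np.linalg.norm(utt)
    ne = ne_embeddings / np.linalg.norm(ne_embeddings, axis=1, keepdims=True)
    scores = ne @ utt                                    # cosine similarity per entity
    return [n for n, s in zip(names, scores) if s > threshold]
```

Detected entities could then be injected into the decoder to fix their spelling, which is the "injection" half of the paper's title.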

The decision to make #ACL Rolling Review (#ARR) and #ICASSP metareviews due within two days of each other did not take my needs into account.
Just received the 3 reviews for #ICASSP 23. Not bad, but the least positive one, from the 2nd reviewer, seems to have been cut-and-pasted from a review of a clearly unrelated work 😶 let's see what to write in the rebuttal now, hoping it'll be considered 🤔