#GRETSI2025 and #EUSIPCO2025 are over, the #ICASSP submission deadline has passed, the weather is grey. Subscribe to the #GRETSI YouTube channel @gretsi6095 and watch, for example, Lenka Zdeborová (Professor, EPFL), "Apprentissage (profond) et Physique Statistique" ((Deep) Learning and Statistical Physics)

https://youtu.be/yLFQUuVcC_0?si=DxdILCeA8Z3bnEEL

GRETSI 2022 - Apprentissage (profond) et Physique Statistique

We have 2 research papers accepted for presentation at #ICASSP in Hyderabad! You can read the preprints:
"LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging" led by Shubhr Singh https://arxiv.org/abs/2501.03464
"Acoustic identification of individual animals with hierarchical contrastive learning" led by Ines Nolasco https://arxiv.org/abs/2409.08673 #machinelearning #machinelistening #bioacoustics
LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order data from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
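The abstract's core idea, combining a node's local neighbourhood with soft cluster-level ("higher-order") information, can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: `lhgnn_like_layer`, the k-NN/centroid mixing, and all parameter choices are illustrative assumptions, shown only to make the local-plus-fuzzy-cluster aggregation concrete.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=50, seed=0):
    # Standard fuzzy c-means: soft memberships U (N x c) and centroids V (c x d).
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]           # weighted centroids
        d2 = ((X[:, None, :] - V[None]) ** 2).sum(-1) + 1e-9
        U = 1.0 / (d2 ** (1.0 / (m - 1)))                # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

def lhgnn_like_layer(X, k=3, c=2):
    # Hypothetical aggregation in the spirit of the abstract: mix each node's
    # k-nearest-neighbour mean (local) with its membership-weighted fuzzy
    # c-means centroid (higher-order), then average with the node itself.
    d2 = ((X[:, None, :] - X[None]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]              # k nearest, excluding self
    local = X[nn].mean(axis=1)                           # local neighbourhood summary
    U, V = fuzzy_cmeans(X, c)
    higher = U @ V                                       # soft cluster summary
    return (X + local + higher) / 3.0
```

In the actual model these components would be learned layers inside a GNN; the sketch only shows how the two information sources can be fused per node.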

they can’t keep getting away with it #icassp #interspeech
you do in fact love to see it (and that this is now, in fact, the standard) #icassp #icassp2024
this #icassp industry talk could have been a Well There’s Your Problem podcast episode
anyone here going to #IEEE #ICASSP ?

Our pick of the week by Dennis Fucci: "Explanations for Automatic Speech Recognition" (Wu et al., 2023 #ICASSP).

https://ieeexplore.ieee.org/document/10094635

#NLProc #NLP #Speech #Recognition #ASR #SpeechRecognition #explanation #explainableAI #AI

Explanations for Automatic Speech Recognition

We address quality assessment for neural network based ASR by providing explanations that help increase our understanding of the system and ultimately help build trust in the system. Compared to simple classification labels, explaining transcriptions is more challenging, as judging their correctness is not straightforward and transcriptions, being variable-length sequences, are not handled by existing interpretable machine learning models. We provide an explanation for an ASR transcription as a subset of audio frames that is both a minimal and sufficient cause of the transcription. To do this, we adapt existing explainable AI (XAI) techniques from image classification: (1) Statistical Fault Localisation (SFL) [1] and (2) Causal [2]. Additionally, we use an adapted version of Local Interpretable Model-Agnostic Explanations (LIME) [3] for ASR as a baseline in our experiments. We evaluate the quality of the explanations generated by the proposed techniques over three different ASR systems – Google API [4], the baseline model of Sphinx [5], Deepspeech [6] – and 100 audio samples from the Commonvoice dataset [7].
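The notion of a "minimal and sufficient" subset of audio frames can be sketched with a naive greedy deletion loop. This is not the paper's SFL or causal algorithm; `minimal_sufficient_frames`, the toy `predict` black box, and the zero-masking scheme are all illustrative assumptions.

```python
import numpy as np

def minimal_sufficient_frames(frames, predict, target):
    # Greedy sketch: starting from all frames, try zeroing out each frame in
    # turn; keep it zeroed if the black-box prediction on the masked input
    # still equals the target output. What remains is sufficient for the
    # prediction and locally minimal under single-frame deletion.
    keep = np.ones(len(frames), dtype=bool)
    for i in range(len(frames)):
        trial = keep.copy()
        trial[i] = False
        if predict(frames * trial[:, None]) == target:
            keep = trial                                 # frame i was not needed
    return keep
```

A real ASR explainer would compare full transcriptions rather than labels and use the smarter search strategies the paper adapts, but the sufficiency check is the same shape.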

The paper from my internship at #meta (https://arxiv.org/abs/2210.11981) was accepted at the 2023 IEEE ICASSP (@ieee_bot #icassp). See you there!
Named Entity Detection and Injection for Direct Speech Translation

In a sentence, certain words are critical to its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly on locations and, notably, person names, whose spelling is challenging unless known in advance. In this work, we explore how to leverage dictionaries of NEs likely to appear in a given context to improve S2T model outputs. Our experiments show that we can reliably detect NEs likely present in an utterance starting from S2T encoder outputs. Indeed, we demonstrate that the current detection quality is sufficient to improve NE accuracy in the translation, with a 31% reduction in person name errors.
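The detection step described in the abstract, scoring a contextual dictionary of named entities against the S2T encoder output, can be sketched as a similarity threshold. This is an assumed simplification, not the paper's method: `detect_named_entities`, the mean pooling, the cosine scoring, and the example names are all hypothetical.

```python
import numpy as np

def detect_named_entities(encoder_states, ne_embeddings, names, threshold=0.5):
    # Hypothetical sketch: pool the utterance's encoder states into one vector,
    # then keep every dictionary entity whose embedding is cosine-similar to it.
    utt = encoder_states.mean(axis=0)
    utt = utt / np.linalg.norm(utt)
    ne = ne_embeddings / np.linalg.norm(ne_embeddings, axis=1, keepdims=True)
    scores = ne @ utt                                    # cosine similarity per entity
    return [n for n, s in zip(names, scores) if s > threshold]
```

Detected entities could then be injected into the decoder to fix their spelling, which is the "injection" half of the paper's title.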

The decision to make #ACL Rolling Review (#ARR) and #ICASSP metareviews due within two days of each other did not take my needs into account.
Just received the 3 reviews for #ICASSP 23. Not bad, but the least positive one, from the 2nd reviewer, seems to have been cut-and-pasted from a review of a clearly unrelated work 😶 let's see what to write in the rebuttal now, hoping it'll be considered 🤔