Marianne de Heer Kloots

@mdhk@scholar.social
225 Followers
194 Following
72 Posts
Linguist in AI & CogSci πŸ§ πŸ‘©β€πŸ’»πŸ€·β€β™€οΈ PhD student @ ILLC Amsterdam
GitHub: https://github.com/mdhk
Twitter: https://twitter.com/mariannedhk
Website: https://mdhk.net

Want to learn how to analyze the inner workings of speech processing models? πŸ”

Check out the programme for our tutorial, taking place at this year's Interspeech conference in Rotterdam: https://interpretingdl.github.io/speech-interpretability-tutorial/

The schedule features presentations and interactive sessions with a great team of co-organizers: Charlotte Pouw, Gaofei Shen, Martijn Bentum, Tom Lentz, @hmohebbi, @wzuidema, @gchrupala (and me!). We look forward to seeing you there πŸ˜ƒ

#SpeechTech #SpeechScience #Interspeech2025

I'm presenting this work this afternoon at the hils2024.nl conference (poster D29), and later this summer at Interspeech! 🌞
Come say hi if you like speech perception and interpretability research :)

πŸ“ We also compare three analysis methods for decoding phoneme preference from model internals, and find interesting differences between them!

➑️ Read more in the paper: https://arxiv.org/abs/2407.03005

Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0

What do deep neural speech models know about phonology? Existing work has examined the encoding of individual linguistic units such as phonemes in these models. Here we investigate interactions between units. Inspired by classic experiments on human speech perception, we study how Wav2Vec2 resolves phonotactic constraints. We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts where only /l/, only /r/, or neither occur in English. Like humans, Wav2Vec2 models show a bias towards the phonotactically admissible category in processing such ambiguous sounds. Using simple measures to analyze model internals on the level of individual stimuli, we find that this bias emerges in early layers of the model's Transformer module. This effect is amplified by ASR finetuning but also present in fully self-supervised models. Our approach demonstrates how controlled stimulus designs can help localize specific linguistic knowledge in neural speech models.

arXiv.org
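
The three analysis methods themselves aren't spelled out in this thread, so here is only a rough sketch of what "decoding phoneme preference from model internals" can look like: a logistic-regression probe trained on hidden states of the unambiguous continuum endpoints and read out on the ambiguous steps. The data shapes, mean-pooling, and probe choice below are illustrative assumptions, not necessarily the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim = 768  # hidden size of Wav2Vec2-base

# Placeholder arrays standing in for mean-pooled Wav2Vec2 hidden states
# over the target segment of each stimulus (real features would come from the model):
X_endpoints = rng.normal(size=(40, hidden_dim))   # unambiguous /l/ and /r/ continuum endpoints
y_endpoints = np.repeat([0, 1], 20)               # 0 = /l/, 1 = /r/
X_ambiguous = rng.normal(size=(10, hidden_dim))   # ambiguous mid-continuum steps

# Diagnostic probe: learn to decode phoneme identity from the endpoints,
# then read off its preference on the ambiguous stimuli.
probe = LogisticRegression(max_iter=1000).fit(X_endpoints, y_endpoints)
p_r = probe.predict_proba(X_ambiguous)[:, 1]      # decoded preference for /r/ per stimulus
print(np.round(p_r, 2))
```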
πŸ’‘ We find similar adaptation to phonotactic context in Wav2Vec2 models, emerging around the 4th layer of their Transformer module. This effect is amplified by finetuning for text transcription, but also present in fully self-supervised models (when trained on English speech).
πŸ‘‰ We set out to test Wav2Vec2 models using a paradigm that very closely follows such experiments on human speech perception: we synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in contexts where only /l/, only /r/, or neither occur in English.
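
As a minimal sketch of how such layer-wise analyses can be run with HuggingFace transformers (the checkpoint choice and the placeholder waveform are assumptions here, not the paper's exact pipeline), hidden states can be pulled from every layer of a Wav2Vec2 model for a single stimulus:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-base-960h"  # illustrative checkpoint choice
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name, output_hidden_states=True).eval()

# `waveform` stands in for one synthesized continuum step embedded in its
# carrier context; here just one second of silence at 16 kHz so the snippet runs.
waveform = np.zeros(16000, dtype=np.float32)

inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# One (1, n_frames, hidden_dim) tensor per layer: index 0 is the Transformer's
# input (projected convolutional features), 1..N are the Transformer layers.
for layer, h in enumerate(out.hidden_states):
    print(layer, tuple(h.shape))
```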

One case of such contextual biasing effects comes from phonotactic constraints.

For example, in English: /tl/ is not an admissible onset but /tr/ is (TL << TR), while /sl/ is admissible but /sr/ is not (SL >> SR).

This was demonstrated in human listeners quite a while ago! (https://doi.org/10.3758/BF03203046)
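
As a toy illustration of that asymmetry (not code from the paper), think of it as a lookup over permissible English onsets:

```python
# Toy set of English consonant+liquid onsets (illustrative, not exhaustive).
ENGLISH_ONSETS = {"tr", "dr", "pl", "pr", "bl", "br", "kl", "kr", "fl", "fr", "sl"}

for onset in ["tl", "tr", "sl", "sr"]:
    status = "admissible" if onset in ENGLISH_ONSETS else "not admissible"
    print(f"{onset}: {status}")
# tl: not admissible, tr: admissible, sl: admissible, sr: not admissible
```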

Human phonetic categorization is linguistically informed: when hearing acoustically ambiguous speech sounds, we tend to perceive what we have learned to be more likely given the surrounding context. πŸ’­

✨ Do current neural speech models show human-like linguistic biases in speech perception?

We took inspiration from classic phonetic categorization experiments to explore whether & where sensitivity to phonotactic context emerges in Wav2Vec2 models πŸ”
(w/ @wzuidema )

πŸ“‘
https://arxiv.org/abs/2407.03005

⬇️

Feeling very inspired about ✨Using ANNs for Studying Human Language Learning and Processing (https://ANN-HumLang.github.io/)✨ after the workshop that Tamar Johnson and I organized this week at the ILLC in Amsterdam β€” many thanks to all our speakers and participants for such a great event, and to the Language in Interaction consortium for making it possible!
Abstract submissions for the Young Female* Researchers in Speech Workshop (YFRSW) are open until this Saturday! Accepted students will receive a grant to cover travel and participation. Very much looking forward to this great event alongside Interspeech in Greece this summer! πŸ’¬πŸ”ŠπŸ‡¬πŸ‡·
https://sites.google.com/view/yfrsw-2024/abstract-submission
YFRSW-2024 - Abstract submission

Call for Abstracts The Young Female* Researchers in Speech Workshop (YFRSW) is a workshop for female* Bachelor’s and Master’s students currently working in speech science and technology. The workshop aims to promote interest in research in our field among women* who have not yet committed to