Baseline system + leaderboards are up for #MerlionChallenge untangling complex code-mixed speech. Which #ML #DeepLearning #SpeechProc system will do the best job on complex language use in the wild? 👀

TWO TEAMS have already beaten the baseline for Language ID:
🎉Lingua_Lumos (Closed)
🎉UNSW_Signal_Processing (Open)

There’s still time to join the challenge and prep your paper for our special session at #Interspeech2023

https://toot.community/@suzyjstyles/109793982337809507

Dr Suzy J Styles (@suzyjstyles@toot.community)

Have you ever seen auto-generated subtitles turn to mush because they couldn’t handle a speaker’s accent or figure out what language they’re speaking after a switch?

The #MerlionChallenge for #Interspeech23 tests how well teams can build a language detection system for Code-Switching in >300 Zoom recordings.

Help build robust systems for multilingualism by joining the challenge or sharing with #ML #DeepLearning #SpeechProc friends 💪🏼💪🏽💪🏿

https://toot.community/@suzyjstyles/109713790862725145

I’m sure I have a bunch of #Multilingual #LangDev, #SpeechProc #NLP and #CogSci friends over here 🦣

We’ve prepped >30hrs of our English/Mandarin code-switched child directed speech for the #MerlionChallenge at this year’s INTERSPEECH

>300 files, >100 voices 🙀 (+ training data)

We’re looking for speech systems that can figure out which language is spoken when! The #MerlionChallenge will see whose system does the best job 💪🏼

Join or help us boost the message: https://sites.google.com/view/merlion-ccs-challenge/

Ever seen auto-generated subtitles turn to mush because they couldn’t handle a speaker’s accent or figure out what language they’re speaking after a switch?

The #MerlionChallenge at #Interspeech23 tests how well teams can build a language detection system for real-world Code-Switching between English and Mandarin Chinese in >300 Zoom recordings.

Join the challenge or boost to help build more robust speech systems for multilingualism 💪🏼💪🏽💪🏿

https://toot.community/@suzyjstyles/109713790862725145

Bonus

✨✨DID YOU KNOW?✨✨With the body of a mermaid and the head of a lion, the Merlion is a national icon of Singapore.

✨Just as the Merlion is a mix of different creatures, the code-switched child-directed speech in the #MerlionChallenge is a mix of different languages✨

(apologies for cross posting)

Most AI speech processing systems are developed using samples of monolingual speech between adults. #WEIRDbias

We hope the #MerlionChallenge at @Interspeech 2023 pushes the frontiers of how automated systems handle the diverse kinds of #translanguaging we see in the world 🌍

If you want to see better tools for #LangDev #Multilingualism #DiverseVoices and #GlobalLanguages then help us ✨boost✨ these posts so they can reach all the lovely #SpeechProc and #CognitiveScience folks!

What makes this a good #AI challenge?
👉Natural code switching (no shuffled segments)
👉Accented English & Mandarin
👉Precision human annotation
👉Various far field mics (laptops/tablets)
👉Internet audio (Zoom)
👉Adults speaking to kids

#MerlionChallenge #Interspeech

Participating teams have a chance to submit their papers to our special session at #Interspeech 2023 in Dublin (yes, Ireland) ☘️

You can find out more about the #MerlionChallenge or sign up to take part by taking a look at our shiny new website!

https://sites.google.com/view/merlion-ccs-challenge/home

MERLIon CCS Interspeech 2023

For the #MerlionChallenge at #Interspeech we’ll be asking teams to train a #SpeechProc / #AI system that can work out which language is being spoken (Task 1: Language ID) and when (Task 2: Language Diarization)!

👉Challenge audio is Zoom recordings with English and/or Mandarin Chinese
👉Audio for development matches audio for evaluation 😗👌
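
For anyone new to the two tasks, here is a rough sketch of how they differ. It is only an illustration, not the challenge's baseline or scoring format: the per-frame labels, the `en`/`zh` codes, the 0.5 s frame length and both function names are assumptions made up for this example. Task 1 picks one language for a segment whose boundaries are already known; Task 2 has to find the boundaries too.

```python
from collections import Counter
from itertools import groupby

def identify_language(frame_labels):
    """Task 1 style (hypothetical): one label for a pre-cut segment, by majority vote."""
    return Counter(frame_labels).most_common(1)[0][0]

def frames_to_segments(frame_labels, frame_sec=0.5):
    """Task 2 style (hypothetical): merge runs of identical per-frame labels
    into (start_sec, end_sec, label) segments."""
    segments = []
    start = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((start * frame_sec, (start + n) * frame_sec, label))
        start += n
    return segments

# Made-up per-frame predictions for a 3-second stretch of audio
frames = ["en", "en", "zh", "zh", "zh", "en"]

print(identify_language(frames[2:5]))  # label for one known segment
print(frames_to_segments(frames))      # who-spoke-which-language-when
```

A real system would get those frame labels from an acoustic model; the point here is just that diarization output carries timestamps while language ID output does not.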

Our annotation protocol is documented in the BELA transcription conventions. The Wiki includes instructions for how to do multi-tier multilingual transcriptions using Elan (free!)

BELA Con:
blipntu.github.io/belacon/

For the #MerlionChallenge we hold some info back

All of the #MerlionChallenge audio recordings were collected via Zoom calls, where parents narrated a wordless picturebook to their children (link to an old thread on the bird site)

The book is free to download, and can be used for any language or combo!

https://twitter.com/suzyjstyles/status/1453324258654883844?s=21&t=UaXkchQhLvUn0AzL7Hu90g