Angry Tom (@AngryTomtweets)
Launch announcement for Higgsfield Audio: a product-launch tweet introducing its key features, including voice generation from text (21 voice presets), replacing a video's audio with perfectly lip-synced speech, and voice translation across 10 built-in languages.
Google Translate Unlocks Gemini AI Live Speech Translations for All Android Users
#AI #Google #Android #GeminiAI #GoogleTranslate #LiveTranslate #GenAI #LanguageLearning #EdTech #SpeechTranslation #RealTimeTranslation #Alphabet #BigTech
A real-time language translation project is running into difficulties. The current system has high latency and cannot reach the 3-second latency of commercial systems. #LanguageTranslation #RealTime #SpeechTranslation #RealTimeTranslation #AI #ArtificialIntelligence
Our pick of the week by @sarapapi: "Consistent Transcription and Translation of Speech" by Sperber et al., 2020 TACL.
https://arxiv.org/pdf/2007.12741.pdf
#NLProc #NLP #speech #translation #speechtranslation #consistency #consistent
#AI will only take over from human #interpreters when it stops doing what most human interpreters say they do.
End-to-end Speech Translation (E2E ST) aims to directly translate source speech into target text. Existing ST methods perform poorly when only extremely small speech-text data are available for training. We observe that an ST model's performance closely correlates with its embedding similarity between speech and source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is bridging word-level representations for both speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on a low-resource direction Maltese-English from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1-hour parallel ST data. Code is available at https://github.com/owaski/WACO.
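The core idea in WACO, pooling speech frames into word-level vectors and pulling them toward the matching text-word embeddings with a contrastive (InfoNCE-style) loss, can be sketched in a few lines. This is a minimal numpy illustration, not the authors' implementation (see their repo for that); the function names and the assumption that word-to-frame alignments are already given are mine.

```python
import numpy as np

def word_pool(frame_embs, word_spans):
    """Average-pool speech frame embeddings into one vector per word.

    frame_embs: (n_frames, dim) array of speech encoder outputs.
    word_spans: list of (start, end) frame indices, one pair per word,
                assumed to come from a forced alignment.
    """
    return np.stack([frame_embs[s:e].mean(axis=0) for s, e in word_spans])

def word_contrastive_loss(speech_words, text_words, temperature=0.1):
    """InfoNCE loss treating the i-th text word as the positive for the
    i-th speech word and all other words in the batch as negatives."""
    s = speech_words / np.linalg.norm(speech_words, axis=1, keepdims=True)
    t = text_words / np.linalg.norm(text_words, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature          # cosine similarities, scaled
    # log-softmax over candidate text words for each speech word
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When speech and text word embeddings already coincide, the diagonal dominates the similarity matrix and the loss is small; training drives the two modalities toward that state, which is the embedding-similarity correlation the abstract points to.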
Our Pick of the week: Phuong-Hang Le et al., "Pre-training for Speech Translation: CTC Meets Optimal Transport"
by @mgaido91
The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design. We provide a quantitative comparison with the more common cross-entropy loss, showing that pre-training with CTC consistently achieves better final ST accuracy. Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap. Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space. Extensive experiments on the standard CoVoST-2 and MuST-C datasets show that our pre-training method applied to the vanilla encoder-decoder Transformer achieves state-of-the-art performance under the no-external-data setting, and performs on par with recent strong multi-task learning systems trained with external data. Finally, our method can also be applied on top of these multi-task systems, leading to further improvements for these models.
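The optimal-transport component of this pre-training objective measures how far apart the acoustic and textual encoder outputs are as point clouds, rather than position by position. A standard way to compute an entropic-regularized approximation of that Wasserstein cost is Sinkhorn iteration; the toy numpy sketch below is my own illustration of that general technique under uniform marginals, not the paper's actual training code.

```python
import numpy as np

def pairwise_sq_dists(x, y):
    """Squared Euclidean cost matrix between two sets of vectors."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

def sinkhorn_cost(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT cost (Sinkhorn-Knopp) between uniform
    marginals over the rows and columns of `cost`.

    cost: (n, m) non-negative cost matrix, e.g. between acoustic and
          textual encoder states. Returns the transport cost <P, cost>.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on acoustic states
    b = np.full(m, 1.0 / m)   # uniform mass on textual states
    K = np.exp(-cost / reg)   # Gibbs kernel
    v = np.ones(m)
    for _ in range(n_iters):  # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return float((P * cost).sum())
```

Minimizing this quantity pulls the two encoders' representations close in Wasserstein space, which is what the Siamese pre-training stage does; identical point clouds yield a near-zero cost, while a systematic offset between modalities yields a larger one.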