Our pick of the week by @mgaido91: "Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition" by Rekesh et al., 2023.

https://arxiv.org/abs/2305.05084

#conformer #speech #speechrecognition #recognition #attention #fast

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. In this work, we propose a carefully redesigned Conformer with a new down-sampling schema. The proposed model, named Fast Conformer, is 2.8x faster than the original Conformer while preserving state-of-the-art accuracy on Automatic Speech Recognition benchmarks. We also replace the original Conformer's global attention with limited-context attention post-training to enable transcription of hour-long audio. We further improve long-form speech transcription by adding a global token. Fast Conformer combined with a Transformer decoder also outperforms the original Conformer in both accuracy and speed for Speech Translation and Spoken Language Understanding.
