Our pick of the week by @mgaido91: "Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition" by Rekesh et al., 2023.
https://arxiv.org/abs/2305.05084
#conformer #speech #speechrecognition #recognition #attention #fast

Conformer-based models have become the most dominant end-to-end architecture
for speech processing tasks. In this work, we propose a carefully redesigned
Conformer with a new down-sampling schema. The proposed model, named Fast
Conformer, is 2.8x faster than the original Conformer, while preserving
state-of-the-art accuracy on Automatic Speech Recognition benchmarks. We also
replace the original Conformer global attention with limited context attention
post-training to enable transcription of hour-long audio. We further improve
long-form speech transcription by adding a global token. Fast Conformer
combined with a Transformer decoder also outperforms the original Conformer in
accuracy and in speed for Speech Translation and Spoken Language Understanding.
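
To make the "limited context attention" plus "global token" idea concrete, here is a minimal sketch of the kind of attention mask the abstract suggests. It is not the authors' implementation: the function names, the `window` parameter, and the choice to put the global token at index 0 are all assumptions made for illustration.

```python
def limited_context_mask(seq_len, window, global_index=0):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when position i may attend to position j.
    Each position sees neighbours within `window` steps, and one
    assumed global token sees, and is seen by, every position.
    """
    mask = [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]
    for k in range(seq_len):
        mask[global_index][k] = True  # global token attends everywhere
        mask[k][global_index] = True  # every position attends to it
    return mask


def allowed_pairs(mask):
    """Count the attention pairs the mask permits."""
    return sum(sum(row) for row in mask)


mask = limited_context_mask(seq_len=8, window=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of permitted pairs grows roughly in proportion to the sequence length rather than its square, which is what makes hour-long inputs tractable.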
World's fastest talking man sings Michael Jackson's BAD in 20 seconds @VideoScrapbookOfOurTimes
YouTube