Cohere Transcribe: Speech Recognition

https://cohere.com/blog/transcribe

Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

Cohere

> Limitations

> Timestamps/Speaker diarization. The model does not feature either of these.

What a shame. Is WhisperX still the best choice if you want timestamps/diarization?

There is also: https://github.com/linto-ai/whisper-timestamped

It doesn't use an extra model, so it supports every language that works with Whisper out of the box and uses less memory. It works by applying Dynamic Time Warping to the cross-attention weights.
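
For anyone wondering what "DTW on cross-attention weights" means in practice, here is a toy sketch of the idea. It is not whisper-timestamped's actual code. The function names (`dtw_path`, `token_frame_spans`, `token_times`), the hand-made attention matrix, and the `1 - weight` cost are all assumptions for illustration. The 20 ms per frame default assumes Whisper's encoder emits 1500 positions per 30 s window.

```python
def dtw_path(cost):
    """Return the cheapest monotonic path through a (tokens x frames) cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of a path ending at cell (i-1, j-1)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Walk back from the last cell, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # next token, next frame
            (acc[i - 1][j], i - 1, j),          # next token, same frame
            (acc[i][j - 1], i, j - 1),          # same token, next frame
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def token_frame_spans(attention):
    """Map each token to the (first_frame, last_frame) it occupies on the DTW path."""
    # Assumption: high attention should be cheap, so use 1 - weight as the cost.
    cost = [[1.0 - w for w in row] for row in attention]
    spans = {}
    for token, frame in dtw_path(cost):
        first, last = spans.get(token, (frame, frame))
        spans[token] = (min(first, frame), max(last, frame))
    return [spans[t] for t in range(len(attention))]


def token_times(attention, seconds_per_frame=0.02):
    """Convert frame spans to (start, end) times in seconds."""
    # Assumption: 0.02 s per frame, matching 1500 encoder positions per 30 s.
    return [
        (first * seconds_per_frame, (last + 1) * seconds_per_frame)
        for first, last in token_frame_spans(attention)
    ]


# Hand-made attention weights: 3 tokens (rows) over 6 audio frames (columns).
toy_attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.1, 0.9, 0.9],
]

for token, (start, end) in enumerate(token_times(toy_attention)):
    print(f"token {token}: {start:.2f}s to {end:.2f}s")
```

On the toy matrix each token lands on the two frames where its attention peaks. Real attention maps are much noisier, so an actual implementation would need some filtering and head selection before a step like this.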

GitHub - linto-ai/whisper-timestamped: Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Just a warning: plain WhisperX is more accurate, and whisper-timestamped has many weird quirks.