Cohere Transcribe: Speech Recognition

Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

gruez Mar 31

> Limitations

> Timestamps/Speaker diarization. The model does not feature either of these.

What a shame. Is WhisperX still the best choice if you want timestamps/diarization?

GaggiX

There is also: https://github.com/linto-ai/whisper-timestamped

It doesn't use an extra model, so it supports every language that works with Whisper out of the box and uses less memory. It works by applying Dynamic Time Warping (DTW) to the cross-attention weights.
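A minimal pure-Python sketch of the DTW idea, for anyone curious how timestamps can fall out of attention weights alone. It is not whisper-timestamped's code. It assumes a hypothetical `attention[token][frame]` matrix (cross-attention already reduced to one weight per token/frame pair) and a 20 ms encoder frame (Whisper's 1500 encoder frames per 30 s window). `dtw_path` and `token_start_times` are illustrative names, not a library API.

```python
from typing import Dict, List, Tuple


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic alignment through cost[token][frame].

    Each step moves diagonally, down one token, or right one frame.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i tokens with the first j frames
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )
    # Walk back from the last (token, frame) cell to the start.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s[0]][s[1]])
    path.reverse()
    return path


def token_start_times(
    attention: List[List[float]], frame_seconds: float = 0.02
) -> List[float]:
    """Start time in seconds of each token, from a token-by-frame attention map.

    frame_seconds=0.02 is an assumption (30 s window / 1500 encoder frames).
    """
    # Negate so that strong attention becomes a low alignment cost.
    cost = [[-w for w in row] for row in attention]
    starts: Dict[int, float] = {}
    for token, frame in dtw_path(cost):
        # The path is monotonic, so the first frame seen for a token is its earliest.
        starts.setdefault(token, frame * frame_seconds)
    return [starts[t] for t in range(len(attention))]


attention = [
    [0.9, 0.8, 0.1, 0.0],   # token 0 attends mostly to the first two frames
    [0.0, 0.05, 0.9, 0.8],  # token 1 attends mostly to the last two frames
]
print(dtw_path([[-w for w in row] for row in attention]))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3)]
print(token_start_times(attention))
# [0.0, 0.04]
```

Because the alignment reuses weights the ASR model already produces, nothing else has to be loaded, which is where the memory and language-coverage advantages over a separate alignment model come from.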

GitHub - linto-ai/whisper-timestamped: Multilingual Automatic Speech Recognition with word-level timestamps and confidence

oezi Mar 31

Just a warning that plain WhisperX is more accurate and Whisper-timestamped has many weird quirks.