Cohere Transcribe: Speech Recognition
> Limitations
> Timestamps/Speaker diarization. The model does not feature either of these.
What a shame. Is WhisperX still the best choice if you want timestamps/diarization?
There is also: https://github.com/linto-ai/whisper-timestamped
It doesn't use an extra model, so it supports every language that works with Whisper out of the box and uses less memory. It works by applying Dynamic Time Warping to the cross-attention weights.
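For anyone wondering what "DTW on cross-attention weights" looks like in practice, here is a rough sketch of the idea. It is not the code from whisper-timestamped: the function names `dtw_path` and `token_start_frames`, the toy attention matrix, and the choice of `1 - attention` as the cost are all my own assumptions for illustration. A real implementation would also pick specific attention heads and normalize/filter the weights first. The 0.02 s per frame figure assumes Whisper's encoder resolution (1500 positions per 30 s window).

```python
import numpy as np

def dtw_path(cost):
    """Minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded by one
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to the first, following the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def token_start_frames(attention):
    """First audio frame that the DTW path assigns to each token."""
    # Assumption: attention is in [0, 1], so high attention means low cost.
    path = dtw_path(1.0 - attention)
    starts = {}
    for token, frame in path:
        starts.setdefault(token, frame)
    return [starts[t] for t in range(attention.shape[0])]

# Toy example (made-up numbers): 3 tokens, 6 audio frames,
# each token attends mostly to its own pair of frames.
attention = np.array([
    [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.9],
])
SECONDS_PER_FRAME = 0.02  # assumed encoder frame duration
frames = token_start_frames(attention)
times = [f * SECONDS_PER_FRAME for f in frames]
print(frames, times)
```

The point is that the timestamps fall out of attention weights the decoder already computes, which is why no separate alignment model (and no per-language alignment data) is needed.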