Cohere Transcribe: Speech Recognition

https://cohere.com/blog/transcribe

Cohere Transcribe: state-of-the-art speech recognition

Unmatched accuracy and speed. Transcribe converts your business’ audio data into precise text for search, analytics, and automation.

Cohere

My worry is that ASR will end up like OCR. If the multi modal large AI system is good enough (latency wise), the advantage of domain understanding eats the other technlogies alive.

In OCR, even when the characters are poorly scanned, the deep domain understanding these large multi modal AIs have allows it to understand what the document actually meant - this is going to be order id because in the million invoices I have seen before order id is normally below order date - etc. The same issue is going to be there in ASR also is my worry.

This is both good and bad. Good ASR can often understand low quality / garbled speech that I could not figure out, but it also "over corrects" sometimes and replaces correct but low prior words with incorrect but much more common ones.

With OCR the risk is you get another xerox[1] incident where all your data looks plausible but is incorrect. Hope you kept the originals!

(This is why for my personal doc scans, I use OCR only for full text search, but retain the original raw scans forever)

[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

Xerox scanners/photocopiers randomly alter numbers in scanned documents

Xerox scanners/photocopiers randomly alter numbers in scanned documents Please see the „condensed time line“ section (the next one) for a time line of how the Xerox saga unfolded. It for example depicts that I did not push the thing to the public right away, but gave Xerox a lot of time before I did so. <iframe width="700" height="394" src="https://www.youtube.com/embed/c0O6UXrOZJo" frameborder="0" allowfullscreen></iframe>

D. Kriesel