Mastodawn

David Smith Jan 15, 2023

I'm working on reviving my old podcast searching system using OpenAI's Whisper engine (https://github.com/openai/whisper).

The results so far are amazing. I can run the transcription right on my Mac at roughly 5X realtime, and the accuracy is super impressive. It even gets brand names and weird words right nearly every time.

For example, this segment from The Talk Show where @marcoarment and @gruber argue about how to pronounce databases was perfectly transcribed, down to even the mispronunciations. 🤯
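
For anyone who wants to try the same idea, here is a minimal sketch of transcribing an episode and searching the result. It assumes the `openai-whisper` Python package and its documented `load_model` / `transcribe` calls; the `search_segments` helper, the `"base"` model choice, and the file name in the usage note are illustrative assumptions, not the actual system described above.

```python
from typing import Dict, List, Tuple


def transcribe(path: str, model_name: str = "base") -> List[Dict]:
    """Transcribe an audio file with Whisper and return its timed segments."""
    # Imported lazily so the search helper below works without Whisper installed.
    # Assumes `pip install openai-whisper` (and ffmpeg) on the machine.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict that includes "start", "end" (seconds) and "text".
    return result["segments"]


def search_segments(segments: List[Dict], query: str) -> List[Tuple[float, str]]:
    """Hypothetical helper: case-insensitive substring search over segments.

    Returns (start_time_in_seconds, text) for every segment containing the query.
    """
    needle = query.casefold()
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]
```

Usage would look something like `search_segments(transcribe("episode.mp3"), "database")`, which gives back timestamps you can jump to in the audio.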

Martin the paranoid android. Jan 15, 2023

@_Davidsmith Does it do well with crosstalk?

David Smith

@DonSqueak It does alright. It isn’t trying to segment by speaker; it just transcribes each intelligible word it hears. So if there is crosstalk it might intertwine the speakers, but the words themselves are accurate.

Martin the paranoid android. Jan 15, 2023

@_Davidsmith Once they segment by speaker, that’s gonna be the transcription holy grail I guess(?)

Harald Jan 20, 2023

@DonSqueak @_Davidsmith This might be relevant to your interests! Just found it the other day and learned about diarization: https://huggingface.co/spaces/vumichien/whisper-speaker-diarization
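
As a rough illustration of what combining the two could look like: a diarizer produces speaker turns (who spoke from when to when), and each transcribed segment gets the speaker whose turn overlaps it the most. This is only a sketch under assumptions. The `(start, end, speaker)` turn format and the `label_segments` name are hypothetical, and it is not necessarily how the linked Space does it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical turn format: (start_seconds, end_seconds, speaker_label)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Attach a "speaker" key to each transcript segment.

    Each segment needs "start" and "end" in seconds. The speaker is the one
    whose turn overlaps the segment longest, or None if no turn overlaps.
    """
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_overlap, best_speaker = shared, speaker
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Note the limitation: with real crosstalk a single segment can contain words from both speakers, and this approach still assigns it to just one of them.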

Martin the paranoid android. Jan 20, 2023

@haraball Thanks!!

David Smith Jan 20, 2023

@haraball Thanks for this. I've seen a few diarization efforts underway. At this point I haven't pursued it, but it's definitely something I'm keeping my eyes on.