Mastodawn

David Smith Jan 15, 2023

I'm working on reviving my old podcast searching system using OpenAI's Whisper engine (https://github.com/openai/whisper).

The results so far are amazing. I can run the transcription right on my Mac at roughly 5X realtime, and the accuracy is super impressive. It even gets brand names and weird words right nearly every time.

For example, this segment from The Talk Show where @marcoarment and @gruber argue about how to pronounce databases was perfectly transcribed, down to even the mispronunciations. 🤯
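
For readers who want to try something similar, here is a minimal sketch of transcribing an episode and searching the result. It assumes the `openai-whisper` Python package is installed; `load_model` and `transcribe` are that package's documented calls, but the `"base"` model choice and the `search_segments` helper are illustrative assumptions, not the setup described in this thread.

```python
def transcribe(path: str, model_name: str = "base") -> list[dict]:
    """Return Whisper's timestamped segments for one audio file."""
    # Imported lazily so the search helper below works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict that includes "start", "end" (seconds) and "text".
    return result["segments"]


def search_segments(segments: list[dict], query: str) -> list[tuple[float, str]]:
    """Hypothetical helper: case-insensitive substring search over segments.

    Returns (start_time_in_seconds, text) for every segment containing the query.
    """
    needle = query.casefold()
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]
```

Calling `search_segments(transcribe("episode.mp3"), "database")` would then list the timestamps where the word comes up; `episode.mp3` is a placeholder file name.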

John Siracusa Jan 15, 2023

@_Davidsmith Can it do speaker identification?

David Smith Jan 15, 2023

@siracusa Not directly. There are other tools you can run that will segment by speaker, so if you wanted to, I suppose you could combine them.
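
One way the combination could work, sketched under assumptions: take Whisper's timestamped segments and the speaker turns from a separate segmentation tool, then give each segment the speaker whose turn overlaps it most. The `(start, end, speaker)` tuple format and the `label_speakers` name are hypothetical; a real tool's output would need converting into that shape first.

```python
def label_speakers(
    segments: list[dict],
    turns: list[tuple[float, float, str]],
) -> list[dict]:
    """Attach a "speaker" key to each transcript segment.

    segments: dicts with "start", "end" (seconds) and "text".
    turns: assumed (start, end, speaker) tuples from a speaker-segmentation tool.
    A segment that overlaps no turn gets speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by the segment and the turn.
            overlap = min(seg["end"], turn_end) - max(seg["start"], turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Picking the largest overlap is a simple heuristic; a segment that spans two speakers is credited entirely to whoever talked longer in it.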

John Siracusa Jan 15, 2023

@_Davidsmith Do you know of any that run on the Mac? I’d love transcripts and search for all my podcasts, but I think speaker identification is essential.

Josh Cheshire

@siracusa @_Davidsmith If Merlin finds out you didn’t ask him about Descript…

Speaker recognition is one of its flagship features.