I'm working on reviving my old podcast searching system using OpenAI's Whisper engine (https://github.com/openai/whisper).

The results so far are amazing. I can run the transcription right on my Mac at roughly 5X realtime, and the accuracy is super impressive. It even gets brand names and weird words right nearly every time.

For example, this segment from The Talk Show where @marcoarment and @gruber argue about how to pronounce databases was perfectly transcribed, down to even the mispronunciations. 🤯
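
For anyone curious what the transcribe-then-search idea looks like in code, here is a minimal sketch, not the actual system described above. It assumes the `openai-whisper` Python package, whose `load_model` and `transcribe` calls return timestamped segments. The helper names (`transcribe`, `format_timestamp`, `search_segments`) and the `"base"` model choice are hypothetical, picked only for illustration.

```python
def transcribe(path, model_name="base"):
    """Transcribe an audio file and return Whisper's timestamped segments.

    Assumption: the openai-whisper package is installed
    (pip install openai-whisper). It is imported lazily so the
    search helpers below work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict that includes "start", "end", and "text".
    return result["segments"]


def format_timestamp(seconds):
    """Format a position in seconds as H:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"


def search_segments(segments, query):
    """Return (timestamp, text) pairs for segments containing the query.

    Matching is a plain case-insensitive substring test, which is
    a simplification of whatever a real search index would do.
    """
    needle = query.casefold()
    return [
        (format_timestamp(seg["start"]), seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]
```

Usage would be something like `search_segments(transcribe("episode.mp3"), "sqlite")`, where `episode.mp3` is a placeholder file name; the result is a list of timestamps and the matching lines of transcript.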

@_Davidsmith Thanks for doing this. There was a particular Roderick on the Line segment that I swear existed, but could never locate again, even with the old search. Sounds like this will help me prove I’m not crazy.

@Chris There are some older episodes of Roderick here: http://podsearch.david-smith.org/shows/7 so it might be there already... but that only goes through episode 285.

But I had to stop updating it a while back when my old transcribing system broke.
