Mastodawn

David Smith Jan 15, 2023

I'm working on reviving my old podcast searching system using OpenAI's Whisper engine (https://github.com/openai/whisper).

The results so far are amazing. I can run the transcription right on my Mac at roughly 5X realtime, and the accuracy is super impressive. It even gets brand names and weird words right nearly every time.

For example, this segment from The Talk Show where @marcoarment and @gruber argue about how to pronounce databases was perfectly transcribed, down to even the mispronunciations. 🤯

Don Whiteside Jan 15, 2023

@_Davidsmith I must have a misconfiguration somewhere; I’m getting 1x at best on an M1 MBP. The quality is still amazing, but the speed is underwhelming.

David Smith Jan 15, 2023

@donw Try this C++ port of Whisper. I believe it is much faster than the Python-based version. Then tweak the threads variable to make the best use of your machine: https://github.com/ggerganov/whisper.cpp

bljubisic

@_Davidsmith @donw has anybody figured out how to transform microphone PCM16 into whatever Whisper is expecting? For me, it works out of the box for WAV files, but with the microphone I just get mumbling as a result.
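
A minimal sketch of one possible fix, not a confirmed diagnosis of the problem above. Whisper works on 16 kHz mono audio as floating-point samples in the range [-1.0, 1.0], so raw microphone PCM16 generally needs three steps: decode the 16-bit integers, mix down to mono, and resample if the microphone runs at another rate (48 kHz is assumed in the usage note purely for illustration). The function names `pcm16_to_float` and `resample_linear` are hypothetical helpers, not part of Whisper or whisper.cpp, and the linear interpolation here is a crude stand-in for a proper low-pass resampler.

```python
import array
import sys

WHISPER_SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono input


def pcm16_to_float(raw: bytes, channels: int = 1) -> list:
    """Decode little-endian signed 16-bit PCM into mono floats in [-1.0, 1.0).

    Hypothetical helper: interleaved channels are averaged into one channel.
    """
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed little-endian
    frames = len(samples) // channels
    mono = [
        sum(samples[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]
    # Scale by 32768 so that -32768 maps to exactly -1.0
    return [s / 32768.0 for s in mono]


def resample_linear(samples, src_rate, dst_rate=WHISPER_SAMPLE_RATE):
    """Resample by linear interpolation (assumption: good enough for a quick
    test; it applies no anti-aliasing filter, so a real resampler is better)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

For a stereo microphone buffer captured at 48 kHz, the call would look like `resample_linear(pcm16_to_float(raw, channels=2), 48000)`. If the result still transcribes as noise, the sample rate, channel count, or byte order assumed here probably does not match what the microphone actually delivers.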