#AI #SpeechDetection

#OpenAI has just released #Whisper (https://github.com/openai/whisper), a new open-source model for speech recognition.

After a couple of tries I'm impressed by its accuracy (although you need the small model or a larger one to get enough precision), but I'm unimpressed by its resource usage and performance.

The small model took ~30 seconds to process an audio file with 2 seconds of speech on my 6-year-old laptop with an i7 CPU, and in the meantime it used up more than 4 GB of RAM.
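To put that figure in perspective, here is a minimal sketch that computes the real-time factor (processing time divided by audio duration) from the numbers above. The helper names `real_time_factor`, `timed` and `transcribe_with_whisper` are made up for illustration. The Whisper part assumes the `whisper` package from the linked repository is installed and uses the `load_model` / `transcribe` calls shown in its README. It is imported lazily, so the rest runs without it.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; above 1 means slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def transcribe_with_whisper(path: str, model_name: str = "small") -> str:
    """Hypothetical wrapper: load a Whisper model and return the transcribed text."""
    import whisper  # lazy import: only needed if this function is actually called

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]


# Rough numbers from this post: ~30 s of processing for ~2 s of speech.
print(real_time_factor(30.0, 2.0))  # 15.0
```

On a machine with Whisper installed, `timed(transcribe_with_whisper, "sample.wav")` would return the text and the elapsed time for a file of your own (`sample.wav` is a placeholder). Note that this timing also includes loading the model.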

Mozilla's #DeepSpeech model was also heavy when I last used it ~1 year ago, but not THIS slow (although it was also slightly less accurate).

For now I definitely see the use case for OpenAI's new model in offline transcription, but it is still very far from being usable in real-time applications such as voice assistants.

I'm still looking for a good open-source model that can run on a Raspberry Pi as a stable voice assistant. Ideally, it needs a small and simple model for hotword detection (I used to use Snowboy, but that project is now dead), and a more complex model that transcribes the speech once the hotword is detected. And the transcription needs to finish within 5 seconds at most in order to meet the real-time expectations of a voice assistant.
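The two-stage setup described above could be wired together roughly like this. It is only a sketch: `assistant_loop`, `is_hotword`, `record_command` and `transcribe` are hypothetical placeholders for whatever hotword and transcription models end up being used, and the 5-second budget is only checked after the fact (a slow transcription is discarded, not interrupted).

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def assistant_loop(
    frames: Iterable[bytes],
    is_hotword: Callable[[bytes], bool],
    record_command: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    max_latency: float = 5.0,
) -> Iterator[Optional[str]]:
    """Yield one transcription per detected hotword, or None if it took too long."""
    for frame in frames:
        # Stage 1: the small, cheap model runs on every audio frame.
        if not is_hotword(frame):
            continue
        # Stage 2: the heavy model only runs once the hotword has been heard.
        audio = record_command()
        start = time.perf_counter()
        text = transcribe(audio)
        elapsed = time.perf_counter() - start
        # Drop results that miss the latency budget.
        yield text if elapsed <= max_latency else None


# Usage with stub models standing in for the real ones:
frames = [b"noise", b"hey", b"noise"]
for result in assistant_loop(
    frames,
    is_hotword=lambda f: f == b"hey",
    record_command=lambda: b"turn on the lights",
    transcribe=lambda audio: audio.decode(),
):
    print(result)  # turn on the lights
```

Keeping the models behind plain callables means either stage can be swapped without touching the loop, which is the kind of "just the model, no bloat" embedding I'm after.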

Ideally, it should include only the model, without a lot of surrounding bloat that makes it harder to embed, so #Mycroft is excluded.

So far, I haven't found any such model. My RPi still runs the Google Assistant push-to-talk script that I adapted into Platypush years ago, and a Snowboy hotword detection model that I managed to train before the project was shut down. If anybody knows of better solutions that could cut this last dependency on Google, I'd be happy to try them out.
