Just ran Whisper (OpenAI) completely locally on my system (RX 6700 XT / 16 GB RAM).
Whisper is an open-source speech recognition model that can transcribe audio, generate subtitles, and even translate speech into English.
Test video: The Reason Why Cancer is so Hard to Beat by Kurzgesagt - In a Nutshell (https://www.youtube.com/watch?v=uoJwt9l-XhQ)
Setup:
- Whisper installed via pip
- Model: small (fast, good enough for English)
- GPU acceleration via ROCm
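For anyone who wants to reproduce the setup, it's roughly this (package name and flags from the openai-whisper CLI; the audio filename is just a placeholder):

```shell
# install Whisper via pip (ffmpeg must be on PATH for audio decoding)
pip install -U openai-whisper

# transcribe with the small model
whisper audio.mp3 --model small --language en
```

On a ROCm-enabled PyTorch install, the GPU is picked up automatically; otherwise it falls back to CPU.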
Result:
~98% accurate transcription with only a few minor errors, which is already solid enough for generating subtitles.
Next steps / possibilities:
- Auto-generate subtitles (.srt)
- Correct subtitles with a local LLM
- Translate speech
- Burn subtitles directly into videos
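The .srt step is mostly formatting: Whisper's transcribe() returns segments with start/end times in seconds plus text, and .srt just wants those as numbered, timestamped blocks. A minimal sketch (the sample segments here are made up, not real model output):

```python
# Convert Whisper-style segments into .srt subtitle text.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build the contents of an .srt file from a list of segments."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Made-up sample data in the shape model.transcribe() returns:
segments = [
    {"start": 0.0, "end": 2.5, "text": " Cancer is hard to beat."},
    {"start": 2.5, "end": 5.0, "text": " Here is why."},
]
print(segments_to_srt(segments))
```

(The whisper CLI can also emit .srt directly with --output_format srt; rolling your own is only needed if you want to post-process segments, e.g. with a local LLM, first.)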
Video workflow:
- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)
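The VAAPI transcode is a one-liner with ffmpeg; something like this (render node path and quality value are assumptions, adjust for your card):

```shell
# Hardware H.264 encode via VAAPI; /dev/dri/renderD128 is the usual render node
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mkv \
  -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp 22 -c:a copy output.mp4
```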
No cloud, real hardware.
Everything runs on Linux, so anyone can set this up.
No GPU? No problem: you can also run it on PyTorch's CPU backend, just much slower.
Background music: End of Me - Ashes Remain [Female Rock Cover by Kryx] (https://www.youtube.com/watch?v=E430M8lKim8)
#Whisper #OpenAI #ROCm #AMD #Linux #SpeechToText #Transcription #Subtitles #FOSS #OpenSource #OfflineAI #localai #Fediverse #nocloud