Mastodawn

Looking for: TTS for fire dispatch

Looking for: TTS for fire dispatch - Lemmy.World

I’m a firefighter who’s also a software engineer and am working on a training app. In it, I have it generating text dispatches of various scenes for us to discuss on rainy slow training days. Such as “Respond to 123 Main st, for a report of a smell of smoke” etc. I already use google maps to generate a random address and show the map / street view. With the maps api I can domain lock the key… But with their Text to Speech api I cannot. Seems silly. But I get it. Are there any alternatives? I would be ok spinning up a middle server to also rework the audio to generate radio static, etc, but first pass I am looking for non-robotic (ie not browser based) TTS. Thoughts?

Show thread

mesa Aug 27, 2025

eSpeak is the old favorite. its very robotic, but with a lot of fiddling, you can get the option that will sound like a person.
piper https://noerguerra.com/how-to-read-text-aloud-with-piper-and-python/ is pretty good at mimicking human speach. Still sorta robotic, but can run on just about anything.
https://github.com/suno-ai/bark is hard to work with, but will definitly get you a human sounding voice if that is what you are looking for.

Hope that helps!

How to read text aloud with Piper and Python – Noé R. Guerra

Show thread

flandish Aug 27, 2025

sweet. thanks!

Show thread

flandish Aug 27, 2025

if i read this correctly a piper has webasm bindings too? I am running the training app in a browser as a vue3+ts stack. So I could spin up a worker on cloudflare, api key lock, and then use that or run it all browser side with wasm.

Time to tinker!

Show thread

mesa Aug 27, 2025

Give it a shot. Its been a full year since ive messed with this so let me know ho it goes. Might be some better stuff out now. But yeah piper/eSpeak both have fantastic performance. You can even put it on your phone should the need arise: https://f-droid.org/packages/org.woheller69.ttsengine/

SherpaTTS | F-Droid - Free and Open Source Android App Repository

Text-to-Speech engine based on Next-gen Kaldi

Show thread

flandish Aug 27, 2025

cool! i know cf workers ai can run a tts model and i can domain lock the api key… but this seems cooler. :)

Show thread

ruffsl Aug 27, 2025

I’ve really enjoyed using Kokoro for generating audiobooks:

Be sure to first try using this convenient API wrapper:

github.com/remsky/Kokoro-FastAPI

Note that not all the modelled voices in Kokoro-82M are of equal quality, given disparities in limited training data from reference speakers. However, what’s cool is that you can prescribe polynomial weights to multiple voices tags, enabling you to synthesize different variants weighted more heavily from the highest quality voices.

One current limitation for Kokoro is that there’s no way to prescribe emotion or intonation procedurally using markup tags like SSML in the source text, unlike other models like Orpheus. But Orpheus sometimes generate weird hallucinations like repeating sentences, injecting new phrases, appending radio silence or filter words, and generally increasing the tempo of words per minute as a sentence progresses. Still, this may be of interest if you want to add emotion like fear or urgency to your generated dispatches, and manage to tune the input temperature you want for the model.

However, Kokoro is a lot more compute efficient and audibly consistent, requiring less scrutiny or manual supervision. The author behind Kokoro now also looks to be working towards an emotional variant as well:

huggingface.co/posts/hexgrad/182254197987426

Reference project I’ve been following for audiobook generation:

github.com/prakharsr/audiobook-creator

GitHub - hexgrad/kokoro: https://hf.co/hexgrad/Kokoro-82M

https://hf.co/hexgrad/Kokoro-82M. Contribute to hexgrad/kokoro development by creating an account on GitHub.

GitHub

Show thread

flandish Aug 27, 2025

wow. thanks!