Mastodawn

TGSpeechBox 3.0 is out. After seven betas, two release candidates, and more commits than I care to count, I'm calling it done.
I've been building this synthesizer for five months now, and 3.0 is the first release that feels like a genuinely different piece of software from what came before. If you put 2.99 and 3.0 side by side, the difference isn't subtle. The vowels, the diphthongs, the stops, the prosody, the way connected speech actually flows: it's night and day. The Fujisaki pitch model that no longer bucks like a mechanical bull, the diphthong collapse system, the finished dictionary system, the prominence and multiple-pitch-pass pipeline: all of it came together in this cycle.
Every platform got real work. Linux is a first-class citizen now, with a native Speech Dispatcher module, a proper installer, and PipeWire and ALSA auto-detection. No more pipes, no more shimmer. Android is on Google Play. iOS and macOS are on the App Store. Windows SAPI has a full settings UI. The pronunciation dictionary system ships on all of them.
None of this happened alone. This release belongs to the testers, the issue reporters, the dictionary contributors, and everyone who sent feedback across the betas. You shaped it.
3.0 is a milestone. I hope you hear it.
https://apps.apple.com/us/app/tgspeechbox/id6759512621
https://play.google.com/store/apps/details?id=com.tgspeechbox.tts
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-300.nvda-addon
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSBPhonemeEditor-v300.zip
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechSapiSetup-v300.exe
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-v300.apk
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-x86_64-v-300.tar.gz
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-aarch64-v-300.tar.gz
https://play.google.com/apps/testing/com.tgspeechbox.tts
https://testflight.apple.com/join/Y8RBtGBY

theirongiant

@Tamasg - Hi there. I can appreciate that you did an enormous amount of work on this, and it came up in my Mastodon feed, so I figured I would try it out.

What should someone's expectations be about this app? Is it still a frequency-modulated speech synthesizer at heart? Have previous screen readers been worse than this? How is this an improvement over what iOS offers natively? Why does this particular speech generator help you as a blind person?

Nothing about this sounds "natural" like a real human voice. All 7 stock voices sound like the synthetic speech generator from a Mac in the 1990s. And no matter which of the dozens of controls I adjusted in the current voice, it made no obvious difference to the sound when I tested playback in the app.

I'd like to believe I am doing something wrong.

For reference, I am a professional classical musician-slash-choral singer who is normally-sighted. I have perfect pitch, formal educational training as an audio engineer, and have studied multiple languages, so I'm familiar with some of the esoteric terms in this app.

Thanks for your insight.

Sincerely,
The Iron Giant at hacky-derm dot eye oh. The domain name is spelled like pachyderm, but starts with an "h."

Tamas G 2d ago

@theirongiant Great questions! TGSpeechBox is a formant synthesizer, in the same family as DECtalk, Eloquence, and the classic Mac PlainTalk voices. You're right that it doesn't sound "natural" like neural TTS. That's by design, though.
Many blind users who work with a screen reader 8-16 hours a day overwhelmingly prefer formant synths over natural-sounding voices. At rates of 300-400 words per minute (which is normal for experienced users), neural voices turn to mush while formant synths stay intelligible. The predictable, consistent acoustic structure is a feature: your brain parses it effortlessly after a few days, like reading a monospace font.
What TGSpeechBox adds: a full Klatt-style formant engine with modern prosody (Fujisaki pitch model), per-phoneme voice quality control, and native integration on every platform. iOS already has great neural voices, so this is for the users who want speed, responsiveness, and that classic synth feel with better tuning than eSpeak.
For the controls: try cranking Rate to 80+ and switching Pitch Mode. That's where the difference becomes obvious. At conversational speed, formant synths sound odd to unfamiliar ears. At working speed, they shine.
It's a niche tool for a specific community, and that community has been underserved for years. Appreciate you trying it out with trained ears! (Also, I'll update the description on the Play Store to better reflect that it's a formant synth!)
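
For anyone curious what "Klatt-style formant engine" means in practice: a formant synth shapes a simple source signal with a few resonant filters, one per formant, instead of playing back recorded or neural audio. Below is a minimal sketch of that idea using the textbook two-pole resonator from Klatt-type synthesizers. It is not TGSpeechBox's actual code; the function names, the 110 Hz pitch, and the formant frequencies and bandwidths are illustrative assumptions.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (textbook Klatt-type resonator)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    return a, b, c

def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Run a list of samples through one resonator and return the filtered list."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(pitch_hz, sample_rate, n_samples):
    """Crude voiced source: one unit impulse per pitch period."""
    period = int(round(sample_rate / pitch_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def rough_vowel(sample_rate=16000, n_samples=1600):
    """Cascade three resonators over the source (illustrative formant values only)."""
    signal = impulse_train(110.0, sample_rate, n_samples)
    for freq_hz, bandwidth_hz in [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)]:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal
```

Because every sound comes from a handful of numbers like these, the output has the same acoustic structure every time, which is the consistency described above. A real engine adds a proper glottal source, noise sources, and time-varying parameters on top of this.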