TGSpeechBox 3.0 is out. After seven betas, two release candidates, and more commits than I care to count — I'm calling it done.
I've been building this synthesizer for 5 months now, and 3.0 is the first release where I feel like it's genuinely a different piece of software from what came before. If you put 2.99 and 3.0 side by side, the difference isn't subtle. The vowels, the diphthongs, the stops, the prosody, the way connected speech actually flows: it's night and day. The Fujisaki pitch model not being a mechanical bull, the diphthong collapse system, the dictionary system fully done, the prominence and multiple-pitch pass pipeline. All of it came together in this cycle.
Every platform got real work. Linux is a first-class citizen now! A native Speech Dispatcher module, proper installer, PipeWire and ALSA auto-detection. No more pipes, no more shimmer. Android is on Google Play. iOS and macOS are on the App Store. Windows SAPI has a full settings UI. The pronunciation dictionary system ships on all of them.
None of this happened alone. This release belongs to the testers, the issue reporters, the dictionary contributors, and everyone who sent feedback across the betas. You shaped it.
3.0 is a milestone. I hope you hear it.
https://apps.apple.com/us/app/tgspeechbox/id6759512621
https://play.google.com/store/apps/details?id=com.tgspeechbox.tts
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-300.nvda-addon
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSBPhonemeEditor-v300.zip
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechSapiSetup-v300.exe
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-v300.apk
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-x86_64-v-300.tar.gz
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-aarch64-v-300.tar.gz
https://play.google.com/apps/testing/com.tgspeechbox.tts
https://testflight.apple.com/join/Y8RBtGBY
@Tamasg Congratulations. This is major.
@ppatel ha thanks! I did say a week for the RC and sort-of feature freeze stage, and was able to keep that promise!
Then I saw this 72-hour window where no new issues dripped in, which is generally a good sign people are happy with where it's at. Once the big issue 72 Linux stuff got sorted out and I realized I needed our own proper speech-dispatcher module, that was it.
I also now have local testbeds for ARM64 and x86 Linux distros, so doing any kind of debugging on all of them is easy! Huge win! Proper workbench for engineering. Android on Windows, Linux on a Pi and 2 VMs, M4 Mac for Xcode. I plan to really look into learning more test-driven development, because I practice it a lot at work and there are a lot of language-specific edge cases testing would help me catch, so getting better at TDD and doing more of it post-3.0 is on the books for sure.
@Tamasg I've gotten to a point where I do nothing but TDD. If you're starting off with it, I suggest you tell the AI to do red-green TDD. I've developed a skill that reinforces it. Just adding /tdd gets me a long way. It's not going to catch everything, especially things that rely on external testing. But adding my own refactoring and elegant code skills catches most bugs and makes my own job much easier when reading code.
@ppatel oooh I'm super glad you're a proponent of it! At work one of my Android colleagues was very into it, and he really got me doing a lot of tests, especially in the React component world, plus Python tests for things like an API upload script that delivered payloads. Without it I may have uploaded buckets of issues wrong and run needless CI, so for sure about it helping catch that early. When it's mission-critical I wouldn't engineer without it, but with C++ and speech synthesis it was all new to me. My goal is to have tests that thoroughly go through each pass and the pipelines with sample IPA and phrases, and do a lot of good runs during CI builds just like I do at my job. Seeing actual tests passing or failing on passes and language sets would be so sweet, and I have the GitHub Workflow + Actions knowledge partially there from work now.
@Tamasg I even have a CI pipeline for one of my projects that does automated integration testing. I managed to run out of 2k CI minutes when I implemented that. Lol. In retrospect, it was not a good idea to let the tests run on their own all the time.
@Tamasg If you want to go back to some of your code and go through designing tests, pick an area of the code you want to work on and tell the LLM to do a linear exploration while designing red-green tests for it. Do it for code you don't feel confident about. You'll be amazed at what the AI gives you. It catches its own mistakes a lot more frequently.
@ppatel Interesting! See, I know Jest, and some Espresso tests for Android. But C++? I guess there's one called Catch2. I'm also looking at GTest, since I know it can do assertion-based tests for C++ too. Funny though about the 2000-minute thing; similarly I had a work test CI run for 200 minutes before realizing I should probably go stop it and see what was causing it to hang. AI (at least Claude here) has done really well though about monitoring workflows in realtime, so if something hangs I can nudge it along, even if it doesn't always know the proper signal to realize it's actually hanging. Ha.
@Tamasg I've used Catch2 for C++ before. It's quite good. I made a mistake when writing my workflow; I didn't catch it because I was sleepy when I did it. Apparently, it ran the integration tests every time I committed. Basically, it was supposed to download a bunch of PDF files from a corpus to test against my library. It was about five to six minutes of run time. Tests were a success. But when I was writing and committing, I didn't check my emails.
@ppatel Ha! At least they passed, not failed! That's like the only optimism there.
I'll make the same mental note tonight! There's a 109,000-word stress dictionary and an eSpeak phonemizer. If I accidentally wired that into every-commit CI instead of tag-only, that's 4 hours of GitHub Actions burning through my free minutes. Catch2 for the fast unit tests on every push, heavy integration stuff on PR/tag only. Lesson learned vicariously through you (and some of my own work mistakes around that)! :D
@Tamasg Happy to serve as a guinea pig. The full suite of tests doesn't need to be run by GitHub. I'm running all of my tests on my local machine as it is. I've learned to assign background agents to run particular tests if I think I'll need them. With a major push, the full suite runs; in my case, Claude will do it automatically. With commits, it will run tests for the session. You can obviously change this behavior.

@Tamasg Oh and fuzz testing could be a good friend when hunting down weird input bugs. Here's a good but simple explanation if you don't know the technique.

https://about.gitlab.com/topics/devsecops/what-is-fuzz-testing/

@Tamasg I'm glad that you didn't end up abandoning the project. I've been amazed at what you've managed to do.
@Tamasg awesome! I bit the bullet, as it were, and installed it, switching my speech to it. First impressions: this sounds a lot like an eSpeak variant, except the letters have a certain accent to them that eSpeak doesn't; it reminds me of a German synth trying to speak English, though it's better because it has English-specific phonemes and stuff like that. Are there settings I can tweak to make it sound more like Eloquence or DECTalk, for example?
@esoteric_programmer Thanks for trying it! The "accent" you're hearing is the default voicing tone. Try switching pitch modes: in NVDA, that's the voice settings panel, Pitch Mode, or as part of the synth ring. "Fujisaki" gives you the widest intonation contours; "Impulse" is closer to DECTalk. You can also tweak the 9 voicing tone sliders. "Voiced Spectral Tilt" is the big one for character: negative values sound brighter/crisper (more DECTalk), positive sounds warmer. "Formant Sharpness" (cascade bandwidth) tightens the resonances for that classic hardware synth feel. And try the different voices: Robert is the brightest, David is the deepest. We're always tuning the default, so feedback like this is exactly what helps! :D
Also a big thanks to you for some of the earlier consonant tuning work you helped with! Definitely not perfect, but 3.0 isn't the end of the chapter for it either, just a chance to call the mountain of bugs fixed and features added a great milestone to stand on, ha.
@Tamasg aha, gotcha. Is this in a config file for Linux maybe? But sure, I'll try with NVDA, since that probably has better responsiveness for what I test, because I wouldn't have to save a config file and reload Speech Dispatcher every time. Anyways, so I should try Impulse and lower those sliders to negative values? I don't remember the consonant bug I helped fix, but if you convert the C++ core of the synth to Rust, you'll see that all the bugs will magically melt away /s
@esoteric_programmer ah you're on Linux!
Well, the SD module was just built tonight. Ha. You're using the latest copy that doesn't have the weird shimmery speech thing on Linux.
But yes, pitch mode on Linux: you can tweak it today, just not through a GUI yet. Your packs live at /usr/local/share/tgspeechbox/packs/lang/.
Open en-us.yaml (or whichever language), find legacyPitchMode: espeak_style and change it to fujisaki_style for the widest contours, impulse_style for DECTalk-like, or klatt_style for hat-pattern. Then killall speech-dispatcher and it picks up the change. The SD module reads straight from the pack files so nothing gets reset. It's not a settings panel but it works!
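If it helps, the whole tweak fits in a couple of shell commands. This is a sketch assuming the default install path and key name from above; check your own install before running it:

```shell
# Assumed path and key from the post above; adjust for your install.
PACK=/usr/local/share/tgspeechbox/packs/lang/en-us.yaml
# Swap the pitch mode in place (or use fujisaki_style / klatt_style).
sudo sed -i 's/^\(\s*legacyPitchMode:\).*/\1 impulse_style/' "$PACK"
# Speech Dispatcher respawns on the next speech request and re-reads the pack.
killall speech-dispatcher
```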
Hardcore YAML tweaking until we add proper config support post-3.0. We want tgsb-native.conf to let you set pitch mode, voicing tone, and voice quality params without restarting, and maybe a dedicated settings tool on Linux too.
And re: Rust. I'll rewrite it in Rust right after I finish rewriting it in Zig, which is right after I finish the COBOL port. The C++ bugs keep me young and nimble! :D For now. Once Rust native bridging across all 6 of our platforms gets a little easier, I will be thinking of it, but the ecosystem is still maturing quickly around it, especially on the toolchain front.
@Tamasg I mean, depends what you mean by Rust native bridging, because we have a lot of crates for communicating with the operating system. For the parts where we can't, we can use cxx bridge to connect C++ libraries to Rust, or UniFFI if we want to generate bindings from Rust to another language, for example Swift for iOS or Kotlin for Android. But yeah, hell yeah to the hardcore YAML editing! Also, kind of an interesting path to put your packs in, but yeah. Btw, what shimmer thing with Speech Dispatcher?
@esoteric_programmer OMG. I literally shipped the native Speech Dispatcher module tonight, and within an hour I crashed it by switching to the Robert voice! The culprit: an unsafe pointer-arithmetic hack that poked a voicing tone struct field at a raw byte offset.
In Rust that's a compile error. In C++ that's a 2 AM "why is my screen reader silent" debugging session. Fixed now though, and Robert sounds great! Although there's a new issue to debug post-3.0: why the SPD connection gets lost when Orca reloads after setting the voice. Oh gosh. Ah well. Voices work for sure now, and there's a fresh tarball on the release page; at least it got caught early after release. Now bed for real. :D Ahaha.
Rust could handle the DSP, but the frontend's 24-pass token pipeline, where each pass mutates a shared vector in place, would fight the borrow checker constantly. And wrapping every exported function in extern "C" fn + unsafe blocks for 6 platforms... we'd spend more time on FFI glue than on speech quality! The C++ bugs keep me honest. The platform reach keeps us shipping. May your day be a great one.
@Tamasg ha! See? Speak of Rust, and a C++ bug will appear to prove me right lol. About mutable vectors and stuff like that, there are patterns to deal with this depending on how it all works. About manually doing extern "C" stuff, you don't have to; there are crates like UniFFI which do all that, if you annotate your structs and functions properly with their macro. But yeah, that said, good night!
@esoteric_programmer Oooh yeah! UniFFI is cool, Mozilla built it for Firefox components and it does handle the JNI/Swift/Python bindings automatically. Honestly, if we were starting from zero it'd be tempting. But we're 15K lines deep with 6 working platform bridges and a Friday night native SD module that's still warm. Maybe for 4.0 I'll rewrite it in Rust and call it TGRustBox. Good night! I'll catch up on bugs tomorrow :D
@Tamasg I mean, we need something like this in Rust at some point, we don't have enough speech synthesizers, and a library like this would make it more possible to add it to firmware, if there were ever some way to output sound and get text info from UEFI interfaces as a UEFI app. But yeah, RustBox, for sure :p

@Tamasg - Hi there. I can appreciate that you did an enormous amount of work on this, and it came up in my Mastodon feed, so I figured I would try it out.

What should someone's expectations be about this app? Is it still a frequency-modulated speech synthesizer at heart? Have previous screen readers been worse than this? How is this an improvement over what iOS offers natively? Why does this particular speech generator help you as a blind person?

Nothing about this sounds "natural" like a real human voice. All 7 stock voices sound like the synthetic speech generator from a Mac in the 1990s. And no matter which of the dozens of controls I adjusted in the current voice, it made no obvious difference to the sound when I tested playback in the app.

I'd like to believe I am doing something wrong.

For reference, I am a professional classical musician-slash-choral singer who is normally-sighted. I have perfect pitch, formal educational training as an audio engineer, and have studied multiple languages, so I'm familiar with some of the esoteric terms in this app.

Thanks for your insight.

Sincerely,
The Iron Giant at hacky-derm dot eye oh. The domain name is spelled like pachyderm, but starts with an "h."

@theirongiant Great questions! TGSpeechBox is a formant synthesizer, the same family as DECTalk, Eloquence, and the classic Mac PlainTalk voices. You're right that it doesn't sound "natural" like neural TTS. That's by design though.
Many blind users who work with a screen reader 8-16 hours a day overwhelmingly prefer formant synths over natural-sounding voices. At rates of 300-400 words per minute (which is normal for experienced users), neural voices turn to mush while formant synths stay intelligible. The predictable, consistent acoustic structure is a feature: your brain parses it effortlessly after a few days, like reading a monospace font.
What TGSpeechBox adds: a full Klatt-style formant engine with modern prosody (Fujisaki pitch model), per-phoneme voice quality control, and native integration on every platform. iOS already has great neural voices, so this is for the users who want speed, responsiveness, and that classic synth feel with better tuning than eSpeak.
For the controls: try cranking Rate to 80+ and switching Pitch Mode; that's where the difference becomes obvious. At conversational speed, formant synths sound odd to unfamiliar ears. At working speed, they shine.
It's a niche tool for a specific community, and that community has been underserved for years. Appreciate you trying it out with trained ears! (Also, I will update the description in the Play Store to better reflect that it's a formant synth!)
@Tamasg @javido What language does this speak?
@quetzatl @Tamasg It speaks Spanish, among others. I've tried it on my phone and it reminds me of the BH era.
@Tamasg Does 3.0 include that updated en-us.yaml you talked with Simon about to fix some of the vowels? Or do I still need to put that in?
@x0 oh yes! For sure. That got merged a while back, and there's been lots more vowel and consonant tuning since then! :D