Mastodawn

TGSpeechBox 3.0 is out. After seven betas, two release candidates, and more commits than I care to count — I'm calling it done.
I've been building this synthesizer for 5 months now, and 3.0 is the first release where I feel like it's genuinely a different piece of software from what came before. If you put 2.99 and 3.0 side by side, the difference isn't subtle. The vowels, the diphthongs, the stops, the prosody, the way connected speech actually flows: it's night and day. The Fujisaki pitch model no longer being a mechanical bull, the diphthong collapse system, the finished dictionary system, the prominence and multiple-pitch-pass pipeline: all of it came together in this cycle.
Every platform got real work. Linux is a first-class citizen now! A native Speech Dispatcher module, proper installer, PipeWire and ALSA auto-detection. No more pipes, no more shimmer. Android is on Google Play. iOS and macOS are on the App Store. Windows SAPI has a full settings UI. The pronunciation dictionary system ships on all of them.
None of this happened alone. This release belongs to the testers, the issue reporters, the dictionary contributors, and everyone who sent feedback across the betas. You shaped it.
3.0 is a milestone. I hope you hear it.
https://apps.apple.com/us/app/tgspeechbox/id6759512621
https://play.google.com/store/apps/details?id=com.tgspeechbox.tts
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-300.nvda-addon
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSBPhonemeEditor-v300.zip
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechSapiSetup-v300.exe
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/TGSpeechBox-v300.apk
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-x86_64-v-300.tar.gz
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300/tgspeechbox-linux-aarch64-v-300.tar.gz
https://play.google.com/apps/testing/com.tgspeechbox.tts
https://testflight.apple.com/join/Y8RBtGBY

TGSpeechBox App - App Store

Download TGSpeechBox by Tamas Geczy on the App Store. See screenshots, ratings and reviews, user tips, and more apps like TGSpeechBox.

App Store

the esoteric programmer 2d ago

@Tamasg awesome! I bit the bullet as it were, and installed it, switching my speech to it. First impressions, this sounds a lot like an espeak variant, except the letters have a certain accent to them that espeak doesn't, it reminds me of a german synth trying to speak english, though it's better because it has english specific phonemes and stuff like that. Are there settings I can tweak to make it sound more like eloquence or dectalk for example?

Tamas G 2d ago

@esoteric_programmer Thanks for trying it! The "accent" you're hearing is the default voicing tone. Try switching pitch modes. In NVDA, that's Pitch Mode in the voice settings panel, or as part of the synth settings ring. "Fujisaki" gives you the widest intonation contours; "Impulse" is closer to DECTalk. You can also tweak the 9 voicing tone sliders. "Voiced Spectral Tilt" is the big one for character: negative values sound brighter/crisper (more DECTalk), positive values sound warmer. "Formant Sharpness" (cascade bandwidth) tightens the resonances for that classic hardware synth feel. And try the different voices: Robert is the brightest, David is the deepest. We're always tuning the default, so feedback like this is exactly what helps! :D
Also a big thanks to you for some of the earlier consonant tuning work you helped with! Definitely not perfect, but 3.0 isn't the end of the chapter for it either, just a chance to call the mountain of bugs fixed and features added a great milestone to stand on, ha.

the esoteric programmer 2d ago

@Tamasg aha, gotcha. Is this in a config file for linux maybe? but sure, I'll try with nvda, since that probably has better responsiveness to what I test because I wouldn't have to save a config file and reload speech dispatcher every time. Anyways, so I should try impulse and lower those sliders to negative values? I don't remember the consonant bug I helped fix, but if you convert the c++ core of the synth to rust, you'll see that all the bugs will magically melt away /s

Tamas G 2d ago

@esoteric_programmer ah you're on Linux!
Well, the SD module was just built tonight. Ha. You're using the latest copy that doesn't have the weird shimmery speech thing on Linux. But.
Pitch mode on Linux: you can tweak it today, just not through a GUI yet. Your packs live at /usr/local/share/tgspeechbox/packs/lang/.
Open en-us.yaml (or whichever language), find legacyPitchMode: espeak_style and change it to fujisaki_style for the widest contours, impulse_style for DECTalk-like, or klatt_style for hat-pattern. Then killall speech-dispatcher and it picks up the change. The SD module reads straight from the pack files so nothing gets reset. It's not a settings panel but it works!
Hardcore YAML tweaking until we add proper config support post-3.0. We want tgsb-native.conf to let you set pitch mode, voicing tone, and voice quality params without restarting, and maybe a dedicated settings tool on Linux too.
And re: Rust. I'll rewrite it in Rust right after I finish rewriting it in Zig, which is right after I finish the COBOL port. The C++ bugs keep me young and nimble! :D For now. Once Rust native bridging across all our 6 platforms gets a little easier, I will be thinking of it, but the ecosystem is still quickly maturing around it especially on the toolchain front.
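
The pack edit described in this post (changing the `legacyPitchMode` value in en-us.yaml) could be scripted roughly as below. This is only a sketch: the key name and the four mode values are taken from the post, while the function name, the sample pack text, and the assumption that the key sits on a line of its own are made up for illustration. It works on a string in memory rather than touching the real pack file.

```rust
// Sketch only: rewrites the `legacyPitchMode` line of a language pack held in memory.
// The key name and the four mode values come from the thread; the function name,
// the sample text, and the one-key-per-line layout are assumptions.

const KNOWN_MODES: [&str; 4] = [
    "espeak_style",
    "fujisaki_style",
    "impulse_style",
    "klatt_style",
];

/// Returns the pack text with `legacyPitchMode` set to `mode`,
/// or `None` if the mode is unknown or the key is not present.
/// Every output line is terminated with a newline.
fn set_pitch_mode(pack: &str, mode: &str) -> Option<String> {
    if !KNOWN_MODES.iter().any(|m| *m == mode) {
        return None;
    }
    let mut found = false;
    let mut out = String::new();
    for line in pack.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("legacyPitchMode:") {
            // Keep the original indentation so the YAML nesting is untouched.
            let indent = &line[..line.len() - trimmed.len()];
            out.push_str(indent);
            out.push_str("legacyPitchMode: ");
            out.push_str(mode);
            found = true;
        } else {
            out.push_str(line);
        }
        out.push('\n');
    }
    if found {
        Some(out)
    } else {
        None
    }
}

fn main() {
    // Hypothetical excerpt; a real en-us.yaml contains many more settings.
    let pack = "settings:\n  legacyPitchMode: espeak_style\n";
    match set_pitch_mode(pack, "impulse_style") {
        Some(updated) => print!("{}", updated),
        None => eprintln!("no change made"),
    }
}
```

After writing the changed text back to the pack file (which probably needs root under /usr/local), the `killall speech-dispatcher` step from the post is still what makes the module pick it up.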

the esoteric programmer 2d ago

@Tamasg I mean, depends what you mean by rust native bridging, because we have a lot of crates for communicating with the operating system. For the parts where we can't, we can use cxx bridge to connect c++ libraries to rust, or uniffi if we want to generate bindings from rust to another language, for example swift for ios or kotlin for android. But yeah, hell yeah to the hardcore yaml editing! Also, kinda an interesting path to put your packs in, but yeah. Btw, what shimmer thing with speech dispatcher?

Tamas G 2d ago

@esoteric_programmer OMG. I literally shipped the native Speech Dispatcher module tonight and within an hour I crashed it by switching to the Robert voice! The culprit: an unsafe pointer arithmetic hack to poke a voicing tone struct field at a raw byte offset.
In Rust that's a compile error. In C++ that's a 2 AM "why is my screen reader silent" debugging session. Fixed now though, and Robert sounds great! Although, new issue to debug for post-3.0: why the SPD connection gets lost when Orca reloads after setting the voice. Oh gosh. Ah well. Voices now work for sure though; there's a fresh tar on the release page with the fix, caught early on after release. Now bed for real. :D Ahaha.
Rust could handle the DSP, but the frontend's 24-pass token pipeline, where each pass mutates a shared vector in-place, would fight the borrow checker constantly. And wrapping every exported function in extern "C" fn + unsafe blocks for 6 platforms... we'd spend more time on FFI glue than on speech quality! The C++ bugs keep me honest. The platform reach keeps us shipping. May your day be a great one.
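
To make the raw-offset point concrete, here is a small sketch of the same kind of field update done by name. The struct and its fields are hypothetical stand-ins for a voicing tone record, not the project's real layout. One nuance: Rust does not forbid raw-offset writes outright, it only rejects them outside an explicit `unsafe` block, so the hack has to be opted into rather than slipping in unnoticed.

```rust
// Sketch only: the struct and its field names are hypothetical stand-ins for a
// voicing tone record, not the project's real layout.
#[derive(Debug, Clone, PartialEq)]
struct VoicingTone {
    spectral_tilt: f32,
    formant_sharpness: f32,
}

/// Updates one field by name; the compiler checks that the field exists and
/// that the value has the right type, whatever the struct layout happens to be.
fn set_spectral_tilt(tone: &mut VoicingTone, value: f32) {
    tone.spectral_tilt = value;
}

fn main() {
    let mut tone = VoicingTone {
        spectral_tilt: 0.0,
        formant_sharpness: 1.0,
    };
    set_spectral_tilt(&mut tone, -3.0);

    // The raw-offset equivalent is rejected outside an `unsafe` block:
    //   let p = &mut tone as *mut VoicingTone as *mut u8;
    //   *(p.add(4) as *mut f32) = -3.0; // error[E0133]: requires unsafe
    println!("{:?}", tone);
}
```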

the esoteric programmer 2d ago

@Tamasg ha! see? speak of rust, and a c++ bug will appear to prove me right lol. About mutable vectors and stuff like that, there are patterns to deal with this depending on how it all works. About manually doing extern "c" stuff, you don't have to, there are crates like uniffi which do all that, if you annotated your structs and functions properly with their macro. But yeah, that said, good night!
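
One of the patterns hinted at here, sketched under heavy assumptions: if each pass runs to completion before the next starts, every pass can take `&mut Vec<Token>` in turn and the borrow checker has nothing to object to. `Token`, the two passes, and the runner below are all hypothetical; the real frontend has 24 passes and much richer token data. Passes that need to hold references into the vector while also restructuring it are harder, and usually end up working with indices instead.

```rust
// Sketch only: `Token`, the two passes, and the pipeline runner are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    phoneme: String,
    stressed: bool,
}

/// A pass gets exclusive access to the whole vector while it runs.
type Pass = fn(&mut Vec<Token>);

/// Pass 1: mark every "a" phoneme as stressed (in-place field mutation).
fn mark_stress(tokens: &mut Vec<Token>) {
    for t in tokens.iter_mut() {
        if t.phoneme == "a" {
            t.stressed = true;
        }
    }
}

/// Pass 2: drop pause tokens (in-place structural mutation).
fn drop_pauses(tokens: &mut Vec<Token>) {
    tokens.retain(|t| t.phoneme != "_");
}

/// Runs the passes one after another; each mutable borrow ends
/// before the next pass starts.
fn run_pipeline(tokens: &mut Vec<Token>, passes: &[Pass]) {
    for pass in passes {
        pass(tokens);
    }
}

fn main() {
    let mut tokens: Vec<Token> = ["k", "a", "_", "t"]
        .iter()
        .map(|p| Token {
            phoneme: p.to_string(),
            stressed: false,
        })
        .collect();
    run_pipeline(&mut tokens, &[mark_stress, drop_pauses]);
    println!("{:?}", tokens);
}
```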

Tamas G

@esoteric_programmer Oooh yeah! UniFFI is cool, Mozilla built it for Firefox components and it does handle the Kotlin/Swift/Python bindings automatically. Honestly if we were starting from zero it'd be tempting. But we're 15K lines deep with 6 working platform bridges and a Friday night native SD module that's still warm. Maybe for 4.0 I'll rewrite it in Rust and call it TGRustBox. Good night! I'll catch up on bugs tomorrow :D

the esoteric programmer 2d ago

@Tamasg I mean, we need something like this in rust at some point, we don't have enough speech synthesizers, and a library like this would make it more possible to add it to firmware, if there would ever be some way to output sound and get text info from UEFI interfaces as a UEFI app. But yeah, rust box, for sure :p