TGSpeechBox v3.0-beta3 is out! 19 bug fixes, 8 new features, 5 language pack improvements.
The big ones: stop consonants now use research-based burst spectral templates from Stevens & Blumstein. Alveolar, velar, and labial stops each get their own shape, so /d/ vs /g/ and /t/ vs /k/ are clearly distinct. In stop clusters like those in "locked" and "kept", the first stop is now properly left unreleased, the way it is in natural speech.
MOUTH diphthong onset was only 30 Hz from schwa — "outside" sounded like "ertside." Fixed with Hillenbrand GenAm data. Per-diphthong duration scaling replaces the old global knob, so PRICE gets the time it needs without bloating GOAT. Diphthong rate compensation keeps bare "I" and "Y" from losing identity at high speech rates.
New Fujisaki clause-type overrides let language pack authors tune question/exclamation intonation in YAML. Spanish gets proper Castilian vs Latin American approximant splits. Australian English recovers its hand-tuned vowels.
Clause-final sonorants no longer clip. Cascade resonator pops, gone. Tap timing, fixed three ways.
And yes — we know en-gb PRICE still sounds a bit Stewie Griffin. The glide doesn't curve down the way it should yet. We hear you, it's on the workbench.
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b3.nvda-addon
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSBPhonemeEditor-v300b3.zip
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechSapiSetup-v300b3.exe
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b4.apk
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/tgspeechbox-linux-aarch64-v-300b3.tar.gz
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/tgspeechbox-linux-x86_64-v-300b3.tar.gz
https://testflight.apple.com/join/jvvGY6Fz
@Tamasg Rate 60, pitch mode impulse, US English: read the words "found out" or "thousand."
@tspivey yeah, that's the glide retuning work. I feel like it's almost there, but a bit too flat. Hmm. I can't quite describe it, though, because in words like "about" it sounds better than it does in "sound." So it for sure needs more work. You have a sharp ear for noticing, and I appreciate that. Ultimately I want it to have the same "aow" glide that Eloquence has.
@Tamasg @tspivey This segment:
"work. I feel like it's almost there, "
The "I" sounds like it's trying to say "uh-I", with an extremely short "uh" sound at the start.
@jackf723 @tspivey yeah, that also sounds like some sort of diphthong collapse thing! I'm also noticing it happens more at faster rates. So both of these feel like a diphthong problem: if you listen to that phrase at around rate 60-65, that thickening of the vowel doesn't happen. It's almost like durations aren't scaling well enough with rate.
@jackf723 @tspivey you are both absolutely correct. Check it out: there's a new update in the add-on panel, or at https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b3.nvda-addon (it's 958 KB; I compressed it with better 7-zip flags), so yeah, try it out. It should be noticeably better at words like "found" and "sound", and it shouldn't thicken up the "I" vowel at fast rates by harshly skipping the glide. Y'all made it happen.
@Tamasg @jackf723 Nothing seems to have changed there.
@tspivey @jackf723 I’ll need to tune it more after work, but I did track down and fix a real bug in this hotfix. At speech rates above 70, diphthong glides were losing their sweep entirely.
Root cause: diphthongs are generated as a sequence of micro-frames that interpolate formant frequencies from onset to offset. Each micro-frame has a minimum duration (two full pitch periods) so the cascade resonators can properly settle. When speech rate increases, the overall diphthong duration compresses. That caused the micro-frame count to drop from 4–5 down to just 3. With the onset hold curve packing those three points toward the start, the result was a single large formant jump instead of a smooth transition. The glide effectively collapsed into the onset vowel and sounded flat.
The fix was to raise the minimum micro-frame count from 3 to 5. That guarantees enough interior waypoints to preserve a smooth sweep regardless of speech rate.
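For anyone curious how a frame-count floor changes the sweep, here's a rough Python sketch. It is not TGSpeechBox's actual code: the names (plan_diphthong, onset_hold, MicroFrame), the squared ease-in used as the "onset hold curve", and the formant values are all hypothetical, picked only to illustrate the idea. It assumes each micro-frame wants two pitch periods, but that the minimum frame count wins when the diphthong is too short to fit that many frames.

```python
"""Hypothetical sketch of diphthong micro-frame planning (not TGSpeechBox code)."""

from dataclasses import dataclass

MIN_MICRO_FRAMES = 5  # the hotfix raised this floor from 3 to 5


@dataclass
class MicroFrame:
    start_ms: float
    duration_ms: float
    f1_hz: float
    f2_hz: float


def onset_hold(t: float) -> float:
    """Assumed ease-in curve: lingers near the onset, then sweeps to the offset."""
    return t * t


def plan_diphthong(duration_ms, pitch_hz, onset, offset, min_frames=MIN_MICRO_FRAMES):
    """Split a diphthong into micro-frames interpolating (F1, F2) from onset to offset."""
    pitch_period_ms = 1000.0 / pitch_hz
    min_frame_ms = 2.0 * pitch_period_ms  # two full pitch periods so resonators settle
    # How many frames fit at the minimum frame length...
    natural_count = int(duration_ms // min_frame_ms)
    # ...but never fewer than the floor (assumption: the floor wins at high rates).
    count = max(2, min_frames, natural_count)
    frame_ms = duration_ms / count
    frames = []
    for i in range(count):
        t = i / (count - 1)  # 0.0 at the onset frame, 1.0 at the offset frame
        w = onset_hold(t)
        frames.append(MicroFrame(
            start_ms=i * frame_ms,
            duration_ms=frame_ms,
            f1_hz=onset[0] + (offset[0] - onset[0]) * w,
            f2_hz=onset[1] + (offset[1] - onset[1]) * w,
        ))
    return frames


def largest_f2_step(frames):
    """Biggest F2 jump between neighbouring micro-frames, in Hz."""
    return max(abs(b.f2_hz - a.f2_hz) for a, b in zip(frames, frames[1:]))


if __name__ == "__main__":
    # A short, fast diphthong: 60 ms at 100 Hz pitch only fits 3 two-period frames.
    old = plan_diphthong(60, 100, (700, 1300), (450, 900), min_frames=3)
    new = plan_diphthong(60, 100, (700, 1300), (450, 900))
    print(len(old), largest_f2_step(old))  # 3 300.0
    print(len(new), largest_f2_step(new))  # 5 175.0
```

With only three frames, most of the F2 movement lands in one 300 Hz jump at the end. With five, the same sweep is spread over steps of 25, 75, 125, and 175 Hz, which is the "smooth sweep regardless of speech rate" effect described above.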
I also retuned the MOUTH onset vowel. It was sitting too close to LOT in formant space, so “found” was drifting toward “fond.” I increased the separation, so “sound” and “found” should now have more perceptible rounding in the glide compared to beta 3.
One remaining issue is amplitude. It can still pop out of the word center more than other segments — subtle in “thousand,” but obvious mid-utterance in “sound” or “found.” There’s likely still a gain transition to smooth out.
Other platform releases will also include this fix soon.
@Tamasg @tspivey Even now it's a pretty substantial improvement. The split is still present, but not nearly as defined. A bit more of that tuning you mentioned and it should be great.
@Tamasg This is starting to sound really good; I might throw this on my Apple devices and try to get used to it. Polish support is also starting to sound better, but many phonemes are still wrong. For example, the R should be rolled, which I can hear the synth can very much do, because it sounds great in Spanish. Then there's the letter Y. It should nearly always be pronounced with an "ih" sound like in the word "whip", but right now it sounds more like an "oo"; you can test this with a word like "my." Also, ę is literally just pronounced as n, when the sounds are definitely different, but I have no idea how to explain that in words lol. I might try to contribute fixes when I have more time to figure out the phoneme editor. But you are doing some really amazing work here.
@pitermach oh thank you. You've already given me a million things there that I can tune, so huge huge thanks. My biggest Polish contributor (whom I've ignored, but won't for the next beta) has been @spacepup. He's tuned a lot of it and helped get it into better shape. He's actually fixed the Y sound in his new pack, and I can for sure work on rolling the R as you said. So yeah, expect Polish pack updates now, especially since I've gotten feedback from two solid native speakers. I'm always afraid to tune these packs on my own because I'm only Hungarian: I can tune my own language well enough, but I leave Spanish and Portuguese up to the people who actually know them, lol, like @clv1. That's why I rarely touch them. But yeah, again, huge huge thanks for this.
@Tamasg I can now clearly understand the Benjamin voice at 100% speech rate with formant sharpness at 80%, so it has improved a lot. Why does rate boost not show up in the settings like with other synthesizers?