TGSpeechBox v3.0-beta3 is out! 19 bug fixes, 8 new features, 5 language pack improvements.
The big ones: stop consonants now use research-based burst spectral templates from Stevens & Blumstein — alveolar, velar, and labial stops each get their own shape, so /d/ vs /g/ and /t/ vs /k/ are clearly distinct. Stop clusters in words like "locked" and "kept" now properly unrelease the first stop, the way natural speech does.
MOUTH diphthong onset was only 30 Hz from schwa — "outside" sounded like "ertside." Fixed with Hillenbrand GenAm data. Per-diphthong duration scaling replaces the old global knob, so PRICE gets the time it needs without bloating GOAT. Diphthong rate compensation keeps bare "I" and "Y" from losing identity at high speech rates.
New Fujisaki clause-type overrides let language pack authors tune question/exclamation intonation in YAML. Spanish gets proper Castilian vs Latin American approximant splits. Australian English recovers its hand-tuned vowels.
Clause-final sonorants no longer clip. Cascade resonator pops, gone. Tap timing, fixed three ways.
And yes — we know en-gb PRICE still sounds a bit Stewie Griffin. The glide doesn't curve down the way it should yet. We hear you, it's on the workbench.
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b3.nvda-addon
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSBPhonemeEditor-v300b3.zip
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechSapiSetup-v300b3.exe
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b4.apk
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/tgspeechbox-linux-aarch64-v-300b3.tar.gz
https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/tgspeechbox-linux-x86_64-v-300b3.tar.gz
https://testflight.apple.com/join/jvvGY6Fz
@Tamasg Rate 60, pitch mode impulse, US English: read the words "found out" or "thousand."
@tspivey yeah, that's the glide retuning work. I feel like it's almost there, but a bit too flat. Hmm. But then I can't quite describe it, because in words like "about," it's sounding better than in "sound." So for sure it needs more work. You have a sharp ear as well for noticing, I appreciate that. Ultimately I want it to have that same "aow" glide that Eloquence has for it.
@Tamasg @tspivey This segment:
"work. I feel like it's almost there, "
The I sounds like it's trying to say Uh I with an extremely short occurrence of the uh sound.
@jackf723 @tspivey you are both absolutely correct. Check it out: there's a new update in the add-on panel or at https://github.com/tgeczy/TGSpeechBox/releases/download/v-300b3/TGSpeechBox-v300b3.nvda-addon (this is 958 KB, I compressed it with better 7-zip flags), so yeah, try it out. It should be noticeably better at words like "found" and "sound," and it should no longer thicken up the "I" vowel at fast rates by skipping the glide harshly. Y'all made it happen.
@Tamasg @jackf723 Nothing seems to have changed there.
@tspivey @jackf723 I’ll need to tune it more after work, but I did track down and fix a real bug in this hotfix. At speech rates above 70, diphthong glides were losing their sweep entirely.
Root cause: diphthongs are generated as a sequence of micro-frames that interpolate formant frequencies from onset to offset. Each micro-frame has a minimum duration (two full pitch periods) so the cascade resonators can properly settle. When speech rate increases, the overall diphthong duration compresses. That caused the micro-frame count to drop from 4–5 down to just 3. With the onset hold curve packing those three points toward the start, the result was a single large formant jump instead of a smooth transition. The glide effectively collapsed into the onset vowel and sounded flat.
The fix was to raise the minimum micro-frame count from 3 to 5. That guarantees enough interior waypoints to preserve a smooth sweep regardless of speech rate.
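To make the collapse concrete, here is a rough Python sketch, not the actual TGSpeechBox code. The function names, the squared onset-hold curve, the 110 Hz pitch, and the 700 to 400 Hz sweep are all assumptions picked for illustration; only the two-pitch-period frame minimum and the floor of 3 vs 5 frames come from the description above.

```python
def micro_frame_count(duration_ms, pitch_hz, min_frames):
    """How many micro-frames fit if each lasts two pitch periods,
    never going below min_frames (hypothetical helper)."""
    min_frame_ms = 2 * 1000.0 / pitch_hz  # two full pitch periods
    return max(min_frames, int(duration_ms // min_frame_ms))


def glide_waypoints(onset_hz, offset_hz, n_frames, hold_exponent=2.0):
    """Formant targets for each micro-frame.

    The onset hold is modelled as t ** hold_exponent, which packs
    waypoints toward the onset. This curve is an assumption; the real
    engine's curve is not given in the thread.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames to form a glide")
    points = []
    for i in range(n_frames):
        t = i / (n_frames - 1)       # linear position, 0..1
        eased = t ** hold_exponent   # held near the onset
        points.append(onset_hz + (offset_hz - onset_hz) * eased)
    return points


def largest_step_fraction(waypoints):
    """Biggest single jump between frames, as a share of the full sweep."""
    total = abs(waypoints[-1] - waypoints[0])
    steps = [abs(b - a) for a, b in zip(waypoints, waypoints[1:])]
    return max(steps) / total


if __name__ == "__main__":
    # A 60 ms diphthong at 110 Hz, with the old and new frame floors.
    for floor in (3, 5):
        n = micro_frame_count(60, 110, floor)
        jump = largest_step_fraction(glide_waypoints(700.0, 400.0, n))
        print(f"floor={floor}: {n} frames, largest jump {jump:.0%} of sweep")
```

With these made-up numbers, a 60 ms diphthong only fits 3 frames and the final step covers 75% of the sweep, which is the "single large formant jump." A floor of 5 frames brings the largest step down to about 44%. How the real engine reconciles the higher floor with the two-period minimum per frame isn't stated in the thread, so the sketch ignores that.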
I also retuned the MOUTH onset vowel. It was sitting too close to LOT in formant space, so “found” was drifting toward “fond.” I increased the separation, so “sound” and “found” should now have more perceptible rounding in the glide compared to beta 3.
One remaining issue is amplitude. It can still pop out of the word center more than other segments — subtle in “thousand,” but obvious mid-utterance in “sound” or “found.” There’s likely still a gain transition to smooth out.
Other platform releases will include this fix as well soon.
@Tamasg @tspivey Even now it's a pretty substantial improvement. The split is still present but not nearly as defined. A bit more of that mentioned tuning and it should be great.