Mastodawn

I gave Pocket TTS a shot. It's a local AI voice that claims to run well on the CPU, so it generates output quickly and doesn't require a GPU. The main problem with it is the same as with so many voices these days, AI or otherwise: punctuation. Periods and commas sound the same, and question marks aren't always noticeable. Parenthetical phrases sound odd, and the double dash isn't handled at all. I love seeing this field continue to move forward, but I have yet to meet an AI voice I like.

Alex Hall Jan 16

In case you want to play with Pocket TTS on your own: https://github.com/kyutai-labs/pocket-tts?tab=readme-ov-file

Kaliah Jan 16

@x0 @alexhall I think it's a really cool field; I just don't think it's advanced far enough to be viable for extensive uses like screen readers, IMO.

Luis Carlos Jan 16

@x0 @alexhall So could this work with screen readers? Maybe as another provider for Sonata, once the developer finishes or wraps up their TTS lab project.

Alex Hall Jan 16

@luiscarlosgonzalez @x0 I doubt it, given how responsive screen reader speech needs to be, but you never know.

Mckensie parker Jan 18

@alexhall What is this Pocket TTS? Is it for NVDA or something?

Alex Hall Jan 18

@mckensie Not for NVDA, no. It's one of the growing number of AI-based speech synthesizers. With some Python coding, you hand it your text and get back a .wav file. The goal is natural-sounding speech, all done locally rather than on a company's servers.
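
To illustrate the "text in, .wav out" shape of that workflow, here is a minimal sketch. It does not use Pocket TTS's real API (check the project's README for that); `synthesize_placeholder` is a hypothetical stand-in that just emits a short tone per word, and the 24 kHz sample rate is an assumption. Only the .wav-writing half uses real library code: Python's standard `wave` module.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # assumed rate; use whatever your synthesizer actually reports


def synthesize_placeholder(text: str, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Hypothetical stand-in for a TTS call: one 0.15 s tone per word, floats in [-1, 1]."""
    samples: list[float] = []
    for i, _word in enumerate(text.split()):
        freq = 220.0 + 20.0 * i  # vary pitch per word so the output isn't one flat tone
        n = int(0.15 * sample_rate)
        samples.extend(
            0.3 * math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)
        )
    return samples


def write_wav(path: str, samples: list[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono 16-bit PCM using the standard-library wave module."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


if __name__ == "__main__":
    audio = synthesize_placeholder("Hello from a local voice")
    write_wav("output.wav", audio)
```

A real local synthesizer would replace `synthesize_placeholder` with its own generation call; the point is that nothing leaves your machine.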

Cleverson Jan 18

@TalkingDroid @alexhall It's shockingly unbelievable how no one has the sense to think of such an important detail when making a synth.

TalkingDroid Jan 19

@clv1 @alexhall I know, but how many synths do you know where you can really hear the question mark, for example? Eloquence is definitely the best at that.

Alex Hall Jan 19

@TalkingDroid @clv1 eSpeak is my daily driver, and it also handles punctuation quite well. It seems like the more "natural" a voice is, the worse it is with punctuation.

Martin Jan 18

@alexhall Ooo, it's actually quite snappy to produce the audio though. Neat!