Mastodawn

heh TIL https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_UK/apope_low (a voice model we may be looking at using) is based on @popey's voice

popey Feb 21, 2023

@theresnotime Indeed it is. It's way better than the previous incarnation. I wrote a blog post about it a while back. https://popey.com/blog/2022/10/blog-to-speech-in-my-voice/

Blog To Speech - In My Voice

Recently my Internet friend Terrence Eden crafted a blog post titled Blog To Speech which you might want to also read. It serves as an inspiration for this post. In short, there’s a trend in blogging (and on some news sites) to add an audio transcription of the page you’re reading, usually at the top of the article. Mostly this is done semi-automatically using a bot to read in an “AI generated” voice such as Amazon Polly.

Alan Pope's blog

@popey that is very cool, though IPA/linguistics still makes me cry

9th level spell slut Feb 21, 2023

@theresnotime only a dabbler, but "hello" has two pronunciations common in English use: that one, and /hɛˈləʊ/ (which is what I assume yours is)

Nikki Feb 21, 2023

@hierarchon @theresnotime that's what I was thinking too. the schwa in the audio file is also a lot longer than would normally be used in an English word

@hierarchon okay but /hɛˈləʊ/ should then, per the Laws Of IPA (/s), sound the same in any language, right? IPA is language-agnostic, right..?

9th level spell slut Feb 21, 2023

@theresnotime same-ish, yeah. there's something about phonemes and phones that I don't fully get

vy-let Feb 21, 2023

@hierarchon @theresnotime one thing I frequently trip myself up on is using pronunciation keys that compare the symbols to parts of certain words, but the authors of those keys almost invariably choose examples that are ambiguous between dialects and regions (sigh), so they can seem contradictory or flexible when they’re not meant to be

Owen Blacker Feb 21, 2023

@hierarchon @theresnotime The short version is that we use some symbols nonspecifically when the difference between accents is unimportant.

So /e/ in English might actually be /ɛ/, for example, but English doesn't really differentiate between the two sounds. Similarly we don't bother writing the difference between the /pʰ/ and the /p/ in "pepper", because that difference isn't phonemically important, whereas that matters in, say, Hindi (फ and प, respectively)
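
A minimal sketch of the "pepper" point above, assuming a deliberately simplified textbook rule: English /p t k/ are aspirated at the start of a word or of a stressed syllable, and nowhere else. The function name `narrow_aspiration` is hypothetical, and the rule ignores affricates such as /tʃ/ and many other contextual effects, so treat it as an illustration of how a broad transcription leaves predictable detail unwritten, not as a real phonetic converter.

```python
# Simplified, assumed rule: voiceless stops are aspirated word-initially
# or directly after a stress mark. Real English allophony is richer.
VOICELESS_STOPS = {"p", "t", "k"}
STRESS_MARKS = {"ˈ", "ˌ"}


def narrow_aspiration(broad: str) -> str:
    """Return a slightly narrower transcription with aspiration marked."""
    out = []
    for i, ch in enumerate(broad):
        out.append(ch)
        if ch in VOICELESS_STOPS:
            prev = broad[i - 1] if i > 0 else ""
            # Aspirate only at the start of the word or of a stressed syllable.
            if i == 0 or prev in STRESS_MARKS:
                out.append("ʰ")
    return "".join(out)


print(narrow_aspiration("ˈpɛpə"))  # first p aspirated, second p not
print(narrow_aspiration("spɪn"))   # p after s stays unaspirated
```

The broad form writes both stops in "pepper" with the same symbol; the narrower form only differs because a language-specific rule was applied on top.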

Owen Blacker Feb 21, 2023

@hierarchon @theresnotime (I didn't scroll far enough to see the other, much better answers. Sorry about that.)

@theresnotime There's a difference between broad transcription and narrow transcription. Broad transcription is almost phonemic, not phonetic, and tends to use the same symbol for the phonemes in a language, irrespective of (some) allophonic variation across contrasts. As a result, a single symbol can do quite a lot of heavy lifting, as long as it's not crossing phoneme boundaries, and the same phoneme won't sound the same in all contexts.

@theresnotime So the symbol /u/ will sound very different in California English vs. Japanese vs. French, even though in all three languages it's used in broad transcription because it's close enough to the high(ish) back(ish) rounded(ish) vowel in those languages' vowel inventories.

@theresnotime If you really want IPA to give you a pronunciation that will be relatively stable across languages and contexts, then you enter narrow transcription land. For that, you're going to have to make HEAVY use of diacritics, and those are such an almighty pain to type that people tend to use broad transcription unless they know for sure that they're talking to a highly trained phonetic audience that will know what to do with the diacritics.

@ergative I appreciate the explanations — so the concept of taking a broad transcription and say, putting it through a text-to-speech engine (even one well-trained on IPA) is going to give mixed-to-poor results?

@theresnotime Right. If the text-to-speech engine is only trained on IPA, then it will sound terrible. There is SO MUCH more detail in English than can be captured by broad phonetic transcription.

One reason we leave the detail out of broad transcription is that much of it is predictable from context. But if the engine is not trained on the English-specific contextual effects, then no matter how well trained it is on IPA-proper, it's not going to sound like English.

@ergative 🙃 *quiet sobbing*

Thank you for taking the time to explain this, I really appreciate it :)

@theresnotime And even with the narrowest of narrow phonetic transcriptions, the IPA isn't really sensitive enough to capture the nuance of pronunciation variation. It's a discrete system trying to capture a highly gradient phenomenon. So in contexts where the narrowest of narrow phonetic transcriptions is necessary, these days phoneticians will use spectrograms in their written articles, and upload audio recordings in their supplemental materials, rather than provide narrow transcriptions.

@theresnotime tl;dr: there is not 'only one' way of pronouncing an IPA symbol. It's an imperfect system that's useful but limited; and it only ever had a hope of being even imperfectly universal and invariant in the days before better technology made describing phonetics more accurate and detailed than the IPA could ever hope to be.

https://grieve-smith.com/blog/2015/12/levels-of-phonetic-description/

@theresnotime Exactly, there is NOT only one way of pronouncing any given IPA transcription, for multiple reasons.

The simplest reason is that both tongue height and tongue place are continua, and whenever you divide a continuum you get granularity. It's like saying that 5'4" is not just one height.

There are also other dimensions that can be heard, but are not typically transcribed! #linguistics #phonetics
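
The height analogy can be sketched in a few lines, assuming heights are labelled to the nearest inch. The function name `height_label` is made up for illustration. The point is only that one discrete label covers a whole range of underlying values, in the same way one IPA vowel symbol covers a region of tongue positions.

```python
def height_label(inches: float) -> str:
    """Label a continuous height with a discrete feet-and-inches symbol."""
    whole = round(inches)            # dividing the continuum into 1-inch bins
    feet, rest = divmod(whole, 12)
    return f"{feet}'{rest}\""


# Two measurably different heights end up with the same label.
print(height_label(63.6))
print(height_label(64.4))
```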

Levels of phonetic description

When I first studied phonetic transcription I learned about broad and narrow transcription, where narrow transcription contains much more detail, like the presence of aspiration on consonants and fine distinctions of tongue height. Of course it makes sense that you wouldn't always want to go into s

Technology and language

@theresnotime The ə symbol is more difficult than others, because it's used in at least 3 conflicting ways.

Some people use it to indicate a universal middle-center tongue position, as in that audio file.

Other people use ə to indicate the specific tongue/lip configuration most commonly used for reduced vowels. But e.g. the /ə/ for English is very different from the /ə/ for French.

Still others use ə for an *underspecified* reduced vowel - it doesn't matter where someone puts their tongue!

@grvsmth thank you, I appreciate the insight! Your blog post notes "the International Phonetic Alphabet was sold as just such a consistent system: one symbol for one sound" — that sure would be nice (especially in relation to the project we've been working on), but as you go on to say, reality seems to "fall short of the ideal consistent representation that was sold to people"

@theresnotime Exactly! The IPA is great for specialists to communicate more detail about speech, more consistently than we can do with writing. It's not your fault that they oversold it!

In general, without knowing anything about this project, as someone who's developed automatic language generation systems, I recommend being careful about what you're using technology for, and who it benefits!

@grvsmth for what it's worth, the project in question is https://w.wiki/6Mds — in summary, a MediaWiki (Wikipedia) extension which allows people to click on the IPA shown on a lot of Wikipedia articles and hear the generated audio...

In a lot of cases, audio recordings of the word represented by the IPA already exist on Wikimedia Commons, so ideally the extension will use that if present and "fall back" to generating audio via the IPA (as a human voice is always going to sound better than generated audio, regardless of the IPA)
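
The "fall back" behaviour described above could look roughly like this. It is only a sketch under stated assumptions: `find_recording` and `synthesize` are hypothetical stand-ins passed in by the caller, not functions from the actual extension, Wikimedia Commons, or any TTS library, and the file name in the usage example is invented.

```python
from typing import Callable, Optional, Tuple


def audio_for_ipa(
    ipa: str,
    lang: str,
    find_recording: Callable[[str, str], Optional[str]],
    synthesize: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Prefer an existing human recording; generate audio only as a fallback."""
    recording = find_recording(ipa, lang)
    if recording is not None:
        return ("recording", recording)
    return ("generated", synthesize(ipa, lang))


# Hypothetical stand-ins so the sketch runs on its own.
known_recordings = {("/hɛˈləʊ/", "en"): "example-hello-recording.ogg"}


def fake_find(ipa: str, lang: str) -> Optional[str]:
    return known_recordings.get((ipa, lang))


def fake_synth(ipa: str, lang: str) -> str:
    return f"synthesized audio for {ipa} ({lang})"


print(audio_for_ipa("/hɛˈləʊ/", "en", fake_find, fake_synth))
print(audio_for_ipa("/ˈpɛpə/", "en", fake_find, fake_synth))
```

Passing the lookup and the synthesizer in as parameters keeps the preference order (human recording first) separate from wherever the audio actually comes from.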

Community Wishlist Survey 2022/Generate Audio for IPA - Meta

@theresnotime "However, because only very few people on the planet read this notation, it is practically impossible for folks to discern how to pronounce something purely based on IPA notation."

It's an interesting idea, but it sounds like "Only very few people can read molecular formulas, so we'll create automated 3D renderings of the substance depicted by the formulas"!

In both cases it's an interesting idea, but the systems are not consistent or complete enough to bypass expert judgment!

https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer

@theresnotime I agree with the comment by [Modest Genius] from January 31, 2022, and I disagree with [Andy Mabbett]. Sometimes it is better to have no pronunciation than to have every word reinforcing an arbitrary bias!

'Even so, better to have "every word in (say) a mid-Atlantic accent" than no audio pronunciation at all.'

I've seen so many people point even to vague dictionary transcriptions as "the right way to pronounce this." I can imagine what they'd do here!

Community Wishlist Survey 2022/Reading/IPA audio renderer - Meta

@theresnotime I think it's interesting that the page only lists 304 entries using the IPA template. Obviously, it depends on how much IPA is in each entry, but that does not seem like much for a community-based recording project!

@grvsmth oh that 304 is the number of templates which use IPA; from memory, there are hundreds of thousands of instances of IPA in use across the Wikimedia projects :) Wiktionary for example tends to have a few per entry

hello - Wiktionary, the free dictionary

Wiktionary

@theresnotime 😯 ah, okay, yeah, that sounds more like my experience with Wikimedia, so it would be a huge commitment of time and effort to record them all!

@grvsmth though on the note of sourcing audio recordings, https://lingualibre.org/wiki/LinguaLibre:Main_Page is doing amazing work, and where possible this extension will use those human recordings instead of trying to generate anything ✨

Lingua Libre

@theresnotime This is indeed impressive, and it may be useful for a different project I'm thinking of!

Platonides Feb 22, 2023

@theresnotime @grvsmth
For what it's worth, English Wiktionary currently has 544,896 *entries* (pages in the main namespace) that use the IPA template. That means 544,896 words with at least one IPA transcription (often more)