I've been working on detecting avatar mouth poses from raw speech audio, but I didn't have any intuition about what intermediate values each step should produce, which made debugging hard. This was unacceptable, so I figured out how to reproduce audio from the compressed mouth shape filter data. Here's what it sounds like when you generate audio by pulsing a filter defined by fewer than one value per millisecond. #soundprocessing #coding
Does it sound good? No. Is there enough information in this compressed form to determine what basic vowel and mouth sounds are being made? Probably! Still a few more steps to go, but I think getting from raw audio to a compressed spectral envelope tuned to human speech is the hardest part. Now I just need to update my complex root solver and build a mapping from formants to the mouth poses in my model format.
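
For anyone wondering what "pulsing a filter" can look like, here is a minimal sketch. It is not the code behind the clip: it assumes an all-pole (LPC-style) filter, which the post doesn't actually specify, and the names `resonator_coeffs` and `pulse_filter`, the formant values, and the sample rate are all made up for illustration. The idea is to turn a few formant frequencies into filter coefficients and then excite that filter with a train of impulses at the pitch period.

```python
import math

def resonator_coeffs(formants_hz, bandwidths_hz, sample_rate):
    """Return all-pole denominator coefficients a[0..n], with a[0] == 1.

    Assumption: each formant is modeled as one two-pole resonator whose
    pole radius is set by the bandwidth and whose angle is set by the
    center frequency.
    """
    a = [1.0]
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = math.exp(-math.pi * bw / sample_rate)   # pole radius (< 1, so stable)
        theta = 2.0 * math.pi * f / sample_rate     # pole angle
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        # Multiply the running polynomial by this second-order section.
        out = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):
            for j, sj in enumerate(section):
                out[i + j] += ai * sj
        a = out
    return a

def pulse_filter(a, pitch_hz, sample_rate, n_samples):
    """Excite the all-pole filter 1/A(z) with a unit impulse train."""
    period = int(round(sample_rate / pitch_hz))
    y = [0.0] * n_samples
    for n in range(n_samples):
        x = 1.0 if n % period == 0 else 0.0         # one pulse per pitch period
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]              # feedback from past outputs
        y[n] = acc
    return y

# Hypothetical values for a rough "ah"-like vowel, chosen only for illustration.
coeffs = resonator_coeffs([700.0, 1200.0], [80.0, 90.0], 16000)
samples = pulse_filter(coeffs, 100.0, 16000, 1600)  # 0.1 s at 16 kHz
```

Going the other way, the roots of the same denominator polynomial give the pole angles, and so the formant frequencies, which is presumably where a complex root solver comes in.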

@Alrecenk

For some reason I love the sound of this. I've listened to it five or six times in the past five minutes. (I was going to say in the last minute, but it was probably longer than that.)

@munroe You would. This is like a primordial version of the speech synthesis style that would have been around in the 80s.