Mastodawn

msx Jul 11, 2023

have been exploring retrieval-based voice conversion (RVC) today. i'd like to train a model on my own voice. transforming things into other voices is fun (though i'm more interested in adjusting timbre than in just borrowing someone else's voice), but a model of my own voice would be a great tool for generating vocal pad backing tracks, harmonies, or even lead vocals when i can't record them myself (tts -> tune/time in melodyne -> RVC model of my voice).

Show thread

msx Jul 11, 2023

update: flawless victory. the input is some saw waves run through formant filters, and the model is trained on just a few quick recordings of me reading the harvard sentences lol
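For anyone wondering what "saw waves through formant filters" means in practice: a sawtooth is harmonically rich, and a bank of resonant band-pass filters tuned to a vowel's formant frequencies carves that buzz into something vowel-like, which is what the voice model then re-voices. Below is a minimal pure-Python sketch of that idea, not the patch used in the thread. The 16 kHz sample rate, the 110 Hz pitch, the formant bandwidths, and all function names are illustrative assumptions; the F1-F3 values are approximate adult-male averages for the vowel /a/.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate (Hz); not stated in the thread


def sawtooth(freq_hz, duration_s, sr=SAMPLE_RATE):
    """Naive (not band-limited) sawtooth wave with values in [-1, 1)."""
    n = int(duration_s * sr)
    return [2.0 * ((i * freq_hz / sr) % 1.0) - 1.0 for i in range(n)]


def resonator(signal, center_hz, bandwidth_hz, sr=SAMPLE_RATE):
    """Two-pole resonant filter that boosts a band around center_hz (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sr)  # pole radius sets the bandwidth
    theta = 2.0 * math.pi * center_hz / sr      # pole angle sets the centre frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - r                              # rough level normalisation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def formant_filter(signal, formants, sr=SAMPLE_RATE):
    """Run one resonator per formant in parallel, sum them, and peak-normalise."""
    bands = [resonator(signal, f, bw, sr) for f, bw in formants]
    mixed = [sum(samples) for samples in zip(*bands)]
    peak = max(abs(s) for s in mixed) or 1.0
    return [s / peak for s in mixed]


# Approximate F1-F3 for /a/ (adult male); the bandwidths are illustrative guesses.
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]

# Half a second of a 110 Hz saw shaped into an "ah"-like vowel.
voice = formant_filter(sawtooth(110.0, 0.5), VOWEL_A)
```

The filters run in parallel rather than in series so each formant's level can be balanced independently; swapping in a different formant list changes the vowel.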

msx Jul 11, 2023

this shit owns. very excited to train it on a more expansive set, since this one was pretty small. even so, it ends up really expressive once you put some filter envelopes and whatnot on the synths.
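A "filter envelope" here means an ADSR (attack, decay, sustain, release) envelope modulating a filter's cutoff over the course of each note, which is much of what makes a static synth tone feel articulated. A minimal sketch with linear segments follows; the envelope times, the 300 Hz base cutoff, the 2700 Hz sweep depth, and the function names are arbitrary example values, not settings from the thread.

```python
def _held_level(t, attack, decay, sustain):
    """Envelope level while the key is still held (attack -> decay -> sustain)."""
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    return sustain


def adsr(t, attack, decay, sustain, release, note_off):
    """Linear ADSR envelope in [0, 1] at time t (seconds); the key is released at note_off."""
    if t < 0.0:
        return 0.0
    if t < note_off:
        return _held_level(t, attack, decay, sustain)
    # Release ramps down from whatever level the envelope had reached at note_off.
    start = _held_level(note_off, attack, decay, sustain)
    remaining = 1.0 - (t - note_off) / release
    return max(0.0, start * remaining)


def cutoff_hz(t, base_hz=300.0, depth_hz=2700.0, note_off=1.0):
    """Filter cutoff swept upward by the envelope (all values are arbitrary examples)."""
    env = adsr(t, attack=0.01, decay=0.2, sustain=0.4, release=0.3, note_off=note_off)
    return base_hz + depth_hz * env


# Cutoff snaps open at the start of the note, settles while held, then closes after release.
for t in (0.0, 0.01, 0.5, 1.15, 2.0):
    print(t, round(cutoff_hz(t), 1))
```

Many synths use exponential envelope segments and map the envelope to cutoff on a logarithmic scale; linear segments are used here only to keep the sketch readable.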

sleepy sunflower

@msx what if you made a model of the google translate voice, so you could keep using it in the future if they change their voice engine/whatever?

msx Jul 11, 2023

@eightone you are a genius

sleepy sunflower Jul 11, 2023

@msx it seemed like a bad idea at first because I forgot that the model can do TTS too...

msx Jul 11, 2023

@eightone this model is actually sound-to-sound: it maps the trained voice onto whatever audio you feed in, which makes for some interesting effects when you use such obviously fake TTS as the training data
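The "retrieval" in retrieval-based voice conversion is, roughly, what makes it sound-to-sound: content features are extracted from each frame of the input audio, and each frame's features can be pulled toward the nearest features stored from the training recordings before a decoder resynthesises the audio. The sketch below shows only that retrieval-and-blend step, using a brute-force k-nearest-neighbour search over toy 2-D vectors with inverse-squared-distance weights. Real RVC setups work on high-dimensional speech features with a similarity-search index, and their exact weighting may differ; `knn_blend`, `training_bank`, and the mapping of `index_rate` onto the "index rate" setting found in common RVC front-ends are assumptions for illustration.

```python
import math


def knn_blend(query, bank, k=3, index_rate=0.75, eps=1e-9):
    """Pull one input feature vector toward its k nearest neighbours from the training set.

    query: feature vector for one frame of the *input* audio.
    bank: feature vectors extracted from the *training* recordings.
    index_rate: 0.0 keeps the input features, 1.0 uses only the retrieved ones.
    """
    nearest = sorted(bank, key=lambda v: math.dist(query, v))[:k]
    # Closer neighbours get more say (inverse squared distance; eps avoids division by zero).
    weights = [1.0 / (math.dist(query, v) ** 2 + eps) for v in nearest]
    total = sum(weights)
    retrieved = [
        sum(w * v[i] for w, v in zip(weights, nearest)) / total
        for i in range(len(query))
    ]
    # Linear mix between the retrieved "training voice" features and the original input features.
    return [index_rate * r + (1.0 - index_rate) * q for r, q in zip(retrieved, query)]


# Toy 2-D "features": two small clusters standing in for the training voice.
training_bank = [[0.0, 0.0], [0.2, 0.1], [10.0, 10.0], [9.8, 10.1]]
print(knn_blend([1.0, 1.0], training_bank, k=2, index_rate=0.5))
```

This is also why obviously synthetic training data leaks through so strongly: whatever the input is, its features get nudged toward the only examples the bank contains.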

sleepy sunflower Jul 11, 2023

@msx i'm really curious to hear how it will sound. i might make my own model too, but I'm honestly not sure what text I should feed it.

msx Jul 11, 2023

@eightone https://en.wikipedia.org/wiki/Harvard_sentences i used these since they cover most of the phonetic information needed. it ends up being a small set, but it gets good results at around 30 epochs.