have been exploring retrieval-based voice conversion (RVC) today. i would like to train a model on my own voice. it is fun to transform things into other voices (though i am more interested in the timbral adjustment than in just using another voice), but this would also be a great tool to generate vocal pad backing tracks, harmonies, or even lead vocals when i'm unable to provide them (tts -> tune/time in melodyne -> RVC model of my voice).
update: flawless victory. input is some saw waves through formant filters. model is just some quick recordings of me reading the harvard sentences lol
this shit owns. very excited to train it on a more expansive set since this one was pretty small. still ends up really expressive when you get some filter envelopes and whatnot on the synths.
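for anyone curious what "saw waves through formant filters" can look like outside a synth, here is a rough sketch in plain python. it is not the actual patch used above: the function names, the 16 kHz sample rate, the Q value, and the three formant frequencies (a ballpark "ah" vowel) are all assumptions picked for illustration. it builds a naive sawtooth, runs it through three parallel biquad band-pass filters, and sums the result.

```python
import math

def saw_wave(freq_hz, duration_s, sample_rate=16000):
    """Naive (non-band-limited) sawtooth in the range [-1, 1)."""
    n = int(duration_s * sample_rate)
    return [2.0 * ((i * freq_hz / sample_rate) % 1.0) - 1.0 for i in range(n)]

def bandpass(samples, center_hz, q, sample_rate=16000):
    """Biquad band-pass filter (RBJ cookbook form, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def formant_vowel(freq_hz=110.0, duration_s=0.5,
                  formants=((730.0, 10.0), (1090.0, 10.0), (2440.0, 10.0)),
                  sample_rate=16000):
    """Sum of parallel band-passed saws, normalized to a peak of 1.0.

    `formants` is a sequence of (center_hz, q) pairs; the defaults are
    assumed, approximate values for an "ah"-like vowel.
    """
    src = saw_wave(freq_hz, duration_s, sample_rate)
    bands = [bandpass(src, f, q, sample_rate) for f, q in formants]
    mixed = [sum(vals) for vals in zip(*bands)]
    peak = max(abs(v) for v in mixed) or 1.0
    return [v / peak for v in mixed]
```

calling `formant_vowel()` returns a list of float samples you could write to a wav file and feed to the voice model as input. sweeping the formant center frequencies over time would be one way to get the filter-envelope movement mentioned above.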
@msx what if you made a model of the google translate voice so that you can still use it in the future if they change their voice engine/whatever
@eightone you are a genius
@msx it seemed like a bad idea at first because I forgot that the model can do TTS too....
@eightone this model is actually sound to sound - it maps the trained voice onto whatever sound you feed in - which actually makes for some interesting effects when you use such an obviously fake TTS voice as the training data
@msx i'm really curious to hear how it will sound. i might make my own model too. but I'm honestly not sure what random text I should put in.
@eightone https://en.wikipedia.org/wiki/Harvard_sentences i used these since they cover most of the phonetic information necessary. it winds up being a small set but gets good results around 30 epochs.