have been exploring retrieval-based voice conversion (RVC) today. i would like to train a model on my own voice. while it is fun to transform things into other voices (though i am more interested in the timbral adjustment than in just using another voice), a model of my own voice would be a great tool to generate vocal pad backing tracks, harmonies, or even lead vocals when i'm unable to provide them (tts -> tune/time in melodyne -> RVC model of my voice).

@msx Honestly, this is a really good use of an RVC model and why have I not thought of that?

Still, not sure about the requirements of training an RVC model. I know that most AI models right now need pretty beefy PCs. Then again, optimizations have been coming in for a long while now.

@wishdream i can train a simple model on my 2060 within half an hour, it's very reasonable. i was even able to do it while streaming provided i didn't have anything else in my OBS profile :)
@msx oh neat! I've got to try it sometime. I've got a 3060 Ti so it'll probably be a bit faster.
Also streaming while training a model sounds like it's gonna burn your PC XD
@wishdream amazingly it just crashed obs at first because the training tried to allocate all my vram LOL, even with 6gb vram though i was able to train + encode video with a simple layout :D it's amazing how optimized this stuff is getting
@msx Can you limit the amount of VRAM it uses then? If not, I am surprised that you're able to encode video with it XD I wonder if I could do a stream with training in the background with my layout, gosh it's already VRAM intensive hahaha
@wishdream you can limit the number of epochs it batches and i just knocked that down to one at a time. it uses remarkably little GPU processing power, it's all about the memory
@msx oh, that shouldn't be a problem then. should probably give it a try with that. never really thought of bumping down the number of epochs so that changes things.