Description
I don't know what I'm doing wrong. Any model I train with RVC produces a highly metallic/robotic timbre at inference, even on clips that were part of the training set. I'm running RVC on MimicPC using an NVIDIA L4/A10 GPU.
Example: 21280.wav (note: the target singer and the training singer sound very similar, both high-pitched female voices, so they should be compatible). This example is with 140 epochs; 90 and 220 epochs sound similar. Including the feature index made no difference either.
I've tried this with two datasets. The first is 90 minutes of lead vocals extracted from songs; I cleaned it manually (removed reverb and FX, converted to mono, and rejected any phasey stereo sections). When this didn't produce good results, I figured the vocal isolation tool was adding artifacts, so I made a second dataset: 15 minutes of raw, unprocessed vocals from a singer I recorded with no effects. This should be enough to train on, but the result was still metallic/robotic. Even when I apply the model to an audio clip that was inside the training set, it sounds robotic!
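In case it's relevant, this is roughly the sanity check I can run over the dataset folder before training (a minimal sketch, assuming `soundfile`/`numpy` are installed; the flat `dataset/` directory is just a placeholder for my setup). It only confirms that every clip is mono, shares one sample rate, and isn't clipped:

```python
# Sanity-check the training dataset: every clip should be mono,
# share one sample rate, and not be clipped.
# Assumes WAVs sit in a flat "dataset/" folder (placeholder path).
from pathlib import Path

import numpy as np
import soundfile as sf

rates = set()
for wav in sorted(Path("dataset").glob("*.wav")):
    data, sr = sf.read(str(wav), always_2d=True)  # shape: (frames, channels)
    rates.add(sr)
    peak = float(np.abs(data).max())
    print(f"{wav.name}: {sr} Hz, {data.shape[1]} ch, "
          f"{data.shape[0] / sr:.1f} s, peak {peak:.3f}")
    if data.shape[1] != 1:
        print("  -> not mono")
    if peak >= 0.999:
        print("  -> possible clipping")

print("sample rates found:", rates)
```

Both datasets pass a check like this, so I don't think the files themselves are the problem.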
There's no sample rate mismatch (I've tried both 48k and 40k). I'm using v2 with pitch guidance. I've trained from 80 to 220 epochs with a save frequency of 10, with and without the retrieval index, and with rmvpe, rmvpe_gpu, and crepe. I've tried every combination of the protect-voiceless-consonants, median-filtering, envelope-scaling, and feature-index-search settings. The robotic sound is present even without the feature index. The output changes, but it's always robotic and unusable, despite the target material being a similar voice to the training sets.
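On the pitch side, a rough way to confirm the two voices really sit in the same range (and that f0 tracking isn't the obvious failure point) is to compare f0 statistics of a training clip against the target clip. The sketch below uses librosa's pyin purely as a stand-in for rmvpe/crepe, and the file names are placeholders:

```python
# Rough f0 comparison between one training clip and the inference target.
# librosa.pyin is used here only as a stand-in for rmvpe/crepe;
# file paths are placeholders.
import numpy as np
import librosa

def f0_summary(path):
    y, sr = librosa.load(path, sr=16000, mono=True)
    f0, voiced, _ = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
    )
    f0 = f0[voiced & ~np.isnan(f0)]  # keep only voiced, valid frames
    print(f"{path}: median f0 {np.median(f0):.0f} Hz, "
          f"range {np.min(f0):.0f}-{np.max(f0):.0f} Hz, "
          f"voiced frames {voiced.mean():.0%}")

f0_summary("dataset/clip_001.wav")  # a clip from the training set
f0_summary("target_vocal.wav")      # the clip I'm converting
```

The ranges I get line up, which is why I expected the conversion to be easy.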
But when I use a third-party model I found online, it works fine! I don't know what I'm doing wrong in training; I've followed all the steps exactly. Any help is appreciated. I've been trying for 6 days to get this to work.