I have been testing HiFi cloning. With the same prompt-wav file, the same prompt-wav text, and a cfg value of 3, the cloned voice varies slightly depending on the input text, and on some seemingly random inputs it sounds noticeably off.
What are the options for getting a stable cloned voice? I read in the docs about LoRA fine-tuning for a speaker. Does that mean I have to train a new model each time I need a new voice, or can multiple voices exist simultaneously in the same model? How does that work?
Is there any other way to get a stable voice output?