Greetings.
I tried training BiCodec on more than 1K hours of multilingual data with the default 32 kHz config. After 10K-20K steps the output becomes intelligible, but acoustic quality and speaker similarity never improve beyond that point (I waited two days, up to 72K steps).
Here's the log:
20250801_102602.log
Here's a sample at step 72,000:
rec_0.wav, ground_truth.wav
I would appreciate any input.
Thanks!