* feat(audio): use PyAV instead of ffmpeg
replaced usage of ffmpeg in favor of PyAV (`av`)
* refactor(audio): store all of the audio related functions in the `infer.lib.audio`
refactors previous commit to have singular functions for each task, all located in `infer.lib.audio`
* fix(audio): remove downsample_audio from mdxnet.py
it is no longer needed, since it's imported from infer.lib.audio
* docs: remove every ffmpeg mention in the documentation to avoid confusion
* chore(requirements): remove ffmpeg-python and ffmpy from all requirements
* fix(audio): fix loading for UVR
wrapped gathering of META info from the stream into a function
fixes loading for UVR
* fix(audio): use np.frombuffer() instead of direct conversion of the resampled frames
this fixes traceback on preprocessing
* feat(audio): pre-allocate decoded_audio array in the load_audio function
this should improve performance, even if just a little
* Revert "docs: remove every ffmpeg mention in the documentation to avoid confusion"
This reverts commit 1e05bbc.
* chore(format): run black on dev
* fix(requirements): revert removal of ffmpeg in unitest.yml and Dockerfile
* Revert "fix(requirements): revert removal of ffmpeg in unitest.yml and Dockerfile"
This reverts commit e28a0ee.
* feat(audio): pre-allocate numpy array to store the AudioFrame data in ndarray of dtype float32
* chore(format): run black on dev
* fix(audio): fix the decoded_audio size estimation
in estimated_total_samples we multiply by `sr` instead of `container.streams.audio[0].rate`, since we want to estimate the size of the OUTPUT file, not the input one
added dynamic resizing, in case something goes wrong and the size of decoded_audio is estimated incorrectly
fixed `load_audio` for the case where the input audio's sample rate does not match the desired sample rate (`sr`)
* chore(format): run black on dev
* refactor(audio): remove `clean_path()` function as it serves no purpose anymore
* docs: remove everything related to ffmpeg
this covers everything except the supported-formats note in the training_tips docs, which is not about what ffmpeg does/did but about which audio formats are supported (all the ones that ffmpeg supports!)
* docs: fix order of the steps in preparation in the READMEs
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
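The pre-allocation, `np.frombuffer()` conversion, and dynamic resizing described in the commits above can be sketched roughly as follows. This is a NumPy-only illustration under stated assumptions, not the actual `load_audio` from `infer.lib.audio`: the helper names `estimate_total_samples` and `collect_frames` are hypothetical, and raw float32 byte chunks stand in for the resampled PyAV frames.

```python
import numpy as np


def estimate_total_samples(duration_seconds: float, sr: int) -> int:
    """Estimate the OUTPUT length: duration times the target rate `sr`,
    not the sample rate of the input stream."""
    return int(duration_seconds * sr)


def collect_frames(frame_buffers, estimated_total_samples: int) -> np.ndarray:
    """Copy float32 byte chunks into a pre-allocated array, growing it if needed.

    `frame_buffers` is a hypothetical iterable of bytes objects, each holding
    mono float32 samples (a stand-in for resampled audio frames).
    """
    decoded_audio = np.zeros(max(estimated_total_samples, 1), dtype=np.float32)
    offset = 0
    for buf in frame_buffers:
        # Interpret the raw bytes as float32 samples without an extra copy.
        samples = np.frombuffer(buf, dtype=np.float32)
        end = offset + samples.size
        if end > decoded_audio.size:
            # The estimate was too small: grow the array (at least double it).
            new_size = max(end, decoded_audio.size * 2)
            grown = np.zeros(new_size, dtype=np.float32)
            grown[:offset] = decoded_audio[:offset]
            decoded_audio = grown
        decoded_audio[offset:end] = samples
        offset = end
    # Trim any unused tail left over from an over-estimate.
    return decoded_audio[:offset]
```

Doubling on overflow keeps the number of reallocations small when the estimate is off, and the final slice drops the unused tail when the estimate was too large.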
-### 3. Download the required files for the rmvpe vocal pitch extraction algorithm
+### 2. Download the required files for the rmvpe vocal pitch extraction algorithm

If you want to use the latest RMVPE vocal pitch extraction algorithm, you need to download the pitch extraction model parameters and place them in `assets/rmvpe`.
@@ -163,7 +144,7 @@ If you want to use the latest RMVPE vocal pitch extraction algorithm, you need t
If you want to run RVC on a Linux system based on AMD's ROCM technology, please first install the required drivers [here](https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html).
-It is most likely not a FFmpeg issue, but rather an audio path issue;
-
-FFmpeg may encounter an error when reading paths containing special characters like spaces and (), which may cause an FFmpeg error; and when the training set's audio contains Chinese paths, writing it into filelist.txt may cause a utf8 error.<br>
-
-## Q2:Cannot find index file after "One-click Training".
+## Q1:Cannot find index file after "One-click Training".
If it displays "Training is done. The program is closed," then the model has been trained successfully, and any errors printed after that message can be ignored;

The lack of an 'added' index file after One-click training may be due to the training set being too large, causing the index-building step to get stuck; this has been resolved by adding to the index in batches, which avoids running out of memory. As a temporary solution, try clicking the "Train Index" button again.<br>

-## Q3:Cannot find the model in “Inferencing timbre” after training
+## Q2:Cannot find the model in “Inferencing timbre” after training
Click “Refresh timbre list” and check again; if still not visible, check if there are any errors during training and send screenshots of the console, web UI, and logs/experiment_name/*.log to the developers for further analysis.<br>

-## Q4:How to share a model/How to use others' models?
+## Q3:How to share a model/How to use others' models?
The pth files stored in rvc_root/logs/experiment_name are not meant for sharing or inference, but for storing the experiment checkpoints for reproducibility and further training. The model to be shared should be the 60+MB pth file in the weights folder;

In the future, weights/exp_name.pth and logs/exp_name/added_xxx.index will be merged into a single weights/exp_name.zip file to eliminate the need for manual index input; so share the zip file, not the pth file, unless you want to continue training on a different machine;

Copying/sharing the several-hundred-MB pth files from the logs folder to the weights folder for forced inference may result in errors such as missing f0, tgt_sr, or other keys. You need to use the ckpt tab at the bottom to select, manually or automatically (if the information is found in logs/exp_name), whether to include pitch information and the target audio sampling rate, and then extract the smaller model. After extraction, there will be a 60+ MB pth file in the weights folder, and you can refresh the voices to use it.<br>

-## Q5:Connection Error.
+## Q4:Connection Error.
You may have closed the console (black command line window).<br>
There is a small chance that there is a problem with the CUDA configuration or the device is not supported; more likely, there is not enough memory (out of memory).<br>

For training, reduce the batch size (if reducing it to 1 is still not enough, you may need a different graphics card); for inference, adjust the x_pad, x_query, x_center, and x_max settings in the config.py file as needed. Cards with less than 4 GB of memory (e.g. the 1060 (3 GB) and various 2 GB cards) are not worth trying, while 4 GB cards still have a chance.<br>

-## Q9:How many total_epoch are optimal?
+## Q8:How many total_epoch are optimal?
If the training dataset's audio quality is poor and the noise floor is high, 20-30 epochs are sufficient. Setting it too high won't improve the audio quality of your low-quality training set.<br>
If the training set audio quality is high, the noise floor is low, and there is sufficient duration, you can increase it. 200 is acceptable (since training is fast, and if you're able to prepare a high-quality training set, your GPU likely can handle a longer training duration without issue).<br>

-## Q10:How much training set duration is needed?
+## Q9:How much training set duration is needed?

A dataset of around 10min to 50min is recommended.<br>
@@ -69,29 +64,29 @@ There are some people who have trained successfully with 1min to 2min data, but
No successful training with less than 1min of data has been reported so far. This is not recommended.<br>
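
To check a dataset against the durations mentioned above, the total length of a folder of wav files can be summed with the standard library. This is only a sketch: `total_duration_seconds` is a hypothetical helper, and it assumes uncompressed PCM wav files, which is what Python's `wave` module can read.

```python
import wave
from pathlib import Path


def total_duration_seconds(folder: str) -> float:
    """Sum the duration of every .wav file directly inside `folder`."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration of one file = number of frames / frames per second.
            total += wav_file.getnframes() / wav_file.getframerate()
    return total
```

Dividing the result by 60 gives minutes, which can then be compared with the recommended 10min to 50min range.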


-## Q11:What is the index rate for and how to adjust it?
+## Q10:What is the index rate for and how to adjust it?
If the audio quality of the pre-trained model and inference source is higher than that of the training set, they can raise the audio quality of the inference result, but at the cost of a possible bias towards the timbre of the underlying model/inference source rather than the timbre of the training set, which is generally referred to as "timbre leakage".<br>

The index rate is used to reduce/resolve the timbre leakage problem. If the index rate is set to 1, theoretically there is no timbre leakage from the inference source and the timbre is more biased towards the training set. If the training set has a lower sound quality than the inference source, then a higher index rate may reduce the sound quality. Turning it down to 0 disables retrieval blending, so the training set timbre is no longer protected.<br>

If the training set has good audio quality and long duration, turn up total_epoch. The model itself is then less likely to draw on the inference source and the pretrained underlying model, there is little timbre leakage, the index_rate is not important, and you can even skip creating/sharing the index file.<br>
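
One way to picture the index rate is as a linear mixing weight between features retrieved from the training-set index and the features of the inference source. The sketch below assumes exactly that; `blend_features` and its argument names are hypothetical and only illustrate the idea, not the project's actual code.

```python
import numpy as np


def blend_features(
    source_feats: np.ndarray, retrieved_feats: np.ndarray, index_rate: float
) -> np.ndarray:
    """Linearly mix retrieved training-set features with the source features."""
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    # index_rate = 1 keeps only the retrieved (training-set) features;
    # index_rate = 0 keeps only the inference-source features.
    return index_rate * retrieved_feats + (1.0 - index_rate) * source_feats
```

Under this assumption, an index rate of 0 returns the source features unchanged, which matches the statement that retrieval blending then has no effect.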

-## Q12:How to choose the gpu when inferring?
+## Q11:How to choose the gpu when inferring?
In the config.py file, select the card number after "device cuda:".<br>
The mapping between card number and graphics card can be seen in the graphics card information section of the training tab.<br>

-## Q13:How to use the model saved in the middle of training?
+## Q12:How to use the model saved in the middle of training?
Save via model extraction at the bottom of the ckpt processing tab.

-## Q14:File/memory error(when training)?
+## Q13:File/memory error(when training)?
Too many processes are running and there is not enough memory. You may fix it by:

1、decreasing the value in the "Threads of CPU" field.

2、pre-cutting the training set into shorter audio files.
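
The second fix can be sketched as cutting an already-loaded sample array into fixed-length pieces. `split_into_segments` is a hypothetical helper, and loading and saving the audio is left out; how long each piece should be is an assumption left to the caller.

```python
import numpy as np


def split_into_segments(audio: np.ndarray, sr: int, max_seconds: float) -> list:
    """Cut a 1-D sample array into consecutive pieces of at most `max_seconds`."""
    segment_len = int(sr * max_seconds)
    if segment_len <= 0:
        raise ValueError("sr * max_seconds must be at least one sample")
    # The last piece may be shorter than the others.
    return [
        audio[start:start + segment_len]
        for start in range(0, len(audio), segment_len)
    ]
```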

-## Q15: How to continue training using more data
+## Q14: How to continue training using more data

step1: put all wav data to path2.
@@ -101,19 +96,19 @@ step3: copy the latest G and D file of exp_name1 (your previous experiment) into

step4: click "train the model", and it will resume training from the epoch at which your previous experiment's model was saved.

-## Q16: error about llvmlite.dll
+## Q15: error about llvmlite.dll

OSError: Could not load shared object file: llvmlite.dll

FileNotFoundError: Could not find module lib\site-packages\llvmlite\binding\llvmlite.dll (or one of its dependencies). Try using the full path with constructor syntax.

This issue occurs on Windows; install https://aka.ms/vs/17/release/vc_redist.x64.exe to fix it.

-## Q17: RuntimeError: The expanded size of the tensor (17280) must match the existing size (0) at non-singleton dimension 1. Target sizes: [1, 17280]. Tensor sizes: [0]
+## Q16: RuntimeError: The expanded size of the tensor (17280) must match the existing size (0) at non-singleton dimension 1. Target sizes: [1, 17280]. Tensor sizes: [0]

Delete the wav files whose size is significantly smaller than the others, and the error won't happen again. Then click "train the model" and "train the index".
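
To spot such files, the sizes in a folder can be compared against the median. This is a rough sketch: `find_small_wavs` is a hypothetical helper, and the default `ratio` of 0.1 is an arbitrary assumption about what counts as "significantly smaller". It only lists candidates and does not delete anything.

```python
from pathlib import Path
from statistics import median


def find_small_wavs(folder: str, ratio: float = 0.1) -> list:
    """Return .wav files smaller than `ratio` times the median file size."""
    files = sorted(Path(folder).glob("*.wav"))
    if not files:
        return []
    sizes = {path: path.stat().st_size for path in files}
    threshold = ratio * median(sizes.values())
    return [path for path in files if sizes[path] < threshold]
```

Review the returned paths by hand before deleting them.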

-## Q18: RuntimeError: The size of tensor a (24) must match the size of tensor b (16) at non-singleton dimension 2
+## Q17: RuntimeError: The size of tensor a (24) must match the size of tensor b (16) at non-singleton dimension 2

Do not change the sampling rate and then continue training. If it is necessary to change, the exp name should be changed and the model will be trained from scratch. You can also copy the pitch and features (0/1/2/2b folders) extracted last time to accelerate the training process.
docs/fr/README.fr.md (0 additions & 11 deletions)
@@ -112,16 +112,6 @@ Here is a list of the models and other files required by RVC:
./assets/pretrained_v2

-# If you are using Windows, you may need these files for ffmpeg and ffprobe; skip this step if you have already installed ffmpeg and ffprobe. Ubuntu/Debian users can install both libraries with apt install ffmpeg. Mac users can install them with brew install ffmpeg (prerequisite: brew must be installed).
# If you want to use the latest RMVPE vocal pitch algorithm, download the pitch model parameters and place them in the RVC root directory.