<div align="center">

<h1>Retrieval-based-Voice-Conversion-WebUI</h1>
An easy-to-use Voice Conversion framework based on VITS.<br><br>

[madewithlove](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)

<img src="https://counter.seku.su/cmoe?name=rvc&theme=r34" /><br>

[Open In Colab](https://colab.research.google.com/github/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)
[Licence](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/LICENSE)
[Huggingface](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/)

[Discord](https://discord.gg/HcsmBBGyVk)

[**Changelog**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/Changelog_EN.md) | [**FAQ (Frequently Asked Questions)**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/FAQ-(Frequently-Asked-Questions))

[**English**](../en/README.en.md) | [**中文简体**](../../README.md) | [**日本語**](../jp/README.ja.md) | [**한국어**](../kr/README.ko.md) ([**韓國語**](../kr/README.ko.han.md)) | [**Français**](../fr/README.fr.md) | [**Türkçe**](../tr/README.tr.md) | [**Português**](../pt/README.pt.md)

</div>

> Check out our [Demo Video](https://www.bilibili.com/video/BV1pm4y1z7Gm/) here!
<table>
  <tr>
    <td align="center">Training and inference WebUI</td>
    <td align="center">Real-time voice changing GUI</td>
  </tr>
  <tr>
    <td align="center"><img src="https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/092e5c12-0d49-4168-a590-0b0ef6a4f630"></td>
    <td align="center"><img src="https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/730b4114-8805-44a1-ab1a-04668f3c30a6"></td>
  </tr>
  <tr>
    <td align="center">go-web.bat</td>
    <td align="center">go-realtime-gui.bat</td>
  </tr>
  <tr>
    <td align="center">You can freely choose the action you want to perform.</td>
    <td align="center">We have achieved an end-to-end latency of 170 ms. With ASIO input and output devices, we have reached an end-to-end latency of 90 ms, though this depends heavily on hardware driver support.</td>
  </tr>
</table>

> The dataset for the pre-training model uses nearly 50 hours of high-quality audio from the open-source VCTK dataset.

> High-quality licensed song datasets will be added to the training set regularly, so you can use them without worrying about copyright infringement.

> Please look forward to the RVCv3 pretrained base model, which has more parameters, uses more training data, delivers better results, keeps the same inference speed, and requires less training data for fine-tuning.
## Features:
+ Reduce tone leakage by replacing the source feature with the training-set feature using top-1 retrieval;
+ Easy and fast training, even on relatively poor graphics cards;
+ Training with a small amount of data (>=10min of low-noise speech recommended);
+ Model fusion to change timbres (using the ckpt processing tab -> ckpt merge);
+ Easy-to-use WebUI;
+ UVR5 model to quickly separate vocals and instruments;
+ High-pitch voice extraction algorithm [InterSpeech2023-RMVPE](#Credits) to prevent the muted-sound problem; provides significantly better results than Crepe_full while being faster and using fewer resources;
+ AMD/Intel graphics cards acceleration supported;
+ Intel ARC graphics cards acceleration with IPEX supported.

## Preparing the environment
The following commands need to be executed in an environment with Python 3.8 or higher.

(Windows/Linux)
First install the main dependencies through pip:
```bash
# Install PyTorch-related core dependencies, skip if already installed
# Reference: https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio

# For Windows + Nvidia Ampere architecture (RTX 30xx), you need to specify the CUDA version corresponding to PyTorch; see https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/21
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# For Linux + AMD cards, you need to use the following PyTorch versions:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
```
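
If you are not sure whether PyTorch can actually see your GPU, a quick sanity check like the one below may help; this is just a sketch, and the output depends on your build and hardware:
```bash
# Should print the installed torch version and True when a CUDA/ROCm GPU is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```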

Then you can use Poetry to install the other dependencies:
```bash
# Install the Poetry dependency management tool, skip if already installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install
```
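
Note that Poetry keeps the dependencies in its own virtual environment, so you would typically launch the project through `poetry run`; a minimal sketch, assuming the WebUI entry point described later in this README:
```bash
# Run the WebUI inside the Poetry-managed environment
poetry run python infer-web.py
```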

You can also use pip to install them:
```bash
# For Nvidia graphics cards:
pip install -r requirements.txt

# For AMD/Intel graphics cards on Windows (DirectML):
pip install -r requirements-dml.txt

# For Intel ARC graphics cards on Linux / WSL using Python 3.10:
pip install -r requirements-ipex.txt

# For AMD graphics cards on Linux (ROCm):
pip install -r requirements-amd.txt
```
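
When installing with pip, it is usually safer to work inside a dedicated virtual environment so the pinned requirements do not clash with system packages; a minimal sketch for Linux/macOS shells:
```bash
# Create and activate an isolated environment, then install the Nvidia requirements
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```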

------
Mac users can install dependencies via `run.sh`:
```bash
sh ./run.sh
```

## Preparation of other Pre-models
RVC requires other pre-models for inference and training.

```bash
# Download all needed models from https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/
python tools/download_models.py
```
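
Afterwards you can check that the expected asset folders were populated; exactly which files you need depends on what you plan to run, but a quick look is as simple as:
```bash
# List the asset directories used by RVC
ls assets/hubert assets/pretrained assets/pretrained_v2 assets/uvr5_weights
```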

Or just download them yourself from our [Huggingface space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).

Here's a list of pre-models and other files that RVC needs:
```bash
./assets/hubert/hubert_base.pt

./assets/pretrained

./assets/uvr5_weights

# If you want to test the v2 version of the model, you need these additional downloads.
# (The v2 model changes the input from the 256-dimensional feature of 9-layer HuBERT + final_proj
# to the 768-dimensional feature of 12-layer HuBERT, and adds 3 period discriminators.)
./assets/pretrained_v2

# If you want to use the latest SOTA RMVPE vocal pitch extraction algorithm, download the RMVPE
# weights and place them in the RVC root directory.
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.pt

# AMD/Intel graphics card users need to download the ONNX version instead.
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx
```
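
If you prefer to fetch individual files by hand, direct download links follow the standard Hugging Face `resolve/main` pattern for that repository; for instance, a hypothetical sketch for `hubert_base.pt` and `rmvpe.pt` (please verify the paths against the Huggingface space above):
```bash
# Assumed direct-download URLs derived from the repository linked above
wget -P assets/hubert https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt
```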
### Install FFmpeg
If you have FFmpeg and FFprobe installed on your computer, you can skip this step.

#### For Ubuntu/Debian users
```bash
sudo apt install ffmpeg
```
#### For MacOS users
```bash
brew install ffmpeg
```
#### For Windows users
Download these files and place them in the root folder:
- [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe)
- [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe)
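
On any platform you can confirm that both binaries are reachable from your shell before moving on:
```bash
# Both commands should print version information
ffmpeg -version
ffprobe -version
```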

## ROCm Support for AMD graphic cards (Linux only)
To use ROCm on Linux, install all required drivers as described [here](https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html).

On Arch use pacman to install the driver:
```bash
pacman -S rocm-hip-sdk rocm-opencl-sdk
```

You might also need to set these environment variables (e.g. on a RX6700XT):
```bash
export ROCM_PATH=/opt/rocm # Set ROCm executables path
export HSA_OVERRIDE_GFX_VERSION=10.3.0 # Spoof the GPU model for ROCm
```

Also overwrite PyTorch with its ROCm version after installing the dependencies:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
```

Make sure your user is part of the `render` and `video` groups:
```bash
sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME
```
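
After these steps you can check that the ROCm build of PyTorch detects the card (ROCm builds expose the GPU through the `torch.cuda` API); a small sketch whose output depends on your GPU and driver:
```bash
# Should print True and the detected GPU name on a working ROCm setup
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
```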

## Get started
### Start up directly
Use the following command to start the WebUI:
```bash
python infer-web.py
```
### Use the integration package
Download and extract the file `RVC-beta.7z`, then follow the steps below according to your system:
#### For Windows users
Double click `go-web.bat`
#### For MacOS users
```bash
sh ./run.sh
```
### For Intel IPEX users (Linux Only)
```bash
source /opt/intel/oneapi/setvars.sh
```
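
After sourcing the oneAPI environment, launch the WebUI from that same shell; a minimal sketch, assuming the IPEX requirements from above are already installed:
```bash
# Initialize the oneAPI environment, then start the WebUI
source /opt/intel/oneapi/setvars.sh
python infer-web.py
```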
## Credits
+ [ContentVec](https://github.com/auspicious3000/contentvec/)
+ [VITS](https://github.com/jaywalnut310/vits)
+ [HIFIGAN](https://github.com/jik876/hifi-gan)
+ [Gradio](https://github.com/gradio-app/gradio)
+ [FFmpeg](https://github.com/FFmpeg/FFmpeg)
+ [Ultimate Vocal Remover](https://github.com/Anjok07/ultimatevocalremovergui)
+ [audio-slicer](https://github.com/openvpi/audio-slicer)
+ [Vocal pitch extraction: RMVPE](https://github.com/Dream-High/RMVPE)
  + The pretrained model is trained and tested by [yxlllc](https://github.com/yxlllc/RMVPE) and [RVC-Boss](https://github.com/RVC-Boss).

## Thanks to all contributors for their efforts
<a href="https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=RVC-Project/Retrieval-based-Voice-Conversion-WebUI" />
</a>