Commit bec245d

Full squashed update of PC-NSF-HiFiGAN version (#16)

* Squashed commits from dev branch
* Edit finetune_ckpt_path to align old config
* Edit README

1 parent 842d997 · commit bec245d

13 files changed: +343 additions, −440 deletions

.gitignore

Lines changed: 5 additions & 1 deletion

@@ -157,4 +157,8 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/
+
+/data/
+/experiments/
+/pretrained/

README.md

Lines changed: 36 additions & 13 deletions
@@ -20,7 +20,7 @@ data_input_path: [] # this is the input directory for your wav files
 
 data_out_path: [] # this is the output directory for your npz files; the preprocessed format is npz
 
-val_num: 1 # this is the number of validation files you want
+val_num: 10 # this is the number of validation files you want
 ```
 
 An example
@@ -30,20 +30,11 @@ data_input_path: ['wav/in1','wav/in2'] # input directories for your wav files
 data_out_path: ['wav/out1','wav/out2'] # output directories for your npz files
 val_num: 5 # the number of validation files you want; they are extracted automatically during preprocessing
 # the paths in the two lists correspond one-to-one, so the counts must match
-# preprocessing then scans all .wav files, including those in subfolders
+# preprocessing then scans all .wav and .flac files, including those in subfolders
 # normally only these three items need changing
 ```
-# Offline data augmentation
-Replace the preprocessing script with [process_aug.py](process_aug.py) and add the configuration entries
-```yaml
-key_aug: false # do not augment during training
-aug_min: 0.9 # minimum pitch-shift ratio
-aug_max: 1.4 # maximum pitch-shift ratio
-aug_num: 1 # data augmentation multiplier
-```
-Note that data augmentation may degrade audio quality!
 # Online data augmentation (recommended)
-Add the configuration entries; note that online data augmentation requires the [process.py](process.py) script, otherwise offline and online augmentation will stack
+Add the configuration entries
 ```yaml
 key_aug: true # augment during training
 key_aug_prob: 0.5 # augmentation probability
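The online augmentation entries above amount to randomly pitch-shifting each training sample at the configured probability. A minimal, dependency-free sketch of that idea; the function name, the ±5 semitone range, and the list-based f0 representation are illustrative assumptions, not the repo's actual implementation:

```python
import random

def augment_f0(f0_hz, key_aug_prob=0.5, max_key=5):
    """With probability key_aug_prob, shift an f0 curve by a random
    key offset in [-max_key, max_key] semitones (illustrative only)."""
    if random.random() >= key_aug_prob:
        return list(f0_hz), 0.0  # no augmentation for this sample
    key = random.uniform(-max_key, max_key)
    ratio = 2.0 ** (key / 12.0)  # semitone offset -> frequency ratio
    return [f * ratio for f in f0_hz], key
```

Since every frame is scaled by the same ratio, relative intervals are preserved: an octave apart stays an octave apart after the shift.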
@@ -82,13 +73,29 @@
 ```yaml
 data_input_path: [] # this list holds the paths to your original wav files
 data_out_path: [] # this list holds the paths for the preprocessed npz output
-val_num: 1 # the number of audio files sampled for validation
+val_num: 10 # the number of audio files sampled for validation
 ```
 Then run preprocessing
 ```sh
 python process.py --config (your config path) --num_cpu (number of cpu threads used in preprocessing) --strx (1 for a forced absolute path, 0 for a relative path)
 ```
 ## Training
+Adjust the configuration items for your GPU
+(mini_nsf and pc_aug are enabled by default; for special needs, disable them and edit the config file yourself, no recommendation is made here)
+
+Recommended configuration for 24 GB GPUs (the default, no changes needed)
+```yaml
+crop_mel_frames: 48
+batch_size: 10
+pc_aug_rate: 0.5
+```
+Recommended configuration for 16 GB GPUs (edit or add these entries manually)
+```yaml
+crop_mel_frames: 32
+batch_size: 10
+pc_aug_rate: 0.4
+```
+Training command
 ```sh
 python train.py --config (your config path) --exp_name (your ckpt name) --work_dir (working directory, optional)
 ```
@@ -140,3 +147,19 @@
 [univnet.yaml](configs/univnet.yaml) trains the original univnet
 
 [lvc_base_ddspgan.yaml](configs/lvc_base_ddspgan.yaml) trains the ddsp model with lvc filters
+
+# Special statement
+
+We regret to publish a verified Registry of Hostile Conduct (below). The registry documents individuals/entities who have engaged in long-term destructive behavior against the development team.
+We solemnly declare:
+
+1. All users are strongly advised to read this registry before downloading and using this vocoder
+2. No technical or legal usage restrictions are currently imposed on the listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0
+3. We reserve the right to impose further restrictions if malicious behavior continues
+
+## Registry of Hostile Conduct
+
+| Name | Identifiers | Reason |
+|:---:|:---|:---|
+| 旋转_turning_point | QQ: 2673587414;<br/>Bilibili UID: 285801087;<br/>Discord username: colstone233 | Long-term hostility and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development of the vocoder and other community projects |

README_en.md

Lines changed: 38 additions & 16 deletions
@@ -13,11 +13,11 @@
 The following configuration items are what you need to change during preprocessing
 ```yaml
 
-data_input_path: [] the path for your data
+data_input_path: [] # the path for your data
 
-data_out_path: [] the path for the preprocessed output
+data_out_path: [] # the path for the preprocessed output
 
-val_num: 1 the number of validation audio
+val_num: 10 # the number of validation audio
 ```
 An example
 ```yaml
@@ -30,12 +30,27 @@ val_num: 5 # This is the number of validation files you want.
 
 # (The paths in the two lists are one-to-one, so the number should be the same.)
 
-# (Then, the preprocessor scans all .wav files, including subfolders.)
+# (Then, the preprocessor scans all .wav and .flac files, including subfolders.)
 
 # (Normally, there are only these three to change.)
 ```
 
 ## Training
+Adjust the config according to your GPU memory
+(mini_nsf and pc_aug are enabled by default)
+
+For 24 GB memory (default)
+```yaml
+crop_mel_frames: 48
+batch_size: 10
+pc_aug_rate: 0.5
+```
+For 16 GB memory (needs manual editing)
+```yaml
+crop_mel_frames: 32
+batch_size: 10
+pc_aug_rate: 0.4
+```
 Run the following training script
 ```sh
 python train.py --config (your config path) --exp_name (your ckpt name) --work_dir (working directory, optional)
 ```
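The two memory presets above differ mainly in how much raw audio each training crop covers. Assuming the 512-sample hop and 44.1 kHz rate implied by the checkpoint naming (pc_nsf_hifigan_44.1k_hop512_...), the arithmetic works out as below; this helper is an illustration, not project code:

```python
def crop_stats(crop_mel_frames, batch_size, hop_size=512, sample_rate=44100):
    """Waveform samples and seconds covered by one training crop,
    plus the total samples per batch (a rough memory proxy)."""
    samples = crop_mel_frames * hop_size
    seconds = samples / sample_rate
    return samples, seconds, batch_size * samples

print(crop_stats(48, 10))  # 24 GB preset: 24576 samples, ~0.56 s per crop
print(crop_stats(32, 10))  # 16 GB preset: 16384 samples, ~0.37 s per crop
```

Halving crop_mel_frames thus halves the per-crop waveform length, which is why the 16 GB preset fits in smaller GPUs at the same batch size.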
@@ -54,18 +69,8 @@ Once you finish training, you can use this script to export the diffsinger vocoder
 python export_ckpt.py --ckpt_path (your ckpt path) --save_path (output ckpt path) --work_dir (working directory, optional)
 ```
 
-# Offline data augmentation
-Replace the preprocessing script with [process_aug.py](process_aug.py) and add configuration entries
-```yaml
-key_aug: false (Do not augment during training)
-aug_min: 0.9 (Minimum f0 adjustment multiplier)
-aug_max: 1.4 (Maximum f0 adjustment multiplier)
-aug_num: 1 (Data augmentation multiplier)
-```
-That's it. Note that data augmentation may damage the sound quality!
-
 # Online data augmentation (recommended)
-Note that to use online data augmentation, use the [process.py](process.py) script, otherwise offline and online augmentation will be superimposed
+Add the configuration entries
 ```yaml
 key_aug: true (Do augment during training)
 key_aug_prob: 0.5 (Data augmentation probability)
@@ -110,4 +115,21 @@ Almost 2k steps is enough for fine-tuning on a small dataset.
 
 [univnet.yaml](configs%2Funivnet.yaml) Training original univnet
 
-[lvc_base_ddspgan.yaml](configs%2Flvc_base_ddspgan.yaml) Training ddsp model with lvc filters
+[lvc_base_ddspgan.yaml](configs%2Flvc_base_ddspgan.yaml) Training ddsp model with lvc filters
+
+# Special Statements
+
+We regret to publish a verified Registry of Hostile Conduct (shown below). This registry documents individuals/entities who have engaged in long-term destructive activities against the development team.
+
+We solemnly declare:
+
+1. We strongly recommend that all users review this registry before downloading and using this vocoder
+2. No technical or legal restrictions are currently imposed on listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0
+3. We reserve the right to apply further restrictions in case of persistent malicious acts
+
+## Registry of Hostile Conduct
+
+| Name | Identifiers | Reason |
+|:---:|:---|:---|
+| 旋转_turning_point | QQ: 2673587414;<br/>Bilibili UID: 285801087;<br/>Discord username: colstone233 | Engaging in long-term hostile and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development process of the vocoder and other projects in the community |

configs/base_hifi.yaml

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ f0_min: 65
 f0_max: 1100
 
 pc_aug: false # pc-nsf training method
-pc_aug_prob: 0.5
+pc_aug_rate: 0.5
 pc_aug_key: 5
 
 aug_min: 0.9

configs/ft_hifigan.yaml

Lines changed: 9 additions & 8 deletions

@@ -4,22 +4,22 @@ base_config:
 
 data_input_path: []
 data_out_path: []
-val_num: 1
+val_num: 10
 
 pe: 'parselmouth' # 'parselmouth' or 'harvest'
 f0_min: 65
 f0_max: 1100
 
-pc_aug: false # pc-nsf training method
-pc_aug_prob: 0.5
-pc_aug_key: 5
-
 aug_min: 0.9
 aug_max: 1.4
 aug_num: 1
 key_aug: false
 key_aug_prob: 0.5
 
+pc_aug: true # pc-nsf training method
+pc_aug_rate: 0.4
+pc_aug_key: 12
+
 use_stftloss: false
 loss_fft_sizes: [2048, 2048, 4096, 1024, 512, 256, 128, 1024, 2048, 512]
 loss_hop_sizes: [512, 240, 480, 100, 50, 25, 12, 120, 240, 50]

@@ -66,7 +66,8 @@ crop_mel_frames: 32
 
 #model_cls: training.nsf_HiFigan_task.nsf_HiFigan
 model_args:
-  mini_nsf: false
+  mini_nsf: true
+  noise_sigma: 0.0
   upsample_rates: [ 8, 8, 2, 2, 2 ]
   upsample_kernel_sizes: [ 16, 16, 4, 4, 4 ]
   upsample_initial_channel: 512

@@ -113,7 +114,7 @@ sampler_frame_count_grid: 6
 ds_workers: 4
 dataloader_prefetch_factor: 2
 
-batch_size: 6
+batch_size: 10

@@ -147,7 +148,7 @@ seed: 114514
 ###########
 
 finetune_enabled: true
-finetune_ckpt_path: nsf_hifigan_44.1k_hop512_128bin_2024.02.ckpt
+finetune_ckpt_path: pc_nsf_hifigan_44.1k_hop512_128bin_2025.02.ckpt
 finetune_ignored_params: []
 finetune_strict_shapes: true
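The config changes above rename pc_aug_prob to pc_aug_rate and add new keys (mini_nsf, noise_sigma), so configs written before this commit lack them. A hedged sketch of how such a config dictionary could be backfilled with the new defaults before use; the function and the exact defaults are illustrative, not part of the repo:

```python
def backfill_config(config):
    """Fill in keys that pre-rename configs may be missing (defaults
    here mirror this commit's ft_hifigan.yaml; illustrative only)."""
    config.setdefault('pc_aug', True)
    config.setdefault('pc_aug_rate', 0.4)  # renamed from pc_aug_prob
    model_args = config.setdefault('model_args', {})
    model_args.setdefault('mini_nsf', True)
    model_args.setdefault('noise_sigma', 0.0)
    return config

old = {'pc_aug_prob': 0.5, 'model_args': {'mini_nsf': False}}
new = backfill_config(old)
# existing values win: mini_nsf stays False, noise_sigma gets its default
```

Using setdefault means explicit values in an old config are never overwritten; only genuinely missing keys receive the new defaults.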

export_ckpt.py

Lines changed: 5 additions & 0 deletions

@@ -7,6 +7,7 @@
 from utils import get_latest_checkpoint_path
 from utils.config_utils import read_full_config, print_config
 
+
 @click.command(help='')
 @click.option('--exp_name', required=False, metavar='EXP', help='Name of the experiment')
 @click.option('--ckpt_path', required=False, metavar='FILE', help='Path to the checkpoint file')

@@ -32,6 +33,7 @@ def export(exp_name, ckpt_path, save_path, work_dir):
         if 'generator.' in i:
             # print(i)
             ckpt[i.replace('generator.', '')] = temp_dict[i]
+    pathlib.Path(save_path).parent.mkdir(parents=True, exist_ok=True)
     torch.save({'generator': ckpt}, save_path)
     print("Export checkpoint file successfully: ", save_path)

@@ -53,9 +55,12 @@ def export(exp_name, ckpt_path, save_path, work_dir):
     new_config['pc_aug'] = config['pc_aug']
     if 'mini_nsf' not in new_config.keys():
         new_config['mini_nsf'] = False
+    if 'noise_sigma' not in new_config.keys():
+        new_config['noise_sigma'] = 0.0
 
     json_file.write(json.dumps(new_config, indent=1))
     print("Export configuration file successfully: ", new_config_file)
 
+
 if __name__ == '__main__':
     export()
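The export step shown above keeps only the generator weights and strips the 'generator.' prefix that the training wrapper adds to state-dict keys. The same transformation on a plain dict, sketched after the loop in export_ckpt.py (the sample keys below are made up for illustration):

```python
def strip_generator_prefix(state_dict):
    """Drop non-generator entries and remove the 'generator.' key prefix,
    as export_ckpt.py does before saving the standalone vocoder."""
    return {
        key.replace('generator.', ''): value
        for key, value in state_dict.items()
        if 'generator.' in key
    }

full = {
    'generator.conv_pre.weight': 'w0',
    'discriminator.convs.0.weight': 'w1',  # dropped on export
}
print(strip_generator_prefix(full))  # {'conv_pre.weight': 'w0'}
```

Discriminator weights are only needed for training, so dropping them makes the exported vocoder checkpoint considerably smaller.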

models/HiFivae/models.py

Lines changed: 0 additions & 4 deletions

@@ -338,12 +338,10 @@ def forward(self, x):
         for l in self.convs:
             x = l(x)
             x = F.leaky_relu(x, LRELU_SLOPE, inplace=True)
-            x = torch.nan_to_num(x)
 
             fmap.append(x)
 
         x = self.conv_post(x)
-        x = torch.nan_to_num(x)
         fmap.append(x)
         x = torch.flatten(x, 1, -1)

@@ -394,11 +392,9 @@ def forward(self, x):
         for l in self.convs:
             x = l(x)
             x = F.leaky_relu(x, LRELU_SLOPE, inplace=True)
-            x = torch.nan_to_num(x)
             fmap.append(x)
 
         x = self.conv_post(x)
-        x = torch.nan_to_num(x)
         fmap.append(x)
         x = torch.flatten(x, 1, -1)

models/nsf_HiFigan/models.py

Lines changed: 4 additions & 5 deletions

@@ -202,7 +202,8 @@ def __init__(self, h):
         self.num_kernels = len(h.resblock_kernel_sizes)
         self.num_upsamples = len(h.upsample_rates)
         self.mini_nsf = h.mini_nsf
-
+        self.noise_sigma = h.noise_sigma
+
         if h.mini_nsf:
             self.source_sr = h.sampling_rate / int(np.prod(h.upsample_rates[2:]))
             self.upp = int(np.prod(h.upsample_rates[:2]))

@@ -260,6 +261,8 @@ def forward(self, x, f0):
         else:
             har_source = self.m_source(f0, self.upp).transpose(1, 2)
         x = self.conv_pre(x)
+        if self.noise_sigma is not None and self.noise_sigma > 0:
+            x += self.noise_sigma * torch.randn_like(x)
         for i in range(self.num_upsamples):
             x = F.leaky_relu(x, LRELU_SLOPE)
             x = self.ups[i](x)

@@ -354,12 +357,10 @@ def forward(self, x):
         for l in self.convs:
             x = l(x)
             x = F.leaky_relu(x, LRELU_SLOPE, inplace=True)
-            x = torch.nan_to_num(x)
 
             fmap.append(x)
 
         x = self.conv_post(x)
-        x = torch.nan_to_num(x)
         fmap.append(x)
         x = torch.flatten(x, 1, -1)

@@ -412,11 +413,9 @@ def forward(self, x):
         for l in self.convs:
             x = l(x)
             x = F.leaky_relu(x, LRELU_SLOPE, inplace=True)
-            x = torch.nan_to_num(x)
             fmap.append(x)
 
         x = self.conv_post(x)
-        x = torch.nan_to_num(x)
         fmap.append(x)
         x = torch.flatten(x, 1, -1)
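The generator change above injects optional Gaussian noise into the features right after conv_pre, guarded so that the default noise_sigma: 0.0 is a no-op. A dependency-free sketch of the same guard logic; the repo version operates on torch tensors via torch.randn_like, while the list-based form here is purely illustrative:

```python
import random

def inject_noise(features, noise_sigma):
    """Add zero-mean Gaussian noise scaled by noise_sigma, skipping the
    work entirely when noise_sigma is None or non-positive."""
    if noise_sigma is None or noise_sigma <= 0:
        return features  # matches the default noise_sigma: 0.0
    return [v + noise_sigma * random.gauss(0.0, 1.0) for v in features]
```

Guarding on both None and non-positive values keeps older exported configs (which may lack the key entirely or carry the 0.0 default) behaving exactly as before.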
