Description
Hi, I tried to train my Japanese multi-speaker model on WSL2,
but I got the following logs and error message.
I'm a beginner, so please tell me how to solve it.
==================================================================
......
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [82,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [83,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [84,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [85,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [86,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [87,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [44,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f9403b6c446 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f9403b166e4 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f9403f1ba18 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1021c88 (0x7f93b987fc88 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x102a735 (0x7f93b9888735 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: <unknown function> + 0x5faf70 (0x7f940299af70 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x6f69f (0x7f9403b4d69f in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f9403b4637b in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f9403b46529 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #9: <unknown function> + 0x8c1a98 (0x7f9402c61a98 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f9402c61de6 in /home/test/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #11: /home/test/venv/bin/python() [0x504334]
frame #12: /home/test/venv/bin/python() [0x5102aa]
frame #13: /home/test/venv/bin/python() [0x600b4a]
frame #14: _PyEval_EvalFrameDefault + 0x5dd8 (0x51a858 in /home/test/venv/bin/python)
frame #15: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #17: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #19: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x302b (0x517aab in /home/test/venv/bin/python)
frame #21: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #23: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x734 (0x5151b4 in /home/test/venv/bin/python)
frame #25: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x32d (0x514dad in /home/test/venv/bin/python)
frame #27: _PyFunction_Vectorcall + 0x75 (0x525775 in /home/test/venv/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1451 (0x515ed1 in /home/test/venv/bin/python)
frame #29: /home/test/venv/bin/python() [0x5c9dd5]
frame #30: PyEval_EvalCode + 0x80 (0x5c9d30 in /home/test/venv/bin/python)
frame #31: /home/test/venv/bin/python() [0x5fea7c]
frame #32: /home/test/venv/bin/python() [0x5fa616]
frame #33: PyRun_StringFlags + 0x82 (0x5f03a2 in /home/test/venv/bin/python)
frame #34: PyRun_SimpleStringFlags + 0x42 (0x5f01c2 in /home/test/venv/bin/python)
frame #35: Py_RunMain + 0x3c4 (0x5ef6e4 in /home/test/venv/bin/python)
frame #36: Py_BytesMain + 0x2d (0x5bd16d in /home/test/venv/bin/python)
frame #37: <unknown function> + 0x2a1ca (0x7f940473a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x8b (0x7f940473a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: _start + 0x25 (0x5bd065 in /home/test/venv/bin/python)
Traceback (most recent call last):
File "/home/kense/vits/train_ms.py", line 297, in <module>
main()
File "/home/kense/vits/train_ms.py", line 52, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 328, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 284, in start_processes
while not context.join():
File "/home/test/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 184, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGABRT
===================================================
{
"train": {
"log_interval": 200,
"eval_interval": 1000,
"seed": 1234,
"epochs": 10000,
"learning_rate": 2e-4,
"betas": [0.8, 0.99],
"eps": 1e-9,
"batch_size": 32,
"fp16_run": true,
"lr_decay": 0.999875,
"segment_size": 8192,
"init_lr_ratio": 1,
"warmup_epochs": 0,
"c_mel": 45,
"c_kl": 1.0
},
"data": {
"training_files":"filelists/train.txt.cleaned",
"validation_files":"filelists/val.txt.cleaned",
"text_cleaners":["basic_cleaners"],
"max_wav_value": 32768.0,
"sampling_rate": 22050,
"filter_length": 1024,
"hop_length": 256,
"win_length": 1024,
"n_mel_channels": 80,
"mel_fmin": 0.0,
"mel_fmax": null,
"add_blank": true,
"n_speakers": 12,
"cleaned_text": true
},
"model": {
"inter_channels": 192,
"hidden_channels": 192,
"filter_channels": 768,
"n_heads": 2,
"n_layers": 6,
"kernel_size": 3,
"p_dropout": 0.1,
"resblock": "1",
"resblock_kernel_sizes": [3,7,11],
"resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
"upsample_rates": [8,8,2,2],
"upsample_initial_channel": 512,
"upsample_kernel_sizes": [16,16,4,4],
"n_layers_q": 3,
"use_spectral_norm": false,
"gin_channels": 256
}
}
===================================================
python==3.10.16
torch==2.5.1+cu124
If you need more information about my environment or logs, please tell me.
The number of speakers is 13 (IDs 0 to 12).
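The config above sets "n_speakers": 12, while the speaker IDs run from 0 to 12, so ID 12 may fall outside the speaker embedding table. An out-of-range index in an embedding lookup is one common cause of the `srcIndex < srcSelectDimSize` assertion, though it may not be the only cause here. The following is a minimal sketch for checking this. It assumes each filelist line has the form `wav_path|speaker_id|text`. The function names `find_bad_speaker_ids` and `check_config` and the sample lines are hypothetical and are not part of the VITS code.

```python
import json
from pathlib import Path


def find_bad_speaker_ids(filelist_lines, n_speakers):
    """Return (line_number, speaker_id) pairs whose ID is outside 0..n_speakers-1."""
    bad = []
    for lineno, line in enumerate(filelist_lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumed layout: wav_path|speaker_id|text
        parts = line.split("|")
        if len(parts) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 '|'-separated fields")
        sid = int(parts[1])
        if not 0 <= sid < n_speakers:
            bad.append((lineno, sid))
    return bad


def check_config(config_path, filelist_key="training_files"):
    """Load a JSON config and check the filelist it points to (hypothetical helper)."""
    hps = json.loads(Path(config_path).read_text(encoding="utf-8"))
    n_speakers = hps["data"]["n_speakers"]
    filelist = Path(hps["data"][filelist_key])
    lines = filelist.read_text(encoding="utf-8").splitlines()
    return find_bad_speaker_ids(lines, n_speakers)


# Made-up sample lines, only to illustrate the check.
sample = [
    "wavs/a.wav|0|konnichiwa",
    "wavs/b.wav|11|arigatou",
    "wavs/c.wav|12|sayounara",
]
bad = find_bad_speaker_ids(sample, n_speakers=12)
print(bad)  # [(3, 12)]
```

If the check reports any lines, either raising "n_speakers" to at least the largest ID plus one (13 in this case) or renumbering the IDs to 0 through 11 would be worth trying. Running `check_config` with "validation_files" as the `filelist_key` covers the validation list as well.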
I tried several fixes proposed for similar issues,
but none of them helped.
Please help me, and thank you for bearing with my English.