Train finishes with error: WARNING: destroy_process_group() was not called before program exit, and: RuntimeError: can't create new thread at interpreter shutdown

Notice: In order to resolve issues more efficiently, please raise issues following the template and include the relevant details.

## 🐛 Bug
Every time training finishes, I see the same error in the log file. It seems the cleanup step runs into problems:

```
average_checkpoints: ['./outputs2/model.pt.ep1', './outputs2/model.pt.ep2', './outputs2/model.pt.ep3', './outputs2/model.pt.ep4', './outputs2/model.pt.ep5', './outputs2/model.pt.ep6', './outputs2/model.pt.ep7', './outputs2/model.pt.ep8', './outputs2/model.pt.ep9', './outputs2/model.pt.ep10', './outputs2/model.pt.ep11', './outputs2/model.pt.ep12', './outputs2/model.pt.ep13', './outputs2/model.pt.ep14', './outputs2/model.pt.ep15', './outputs2/model.pt.ep16', './outputs2/model.pt.ep17', './outputs2/model.pt.ep18', './outputs2/model.pt.ep19', './outputs2/model.pt.ep20']
Checkpoint file ./outputs2/model.pt.ep1 not found.
Checkpoint file ./outputs2/model.pt.ep2 not found.
Checkpoint file ./outputs2/model.pt.ep3 not found.
Checkpoint file ./outputs2/model.pt.ep4 not found.
Checkpoint file ./outputs2/model.pt.ep5 not found.
Checkpoint file ./outputs2/model.pt.ep6 not found.
Checkpoint file ./outputs2/model.pt.ep7 not found.
Checkpoint file ./outputs2/model.pt.ep8 not found.
Checkpoint file ./outputs2/model.pt.ep9 not found.
Checkpoint file ./outputs2/model.pt.ep10 not found.
Checkpoint file ./outputs2/model.pt.ep11 not found.
Checkpoint file ./outputs2/model.pt.ep12 not found.
Checkpoint file ./outputs2/model.pt.ep13 not found.
Checkpoint file ./outputs2/model.pt.ep14 not found.
Checkpoint file ./outputs2/model.pt.ep15 not found.
Checkpoint file ./outputs2/model.pt.ep16 not found.
Checkpoint file ./outputs2/model.pt.ep17 not found.
Checkpoint file ./outputs2/model.pt.ep18 not found.
Checkpoint file ./outputs2/model.pt.ep19 not found.
[W106 21:41:32.566797026 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Exception ignored in atexit callback: <function FileWriter.__init__.<locals>.cleanup at 0x7f8065a77060>
Traceback (most recent call last):
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/writer.py", line 122, in cleanup
    self.event_writer.close()
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/event_file_writer.py", line 154, in close
    self._worker.stop()
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/event_file_writer.py", line 185, in stop
    self._queue.put(self._shutdown_signal)
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
Exception ignored in atexit callback: <function FileWriter.__init__.<locals>.cleanup at 0x7f03cec9af20>
Traceback (most recent call last):
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/writer.py", line 122, in cleanup
    self.event_writer.close()
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/event_file_writer.py", line 154, in close
    self._worker.stop()
  File "/www/ai/ocr/asr/FunASR/myFunASR/lib/python3.12/site-packages/tensorboardX/event_file_writer.py", line 185, in stop
    self._queue.put(self._shutdown_signal)
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/multiprocessing/queues.py", line 177, in _start_thread
    self._thread.start()
  File "/root/.pyenv/versions/3.12.0/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown
```


### To Reproduce

Steps to reproduce the behavior (**always include the command you ran**):

1. Run the command:
`funasr-train ++model=paraformer-zh ++train_data_set_list=train.jsonl ++valid_data_set_list=test.jsonl ++output_dir="./outputs" &> log.txt &`
2. See the error in `log.txt`.

#### Code sample
No code, just the single command above.

### Expected behavior

Train finishes with no error or warning.
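
For illustration, a clean shutdown of the kind I would expect could look roughly like the sketch below. This is not FunASR's actual code: `clean_shutdown` is a hypothetical helper, and it assumes `writer` is a tensorboardX `SummaryWriter` and `dist` is the `torch.distributed` module. The idea is to close the writer and destroy the process group explicitly while the interpreter is still fully running, instead of leaving the work to `atexit` callbacks. I have not verified that this removes both messages in FunASR.

```python
from contextlib import contextmanager


@contextmanager
def clean_shutdown(writer=None, dist=None):
    """Run a training block, then tear down logging and distributed state.

    Hypothetical helper (not part of FunASR). Intended usage:

        import torch.distributed as dist
        from tensorboardX import SummaryWriter

        writer = SummaryWriter("./outputs/tensorboard")
        with clean_shutdown(writer, dist):
            run_training()  # placeholder for the real training loop
    """
    try:
        yield
    finally:
        # Close the event writer now, before interpreter shutdown begins,
        # so its worker is stopped while new threads can still be created.
        if writer is not None:
            writer.close()
        # Destroy the process group only if one was actually initialized.
        if dist is not None and dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()
```

The `finally` block means the teardown also runs when training raises, and the `is_initialized()` check keeps the helper safe for single-process runs.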

### Environment

 - OS (e.g., Linux): CentOS Linux release 8.5.2111
 - FunASR Version (e.g., 1.0.0): 1.2.9
 - ModelScope Version (e.g., 1.11.0): 1.33.0
 - PyTorch Version (e.g., 2.0.0): 2.9.1
 - How you installed funasr (`pip`, source): pip
 - Python version: 3.12.0
 - GPU (e.g., V100M32): NVIDIA RTX 4090
 - CUDA/cuDNN version (e.g., cuda11.7): 12.8
 - Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): no
 - Any other relevant information: none

### Additional context
none

Train finishes with error: WARNING: destroy_process_group() was not called before program exit, and: RuntimeError: can't create new thread at interpreter shutdown #2774