The following error occurs ONLY when the server is launched on multiple GPUs and I pass a `ref_audio` for voice cloning. On a single GPU, inference works.
Also, when I don't pass any `ref_audio` (i.e. using the default voice), inference works with any number of GPUs.
(APIServer pid=1) INFO: 172.26.161.142:34996 - "POST /v1/audio/speech HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-08 18:02:44 [orchestrator.py:584] [Orchestrator] _handle_add_request: stage=0 req=speech-954d5cc527259e76 prompt_type=OmniEngineCoreRequest original_prompt_type=dict final_stage=1 num_sampling_params=2
(APIServer pid=1) INFO 04-08 18:02:44 [stage_engine_core_client.py:113] [StageEngineCoreClient] Stage-0 adding request: speech-954d5cc527259e76
(APIServer pid=1) INFO 04-08 18:02:44 [stage_engine_core_client.py:113] [StageEngineCoreClient] Stage-1 adding request: speech-954d5cc527259e76
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] WorkerProc hit an exception.
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] Traceback (most recent call last):
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 927, in worker_busy_loop
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = func(*args, **kwargs)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 332, in execute_model
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return self.worker.execute_model(scheduler_output)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 822, in execute_model
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = self.model_runner.execute_model(
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_ar_model_runner.py", line 267, in execute_model
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ) = self._preprocess(scheduler_output, num_tokens_padded, intermediate_tensors)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_model_runner.py", line 1225, in _preprocess
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] req_input_ids, req_embeds, update_dict = self.model.preprocess(
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 368, in preprocess
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] prompt_embeds = self._build_structured_voice_clone_prefill_embeds(info_dict)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 530, in _build_structured_voice_clone_prefill_embeds
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ref_audio_wav = np.load(ref_audio_path)
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/numpy/lib/npyio.py", line 427, in load
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] fid = stack.enter_context(open(os_fspath(file), "rb"))
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fish_ref_vwbsa8u2.npy'
(Worker_TP3 pid=830) ERROR 04-08 18:02:45 [multiproc_executor.py:932]
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] WorkerProc hit an exception.
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] Traceback (most recent call last):
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 927, in worker_busy_loop
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = func(*args, **kwargs)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 332, in execute_model
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return self.worker.execute_model(scheduler_output)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 822, in execute_model
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = self.model_runner.execute_model(
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_ar_model_runner.py", line 267, in execute_model
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ) = self._preprocess(scheduler_output, num_tokens_padded, intermediate_tensors)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_model_runner.py", line 1225, in _preprocess
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] req_input_ids, req_embeds, update_dict = self.model.preprocess(
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 368, in preprocess
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] prompt_embeds = self._build_structured_voice_clone_prefill_embeds(info_dict)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 530, in _build_structured_voice_clone_prefill_embeds
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ref_audio_wav = np.load(ref_audio_path)
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/numpy/lib/npyio.py", line 427, in load
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] fid = stack.enter_context(open(os_fspath(file), "rb"))
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fish_ref_vwbsa8u2.npy'
(Worker_TP2 pid=829) ERROR 04-08 18:02:45 [multiproc_executor.py:932]
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] WorkerProc hit an exception.
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] Traceback (most recent call last):
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 927, in worker_busy_loop
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = func(*args, **kwargs)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 332, in execute_model
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return self.worker.execute_model(scheduler_output)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 822, in execute_model
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] output = self.model_runner.execute_model(
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] return func(*args, **kwargs)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_ar_model_runner.py", line 267, in execute_model
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ) = self._preprocess(scheduler_output, num_tokens_padded, intermediate_tensors)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/worker/gpu_model_runner.py", line 1225, in _preprocess
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] req_input_ids, req_embeds, update_dict = self.model.preprocess(
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 368, in preprocess
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] prompt_embeds = self._build_structured_voice_clone_prefill_embeds(info_dict)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/vllm_omni/model_executor/models/fish_speech/fish_speech_slow_ar.py", line 530, in _build_structured_voice_clone_prefill_embeds
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ref_audio_wav = np.load(ref_audio_path)
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] File "/usr/local/lib/python3.12/dist-packages/numpy/lib/npyio.py", line 427, in load
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] fid = stack.enter_context(open(os_fspath(file), "rb"))
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932] FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fish_ref_vwbsa8u2.npy'
(Worker_TP1 pid=828) ERROR 04-08 18:02:45 [multiproc_executor.py:932]
(Worker_TP0 pid=827) INFO 04-08 18:02:45 [dac_encoder.py:145] Encoded reference audio: 257533 samples @ 24000Hz -> 126 semantic tokens
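One plausible reading of this log, which is not confirmed against the vllm-omni source: the reference audio reaches the workers as a path to a temporary `.npy` file (`/tmp/fish_ref_vwbsa8u2.npy`). Only TP0 reads it successfully (it logs `Encoded reference audio`), while TP1 to TP3 fail with `FileNotFoundError`. With a single GPU there is only one reader, which would explain why TP=1 works. The default voice works because no file path is involved. The sketch below simulates this pattern with plain NumPy. The function names, the assumed "first reader deletes the file" step, and the in-memory alternative are all hypothetical and are not vllm-omni APIs.

```python
import io
import os
import tempfile

import numpy as np


def handoff_via_temp_file(wav: np.ndarray, num_ranks: int) -> list[str]:
    """Hypothetical model of the failure: ranks receive a temp-file path.

    ASSUMPTION: the first rank that loads the file also removes it, so every
    later rank that tries np.load(path) hits FileNotFoundError.
    """
    fd, path = tempfile.mkstemp(prefix="fish_ref_", suffix=".npy")
    os.close(fd)
    np.save(path, wav)  # path already ends in .npy, so np.save writes exactly it
    results = []
    for rank in range(num_ranks):
        try:
            loaded = np.load(path)
            results.append(f"rank {rank}: ok ({loaded.shape[0]} samples)")
            os.remove(path)  # assumed cleanup by the first successful reader
        except FileNotFoundError:
            results.append(f"rank {rank}: FileNotFoundError")
    if os.path.exists(path):  # only reached when no rank consumed the file
        os.remove(path)
    return results


def handoff_in_memory(wav: np.ndarray, num_ranks: int) -> list[str]:
    """Hypothetical workaround: serialize once, give every rank the bytes."""
    buf = io.BytesIO()
    np.save(buf, wav)
    payload = buf.getvalue()  # plain bytes can be pickled/broadcast to all ranks
    results = []
    for rank in range(num_ranks):
        loaded = np.load(io.BytesIO(payload))  # each rank decodes its own copy
        results.append(f"rank {rank}: ok ({loaded.shape[0]} samples)")
    return results


if __name__ == "__main__":
    # Same sample count as the TP0 log line; the content itself is irrelevant.
    wav = np.zeros(257533, dtype=np.float32)
    print(handoff_via_temp_file(wav, num_ranks=4))
    print(handoff_in_memory(wav, num_ranks=4))
```

With `num_ranks=1` the temp-file variant succeeds, mirroring the single-GPU behaviour. The in-memory variant succeeds for any rank count because no rank depends on a file that another rank may have already consumed.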