
Commit cfed4b9

[v0.9.1][Build][Ray] Fix protobuf version in Dockerfile (#2028) (#2306)
### What this PR does / why we need it?
Fix the protobuf version in the Dockerfile to resolve `AttributeError: 'str' object has no attribute 'DESCRIPTOR' when packaging message to dict` raised by protobuf.

The version specification will be removed after ray-project/ray#54910 is merged.

Backport of #2028.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with the newly added test.

---------

Signed-off-by: MengqingCao <[email protected]>
1 parent 13fc844 commit cfed4b9

File tree

5 files changed: +45, -7 lines


Dockerfile

Lines changed: 1 addition & 1 deletion
```diff
@@ -53,7 +53,7 @@ RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
     python3 -m pip cache purge
 
 # Install modelscope (for fast download) and ray (for multinode)
-RUN python3 -m pip install modelscope ray && \
+RUN python3 -m pip install modelscope 'ray>=2.47.1' 'protobuf>3.20.0' && \
     python3 -m pip cache purge
 
 CMD ["/bin/bash"]
```
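For context, the `DESCRIPTOR` error in the commit message comes from protobuf's `json_format` helpers, which expect a protobuf `Message` instance; when anything else (e.g. a plain string) reaches them, the attribute lookup fails. A minimal standalone sketch of the failure mode (using `Struct` for illustration; this is not the actual ray call path, which wraps the error with "when packaging message to dict"):

```python
# Sketch: google.protobuf.json_format.MessageToDict reads the DESCRIPTOR
# attribute of its argument, so anything that is not a protobuf Message
# (here, a plain str) triggers the AttributeError seen in the commit.
from google.protobuf.json_format import MessageToDict
from google.protobuf.struct_pb2 import Struct

msg = Struct()
msg.update({"backend": "ray", "pinned": True})
print(MessageToDict(msg))  # works: Struct is a real Message with a DESCRIPTOR

MessageToDict("not a message")  # AttributeError: 'str' object has no attribute 'DESCRIPTOR'
```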

Dockerfile.openEuler

Lines changed: 1 addition & 1 deletion
```diff
@@ -50,7 +50,7 @@ RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
     python3 -m pip cache purge
 
 # Install modelscope (for fast download) and ray (for multinode)
-RUN python3 -m pip install modelscope ray && \
+RUN python3 -m pip install modelscope 'ray>=2.47.1' 'protobuf>3.20.0' && \
    python3 -m pip cache purge
 
 CMD ["/bin/bash"]
```

docs/source/faqs.md

Lines changed: 21 additions & 4 deletions
````diff
@@ -158,12 +158,29 @@ for output in outputs:
 2. Set the following environment variables:
 
 ```bash
-export LCCL_DETERMINISTIC = 1
-export HCCL_DETERMINISTIC = 1
-export ATB_MATMUL_SHUFFLE_K_ENABLE = 0
-export ATB_LLM_LCOC_ENABLE = 0
+export LCCL_DETERMINISTIC=1
+export HCCL_DETERMINISTIC=true
+export ATB_MATMUL_SHUFFLE_K_ENABLE=0
+export ATB_LLM_LCOC_ENABLE=0
 ```
 
 ### 19. How to fix the error "ImportError: Please install vllm[audio] for audio support" for the Qwen2.5-Omni model?
 The `Qwen2.5-Omni` model requires the `librosa` package; install the `qwen-omni-utils` package (`pip install qwen-omni-utils`) to ensure all dependencies are met.
 This package pulls in `librosa` and its related dependencies, resolving the `ImportError: No module named 'librosa'` issue so that audio processing works correctly.
+
+### 20. Failed to run with the `ray` distributed backend?
+You may hit the following errors when running with the ray backend in distributed scenarios:
+
+```
+TypeError: can't convert npu:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
+```
+
+```
+AttributeError: 'str' object has no attribute 'DESCRIPTOR' when packaging message to dict
+```
+
+This has been solved in `ray>=2.47.1`, so it can be resolved as follows:
+
+```
+python3 -m pip install modelscope 'ray>=2.47.1' 'protobuf>3.20.0'
+```
````
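After upgrading, users can confirm they are past the fixed thresholds at runtime. A quick sanity-check sketch (assumes the `packaging` library is available, which is common in pip environments but not guaranteed):

```python
# Sketch: assert the running ray/protobuf versions satisfy the FAQ's fix.
import ray
import google.protobuf
from packaging.version import Version  # assumption: packaging is installed

assert Version(ray.__version__) >= Version("2.47.1"), "upgrade ray"
assert Version(google.protobuf.__version__) > Version("3.20.0"), "upgrade protobuf"
```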

requirements-dev.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -11,3 +11,5 @@ xgrammar
 zmq
 types-psutil
 networkx
+ray>=2.47.1
+protobuf>3.20.0
```
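Note the two specifier styles: `>=` keeps the stated bound, while `>` excludes it, so `protobuf>3.20.0` rejects 3.20.0 itself. A small sketch of how these specifiers are interpreted (assumes the `packaging` library):

```python
# Sketch: the inclusive vs. exclusive bounds of the two new pins.
from packaging.requirements import Requirement

ray_req = Requirement("ray>=2.47.1")
pb_req = Requirement("protobuf>3.20.0")

print(ray_req.specifier.contains("2.47.1"))  # True: '>=' includes the bound
print(pb_req.specifier.contains("3.20.0"))   # False: strict '>' excludes it
print(pb_req.specifier.contains("3.20.1"))   # True
```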

tests/multicard/test_offline_inference_distributed.py

Lines changed: 20 additions & 1 deletion
```diff
@@ -32,6 +32,8 @@
 
 os.environ["PYTORCH_NPU_ALLOC_CONF"] = "max_split_size_mb:256"
 
+DIST_EXECUTOR_BACKEND = ["mp", "ray"]
+
 
 def test_models_distributed_QwQ():
     example_prompts = [
@@ -63,6 +65,23 @@ def test_models_distributed_DeepSeek():
         vllm_model.generate_greedy(example_prompts, max_tokens)
 
 
+@pytest.mark.skipif(os.environ["VLLM_USE_V1"] == "1")
+@pytest.mark.parametrize("distributed_executor_backend", DIST_EXECUTOR_BACKEND)
+def test_v0_pp(distributed_executor_backend):
+    example_prompts = [
+        "Hello, my name is",
+    ]
+    dtype = "half"
+    max_tokens = 5
+    with VllmRunner(
+            "Qwen/Qwen3-0.6B-Base",
+            dtype=dtype,
+            pipeline_parallel_size=2,
+            distributed_executor_backend=distributed_executor_backend,
+    ) as vllm_model:
+        vllm_model.generate_greedy(example_prompts, max_tokens)
+
+
 @patch.dict(os.environ, {"VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE": "1"})
 def test_models_distributed_topk() -> None:
     example_prompts = [
@@ -227,4 +246,4 @@ def test_models_distributed_Qwen3_with_flashcomm_v2():
         dtype="auto",
         tensor_parallel_size=2,
     ) as vllm_model:
-        vllm_model.generate_greedy(example_prompts, max_tokens)
+        vllm_model.generate_greedy(example_prompts, max_tokens)
```
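One note on the new test: `pytest.mark.skipif` expects a `reason=` string when given a boolean condition, and `os.environ["VLLM_USE_V1"]` raises `KeyError` when the variable is unset. A more defensive form of the marker might look like this (a sketch under those assumptions, not part of the commit; the test name and reason text are illustrative):

```python
import os
import pytest

# Sketch: boolean skipif conditions need an explicit reason=, and
# os.environ.get avoids a KeyError when VLLM_USE_V1 is not set.
@pytest.mark.skipif(os.environ.get("VLLM_USE_V1") == "1",
                    reason="test_v0_pp targets the V0 engine code path")
def test_v0_pp_marker_demo():
    assert True
```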
