Commit 40c2c05

[v0.9.1][Doc] Update FAQ (#2490)
### What this PR does / why we need it?
Update FAQ.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: Shanshan Shen <[email protected]>
1 parent c223200 commit 40c2c05

File tree

1 file changed

+30
-1
lines changed


docs/source/faqs.md

Lines changed: 30 additions & 1 deletion
@@ -183,4 +183,33 @@ This has been solved in `ray>=2.47.1`, thus we could solve this as following:
 
 ```
 python3 -m pip install modelscope 'ray>=2.47.1' 'protobuf>3.20.0'
-```
+```
+
+### 21. Failed to run inference with Qwen3 MoE due to an `Alloc sq cq fail` error?
+
+When running Qwen3 MoE with tp/dp/ep or other parallelism settings, you may encounter the error shown in [#2629](https://github.com/vllm-project/vllm-ascend/issues/2629).
+
+This is more likely to happen on A3 hardware. Use the empirical formula below to estimate a suitable value for the `cuda-capture-sizes` argument:
+
+```python
+# pg_num: the number of process groups used for communication
+pg_num = sum(size > 1 for size in [
+    parallel_config.data_parallel_size,
+    parallel_config.tensor_parallel_size,
+])
+# num_hidden_layer: the number of hidden layers in the model
+
+# for A2:
+num_capture_sizes = 1920 / (num_hidden_layer + 1) / (1 + pg_num * 1)
+# for A3:
+num_capture_sizes = (1920 - pg_num * 40) / (num_hidden_layer + 1) / (1 + pg_num * 2)
+```
+
+See [#2629](https://github.com/vllm-project/vllm-ascend/issues/2629) for more details on how this value is calculated.
+
+Adjust the `cuda-capture-sizes` argument to address the issue:
+
+```bash
+vllm serve ... \
+  --cuda-capture-sizes=num_capture_sizes
+```
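For reference, the empirical formula in this FAQ entry can be wrapped in a small helper. This is only an illustrative sketch: the function name `estimate_capture_sizes`, the 48-layer model, and the dp/tp sizes below are assumptions for the example, not part of vllm-ascend.

```python
# Sketch of the empirical capture-size formula from the FAQ entry.
# estimate_capture_sizes and the example values are hypothetical.

def estimate_capture_sizes(num_hidden_layer: int, dp_size: int,
                           tp_size: int, device: str = "A3") -> int:
    # Count the parallel dimensions that actually form a process group
    # (only sizes greater than 1 create a communication group).
    pg_num = sum(size > 1 for size in [dp_size, tp_size])
    if device == "A2":
        return int(1920 / (num_hidden_layer + 1) / (1 + pg_num * 1))
    # Per the FAQ formula, A3 subtracts pg_num * 40 from the budget
    # and doubles the per-group cost.
    return int((1920 - pg_num * 40) / (num_hidden_layer + 1) / (1 + pg_num * 2))

# e.g. a 48-layer MoE model served with dp=2, tp=4 on A3:
print(estimate_capture_sizes(48, dp_size=2, tp_size=4))  # 7
```

The result would then be passed as `--cuda-capture-sizes` on the `vllm serve` command line, as shown in the FAQ entry.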
