Docker for launching Qwen3.5 with vLLM #50
lll-programe started this conversation in General
Replies: 2 comments 2 replies
Your transformers version is too low; it should be 5.2.0. Installing vllm downgrades the transformers version for you, so reinstall it yourself.
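For reference, a minimal sketch of that reinstall step (the 5.2.0 pin comes from this reply and is otherwise an assumption; use whichever release actually supports the model):

# reinstall transformers after the vllm install has downgraded it
pip install --upgrade "transformers==5.2.0"
# confirm which version is now active
python3 -c "import transformers; print(transformers.__version__)"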
2 replies
You can use a vLLM nightly build for now; there is currently some conflict between the vllm and HF versions, and a PR in the nightly build has fixed it. Otherwise, wait for the vLLM 0.17.0 release.
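A sketch of installing the nightly build, assuming the nightly wheel index is still published at https://wheels.vllm.ai/nightly:

# install the latest vLLM pre-release wheel from the nightly index
pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly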
0 replies
Package Version
accelerate 1.12.0
aiofile 3.9.0
aiofiles 25.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.3
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anthropic 0.84.0
anyio 4.12.1
apache-tvm-ffi 0.1.8.post2
astor 0.8.1
attrs 25.4.0
awscrt 0.31.2
bitsandbytes 0.49.2
blake3 1.0.8
blinker 1.4
boto3 1.42.56
botocore 1.42.56
cachetools 7.0.1
caio 0.9.25
cbor2 5.8.0
certifi 2026.2.25
cffi 2.0.0
charset-normalizer 3.4.4
click 8.3.1
cloudpickle 3.1.2
compressed-tensors 0.13.0
cryptography 46.0.5
cuda-bindings 13.1.1
cuda-pathfinder 1.3.5
cuda-python 13.1.1
cufile-python 0.2.0
cupy-cuda12x 14.0.1
dbus-python 1.2.18
deep_ep 1.2.1+73b6ea4
deep_gemm 2.3.0+477618c
depyf 0.20.0
dill 0.4.1
diskcache 5.6.3
distro 1.9.0
dnspython 2.8.0
docstring_parser 0.17.0
einops 0.8.2
email-validator 2.3.0
fastapi 0.133.0
fastapi-cli 0.0.24
fastapi-cloud-cli 0.13.0
fastar 0.8.0
filelock 3.24.3
flashinfer-cubin 0.6.3
flashinfer-jit-cache 0.6.3+cu129
flashinfer-python 0.6.3
frozenlist 1.8.0
fsspec 2026.2.0
gguf 0.17.1
google-api-core 2.30.0
google-auth 2.48.0
google-cloud-core 2.5.0
google-cloud-storage 3.9.0
google-crc32c 1.8.0
google-resumable-media 2.8.0
googleapis-common-protos 1.72.0
grpcio 1.78.1
grpcio-reflection 1.78.1
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.3.1
httpcore 1.0.9
httplib2 0.20.2
httptools 0.7.1
httpx 0.28.1
httpx-sse 0.4.3
huggingface_hub 0.36.2
humanize 4.15.0
idna 3.11
ijson 3.5.0
importlib-metadata 4.6.4
interegular 0.3.3
jeepney 0.7.1
Jinja2 3.1.6
jiter 0.13.0
jmespath 1.1.0
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
keyring 23.5.0
lark 1.2.2
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
llguidance 1.3.0
llvmlite 0.44.0
lm-format-enforcer 0.11.3
lmcache 0.3.14
loguru 0.7.3
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mcp 1.26.0
mdurl 0.1.2
mistral_common 1.9.1
model-hosting-container-standards 0.1.13
modelscope 1.34.0
more-itertools 8.10.0
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.20.0
multidict 6.7.1
networkx 3.6.1
ninja 1.13.0
nixl 0.10.0
nixl-cu12 0.10.0
numba 0.61.2
numpy 2.2.6
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.18.0
nvidia-cufft-cu12 11.4.1.4
nvidia-cufile-cu12 1.14.1.1
nvidia-curand-cu12 10.3.10.19
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.4.0
nvidia-cutlass-dsl-libs-base 4.4.0
nvidia-ml-py 13.590.48
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvshmem-cu12 3.3.20
nvidia-nvtx-cu12 12.9.79
nvtx 0.2.14
oauthlib 3.2.0
openai 2.24.0
openai-harmony 0.0.8
opencv-python-headless 4.13.0.92
outlines_core 0.2.11
packaging 26.0
partial-json-parser 0.2.1.1.post7
pillow 12.1.1
pip 26.0.1
pplx-kernels 0.0.1
prometheus_client 0.24.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.4.1
proto-plus 1.27.1
protobuf 4.25.3
psutil 7.2.2
py-cpuinfo 9.0.0
pyasn1 0.6.2
pyasn1_modules 0.4.2
pybase64 1.4.3
pycountry 26.2.16
pycparser 3.0
pydantic 2.12.5
pydantic_core 2.41.5
pydantic-extra-types 2.11.0
pydantic-settings 2.13.1
Pygments 2.19.2
PyGObject 3.42.1
PyJWT 2.11.0
pyparsing 2.4.7
python-apt 2.4.0+ubuntu4.1
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-json-logger 4.0.0
python-multipart 0.0.22
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.54.0
redis 7.2.0
referencing 0.37.0
regex 2026.2.19
requests 2.32.5
rich 14.3.3
rich-toolkit 0.19.7
rignore 0.7.6
rpds-py 0.30.0
rsa 4.9.1
runai-model-streamer 0.15.6
runai-model-streamer-gcs 0.15.6
runai-model-streamer-s3 0.15.6
s3transfer 0.16.0
safetensors 0.7.0
SecretStorage 3.3.1
sentencepiece 0.2.1
sentry-sdk 2.53.0
setproctitle 1.3.7
setuptools 80.10.2
setuptools-scm 9.2.2
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 3.2.0
starlette 0.52.1
supervisor 4.3.0
sympy 1.14.0
tabulate 0.9.0
tiktoken 0.12.0
timm 1.0.25
tokenizers 0.22.2
torch 2.9.1+cu129
torchaudio 2.9.1+cu129
torchvision 0.24.1+cu129
tqdm 4.67.3
transformers 4.56.0
triton 3.5.1
typer 0.24.1
typer-slim 0.24.0
typing_extensions 4.15.0
typing-inspection 0.4.2
urllib3 2.6.3
uv 0.10.6
uvicorn 0.41.0
uvloop 0.22.1
vllm 0.16.0
wadllib 1.3.6
watchfiles 1.1.1
websockets 16.0
wheel 0.37.1
xgrammar 0.1.29
yarl 1.22.0
zipp 1.0.0
root@b69601610e78:/mnt/models/Qwen3___5-35B-A3B# python3 -m vllm.entrypoints.openai.api_server --model /mnt/models/Qwen3___5-35B-A3B/ --tensor-parallel-size 2 --dtype bfloat16 --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --trust-remote-code --disable-log-requests
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287]
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] █ █ █▄ ▄█
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.16.0
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] █▄█▀ █ █ █ █ model /mnt/models/Qwen3___5-35B-A3B/
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287]
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:223] non-default args: {'host': '0.0.0.0', 'model': '/mnt/models/Qwen3___5-35B-A3B/', 'trust_remote_code': True, 'dtype': 'bfloat16', 'tensor_parallel_size': 2}
(APIServer pid=2152) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2152) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2152) Traceback (most recent call last):
(APIServer pid=2152) File "", line 198, in _run_module_as_main
(APIServer pid=2152) File "", line 88, in _run_code
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 531, in
(APIServer pid=2152) uvloop.run(run_server(args))
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run
(APIServer pid=2152) return __asyncio.run(
(APIServer pid=2152) ^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=2152) return runner.run(main)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=2152) return self._loop.run_until_complete(task)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper
(APIServer pid=2152) return await main
(APIServer pid=2152) ^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 457, in run_server
(APIServer pid=2152) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 476, in run_server_worker
(APIServer pid=2152) async with build_async_engine_client(
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=2152) return await anext(self.gen)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=2152) async with build_async_engine_client_from_engine_args(
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=2152) return await anext(self.gen)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=2152) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1410, in create_engine_config
(APIServer pid=2152) model_config = self.create_model_config()
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1264, in create_model_config
(APIServer pid=2152) return ModelConfig(
(APIServer pid=2152) ^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in init
(APIServer pid=2152) s.pydantic_validator.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=2152) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=2152) Value error, The checkpoint you are trying to load has model type qwen3_5_moe but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
(APIServer pid=2152)
(APIServer pid=2152) You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

The error shows that Transformers cannot recognize the Qwen3.5 architecture. I tried version 5.2.0 and it does recognize it, but vLLM does not support that version. Starting a service with Transformers plus FastAPI instead is extremely slow. Does anyone have a suggestion?
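A minimal diagnostic sketch for this situation: check which Transformers version vLLM is actually importing, and whether it can resolve the config for this checkpoint at all (the model path is the one used in the launch command above):

# show the installed Transformers version
python3 -c "import transformers; print(transformers.__version__)"
# try to resolve the qwen3_5_moe config; this raises the same error if the architecture is unknown
python3 -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('/mnt/models/Qwen3___5-35B-A3B'))"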