Docker for launching Qwen3.5 with vLLM #50
lll-programe started this conversation in General
Replies: 2 comments 2 replies
Your transformers version is too low; it should be 5.2.0. Installing vllm downgrades the transformers version for you, so reinstall it yourself.
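For reference, a minimal sketch of that reinstall step (the 5.2.0 pin comes from this reply and is otherwise an assumption; use whichever release actually supports the model):

# reinstall transformers after the vllm install has downgraded it
pip install --upgrade "transformers==5.2.0"
# confirm which version is now active
python3 -c "import transformers; print(transformers.__version__)"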
2 replies
You can use a vLLM nightly build for now; there is currently some conflict between the vllm and HF versions, and a PR in the nightly build has fixed it. Otherwise, wait for the vLLM 0.17.0 release.
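A sketch of installing the nightly build, assuming the nightly wheel index is still published at https://wheels.vllm.ai/nightly:

# install the latest vLLM pre-release wheel from the nightly index
pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly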
0 replies
Package Version
accelerate 1.12.0
aiofile 3.9.0
aiofiles 25.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.3
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anthropic 0.84.0
anyio 4.12.1
apache-tvm-ffi 0.1.8.post2
astor 0.8.1
attrs 25.4.0
awscrt 0.31.2
bitsandbytes 0.49.2
blake3 1.0.8
blinker 1.4
boto3 1.42.56
botocore 1.42.56
cachetools 7.0.1
caio 0.9.25
cbor2 5.8.0
certifi 2026.2.25
cffi 2.0.0
charset-normalizer 3.4.4
click 8.3.1
cloudpickle 3.1.2
compressed-tensors 0.13.0
cryptography 46.0.5
cuda-bindings 13.1.1
cuda-pathfinder 1.3.5
cuda-python 13.1.1
cufile-python 0.2.0
cupy-cuda12x 14.0.1
dbus-python 1.2.18
deep_ep 1.2.1+73b6ea4
deep_gemm 2.3.0+477618c
depyf 0.20.0
dill 0.4.1
diskcache 5.6.3
distro 1.9.0
dnspython 2.8.0
docstring_parser 0.17.0
einops 0.8.2
email-validator 2.3.0
fastapi 0.133.0
fastapi-cli 0.0.24
fastapi-cloud-cli 0.13.0
fastar 0.8.0
filelock 3.24.3
flashinfer-cubin 0.6.3
flashinfer-jit-cache 0.6.3+cu129
flashinfer-python 0.6.3
frozenlist 1.8.0
fsspec 2026.2.0
gguf 0.17.1
google-api-core 2.30.0
google-auth 2.48.0
google-cloud-core 2.5.0
google-cloud-storage 3.9.0
google-crc32c 1.8.0
google-resumable-media 2.8.0
googleapis-common-protos 1.72.0
grpcio 1.78.1
grpcio-reflection 1.78.1
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.3.1
httpcore 1.0.9
httplib2 0.20.2
httptools 0.7.1
httpx 0.28.1
httpx-sse 0.4.3
huggingface_hub 0.36.2
humanize 4.15.0
idna 3.11
ijson 3.5.0
importlib-metadata 4.6.4
interegular 0.3.3
jeepney 0.7.1
Jinja2 3.1.6
jiter 0.13.0
jmespath 1.1.0
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
keyring 23.5.0
lark 1.2.2
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
llguidance 1.3.0
llvmlite 0.44.0
lm-format-enforcer 0.11.3
lmcache 0.3.14
loguru 0.7.3
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mcp 1.26.0
mdurl 0.1.2
mistral_common 1.9.1
model-hosting-container-standards 0.1.13
modelscope 1.34.0
more-itertools 8.10.0
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.20.0
multidict 6.7.1
networkx 3.6.1
ninja 1.13.0
nixl 0.10.0
nixl-cu12 0.10.0
numba 0.61.2
numpy 2.2.6
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.10.2.21
nvidia-cudnn-frontend 1.18.0
nvidia-cufft-cu12 11.4.1.4
nvidia-cufile-cu12 1.14.1.1
nvidia-curand-cu12 10.3.10.19
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-cusparselt-cu12 0.7.1
nvidia-cutlass-dsl 4.4.0
nvidia-cutlass-dsl-libs-base 4.4.0
nvidia-ml-py 13.590.48
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvshmem-cu12 3.3.20
nvidia-nvtx-cu12 12.9.79
nvtx 0.2.14
oauthlib 3.2.0
openai 2.24.0
openai-harmony 0.0.8
opencv-python-headless 4.13.0.92
outlines_core 0.2.11
packaging 26.0
partial-json-parser 0.2.1.1.post7
pillow 12.1.1
pip 26.0.1
pplx-kernels 0.0.1
prometheus_client 0.24.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.4.1
proto-plus 1.27.1
protobuf 4.25.3
psutil 7.2.2
py-cpuinfo 9.0.0
pyasn1 0.6.2
pyasn1_modules 0.4.2
pybase64 1.4.3
pycountry 26.2.16
pycparser 3.0
pydantic 2.12.5
pydantic_core 2.41.5
pydantic-extra-types 2.11.0
pydantic-settings 2.13.1
Pygments 2.19.2
PyGObject 3.42.1
PyJWT 2.11.0
pyparsing 2.4.7
python-apt 2.4.0+ubuntu4.1
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-json-logger 4.0.0
python-multipart 0.0.22
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.54.0
redis 7.2.0
referencing 0.37.0
regex 2026.2.19
requests 2.32.5
rich 14.3.3
rich-toolkit 0.19.7
rignore 0.7.6
rpds-py 0.30.0
rsa 4.9.1
runai-model-streamer 0.15.6
runai-model-streamer-gcs 0.15.6
runai-model-streamer-s3 0.15.6
s3transfer 0.16.0
safetensors 0.7.0
SecretStorage 3.3.1
sentencepiece 0.2.1
sentry-sdk 2.53.0
setproctitle 1.3.7
setuptools 80.10.2
setuptools-scm 9.2.2
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 3.2.0
starlette 0.52.1
supervisor 4.3.0
sympy 1.14.0
tabulate 0.9.0
tiktoken 0.12.0
timm 1.0.25
tokenizers 0.22.2
torch 2.9.1+cu129
torchaudio 2.9.1+cu129
torchvision 0.24.1+cu129
tqdm 4.67.3
transformers 4.56.0
triton 3.5.1
typer 0.24.1
typer-slim 0.24.0
typing_extensions 4.15.0
typing-inspection 0.4.2
urllib3 2.6.3
uv 0.10.6
uvicorn 0.41.0
uvloop 0.22.1
vllm 0.16.0
wadllib 1.3.6
watchfiles 1.1.1
websockets 16.0
wheel 0.37.1
xgrammar 0.1.29
yarl 1.22.0
zipp 1.0.0
root@b69601610e78:/mnt/models/Qwen3___5-35B-A3B# python3 -m vllm.entrypoints.openai.api_server --model /mnt/models/Qwen3___5-35B-A3B/ --tensor-parallel-size 2 --dtype bfloat16 --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.9 --trust-remote-code --disable-log-requests
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287]
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] █ █ █▄ ▄█
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.16.0
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] █▄█▀ █ █ █ █ model /mnt/models/Qwen3___5-35B-A3B/
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:287]
(APIServer pid=2152) INFO 03-04 11:36:57 [utils.py:223] non-default args: {'host': '0.0.0.0', 'model': '/mnt/models/Qwen3___5-35B-A3B/', 'trust_remote_code': True, 'dtype': 'bfloat16', 'tensor_parallel_size': 2}
(APIServer pid=2152) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2152) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2152) Traceback (most recent call last):
(APIServer pid=2152) File "", line 198, in _run_module_as_main
(APIServer pid=2152) File "", line 88, in _run_code
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 531, in
(APIServer pid=2152) uvloop.run(run_server(args))
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run
(APIServer pid=2152) return __asyncio.run(
(APIServer pid=2152) ^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=2152) return runner.run(main)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=2152) return self._loop.run_until_complete(task)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper
(APIServer pid=2152) return await main
(APIServer pid=2152) ^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 457, in run_server
(APIServer pid=2152) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 476, in run_server_worker
(APIServer pid=2152) async with build_async_engine_client(
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=2152) return await anext(self.gen)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=2152) async with build_async_engine_client_from_engine_args(
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=2152) return await anext(self.gen)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=2152) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1410, in create_engine_config
(APIServer pid=2152) model_config = self.create_model_config()
(APIServer pid=2152) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1264, in create_model_config
(APIServer pid=2152) return ModelConfig(
(APIServer pid=2152) ^^^^^^^^^^^^
(APIServer pid=2152) File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in init
(APIServer pid=2152) s.pydantic_validator.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=2152) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=2152) Value error, The checkpoint you are trying to load has model type qwen3_5_moe but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
(APIServer pid=2152)
(APIServer pid=2152) You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

The error shows that Transformers cannot recognize the Qwen3.5 architecture. I tried version 5.2.0 and it does recognize it, but vLLM does not support that version. Starting a service with Transformers plus FastAPI instead is extremely slow. Does anyone have a suggestion?
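A minimal diagnostic sketch for this situation: check which Transformers version vLLM is actually importing, and whether it can resolve the config for this checkpoint at all (the model path is the one used in the launch command above):

# show the installed Transformers version
python3 -c "import transformers; print(transformers.__version__)"
# try to resolve the qwen3_5_moe config; this raises the same error if the architecture is unknown
python3 -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('/mnt/models/Qwen3___5-35B-A3B'))"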