
[Feature]: GGUF model with architecture qwen3vlmoe is not supported yet. #28691

@ivanbaldo

Description

🚀 The feature, motivation and pitch

With vLLM 0.11.1rc7.dev109+gca00b1bfc and 0.11.0, we can't load models--Qwen--Qwen3-VL-30B-A3B-Instruct-GGUF/snapshots/f54435e6cc31258f04b0969105c3f6badb197931/Qwen3VL-30B-A3B-Instruct-Q4_K_M.gguf (on an RTX 5090, in case it matters).

Alternatives

No response

Additional context

[llm]        | (APIServer pid=1) Traceback (most recent call last):
[llm]        | (APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
[llm]        | (APIServer pid=1)     sys.exit(main())
[llm]        | (APIServer pid=1)              ^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
[llm]        | (APIServer pid=1)     args.dispatch_function(args)
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
[llm]        | (APIServer pid=1)     uvloop.run(run_server(args))
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
[llm]        | (APIServer pid=1)     return __asyncio.run(
[llm]        | (APIServer pid=1)            ^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[llm]        | (APIServer pid=1)     return runner.run(main)
[llm]        | (APIServer pid=1)            ^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[llm]        | (APIServer pid=1)     return self._loop.run_until_complete(task)
[llm]        | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
[llm]        | (APIServer pid=1)     return await main
[llm]        | (APIServer pid=1)            ^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1944, in run_server
[llm]        | (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1963, in run_server_worker
[llm]        | (APIServer pid=1)     async with build_async_engine_client(
[llm]        | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[llm]        | (APIServer pid=1)     return await anext(self.gen)
[llm]        | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client
[llm]        | (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
[llm]        | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[llm]        | (APIServer pid=1)     return await anext(self.gen)
[llm]        | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 218, in build_async_engine_client_from_engine_args
[llm]        | (APIServer pid=1)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
[llm]        | (APIServer pid=1)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1317, in create_engine_config
[llm]        | (APIServer pid=1)     maybe_override_with_speculators(
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 530, in maybe_override_with_speculators
[llm]        | (APIServer pid=1)     config_dict, _ = PretrainedConfig.get_config_dict(
[llm]        | (APIServer pid=1)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 662, in get_config_dict
[llm]        | (APIServer pid=1)     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[llm]        | (APIServer pid=1)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 753, in _get_config_dict
[llm]        | (APIServer pid=1)     config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
[llm]        | (APIServer pid=1)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[llm]        | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
[llm]        | (APIServer pid=1)     raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
[llm]        | (APIServer pid=1) ValueError: GGUF model with architecture qwen3vlmoe is not supported yet.
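For context, the traceback shows the failure happens in `load_gguf_checkpoint` inside transformers, before vLLM ever builds the engine: transformers keeps an allow-list of GGUF architecture strings and rejects anything outside it. The sketch below is an illustrative reproduction of that guard, not the actual transformers source; the set contents and function name here are made-up examples, and the real list lives inside transformers' GGUF integration code.

```python
# Illustrative sketch of the guard that raises here (NOT the transformers source).
# The architecture names below are example values only; the authoritative list
# is maintained inside transformers' GGUF loading code.
SUPPORTED_GGUF_ARCHITECTURES = {
    "llama",
    "mistral",
    "qwen2",
    "qwen2moe",
}


def check_gguf_architecture(architecture: str) -> str:
    """Mirror the check in load_gguf_checkpoint: reject unknown architectures."""
    if architecture not in SUPPORTED_GGUF_ARCHITECTURES:
        raise ValueError(
            f"GGUF model with architecture {architecture} is not supported yet."
        )
    return architecture
```

Because "qwen3vlmoe" (the architecture tag stored in this model's GGUF header) is absent from that allow-list, the check raises and the API server exits during config loading. Support would need to land in transformers' GGUF mapping (and be consumed by vLLM) before this model can be served.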

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom-right corner of the documentation page, which can answer many frequently asked questions.
