Open
Labels
bug: Something isn't working
Description
⚙️ Your current environment
The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-6.8.0-94-generic-x86_64-with-glibc2.39`
Python Version: `3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0]`
llm-compressor Version: `0.9.0.1`
compressed-tensors Version: `0.13.0`
transformers Version: `4.57.3`
torch Version: `2.9.1`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Workstation Edition']`
AMD Devices: `None`
NPU Devices: `None`
🐛 Describe the bug
I have this code, which as far as I can see is a straightforward implementation of what is in the README for Granite 4.
It completes successfully, but the resulting model does not work. When I try to serve it with vLLM, I get this crash:
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] super().__init__(
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self._init_executor()
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self.driver_worker.load_model()
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4052, in load_model
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self.model = model_loader.load_model(
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 58, in load_model
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] self.load_weights(model, model_config)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 288, in load_weights
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 709, in load_weights
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] return loader.load_weights(weights)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/online_quantization.py", line 173, in patched_model_load_weights
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] return original_load_weights(auto_weight_loader, weights, mapper=mapper)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 342, in load_weights
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 290, in _load_module
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] yield from self._load_module(
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 263, in _load_module
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 577, in load_weights
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] _load(n, p)
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 444, in _load
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] param = params_dict[n]
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] ~~~~~~~~~~~^^^
(EngineCore_DP0 pid=14675) ERROR 02-07 03:59:09 [core.py:946] KeyError: 'layers.0.block_sparse_moe.router.layer.weight_scale'
(EngineCore_DP0 pid=14675) Process EngineCore_DP0:
(EngineCore_DP0 pid=14675) Traceback (most recent call last):
(EngineCore_DP0 pid=14675) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=14675) self.run()
(EngineCore_DP0 pid=14675) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=14675) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=14675) raise e
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=14675) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=14675) super().__init__(
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=14675) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=14675) self._init_executor()
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=14675) self.driver_worker.load_model()
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(EngineCore_DP0 pid=14675) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4052, in load_model
(EngineCore_DP0 pid=14675) self.model = model_loader.load_model(
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 58, in load_model
(EngineCore_DP0 pid=14675) self.load_weights(model, model_config)
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/default_loader.py", line 288, in load_weights
(EngineCore_DP0 pid=14675) loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 709, in load_weights
(EngineCore_DP0 pid=14675) return loader.load_weights(weights)
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/model_loader/online_quantization.py", line 173, in patched_model_load_weights
(EngineCore_DP0 pid=14675) return original_load_weights(auto_weight_loader, weights, mapper=mapper)
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 342, in load_weights
(EngineCore_DP0 pid=14675) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 290, in _load_module
(EngineCore_DP0 pid=14675) yield from self._load_module(
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 263, in _load_module
(EngineCore_DP0 pid=14675) loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=14675) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 577, in load_weights
(EngineCore_DP0 pid=14675) _load(n, p)
(EngineCore_DP0 pid=14675) File "/opt/venv/datasci/lib/python3.12/site-packages/vllm/model_executor/models/granitemoehybrid.py", line 444, in _load
(EngineCore_DP0 pid=14675) param = params_dict[n]
(EngineCore_DP0 pid=14675) ~~~~~~~~~~~^^^
(EngineCore_DP0 pid=14675) KeyError: 'layers.0.block_sparse_moe.router.layer.weight_scale'
Note: originally I excluded some more layers (attention, embeddings, Mamba in/out, MoE router) from the quantization. With those exclusions the model loaded, but its output was only "!!!!!!!".
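To check whether the saved checkpoint really contains quantization scales for the MoE router (the parameter named in the KeyError above), the tensor names can be read straight from the safetensors headers without loading any weights. This is a minimal sketch using only the standard library. The function names are hypothetical, and the `.router.` substring filter is an assumption taken from the key in the traceback, not from the vLLM or llm-compressor source.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path):
    """Return the tensor names stored in one .safetensors file."""
    with open(path, "rb") as f:
        # A safetensors file starts with an unsigned 64-bit little-endian
        # header length, followed by that many bytes of JSON metadata.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def quantized_router_params(checkpoint_dir):
    """List router tensors in a checkpoint directory that carry a weight scale.

    Assumption: router modules have ".router." in their tensor name, as in
    the key reported by the KeyError.
    """
    names = []
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        names.extend(safetensors_tensor_names(shard))
    return [n for n in names if ".router." in n and n.endswith("weight_scale")]
```

Calling `quantized_router_params("granite-4.0-h-small-fp8")` on the output directory would return a non-empty list if the router layers were quantized during compression. That would be consistent with the loader meeting a `weight_scale` tensor it has no matching parameter for, though it does not by itself show which side is at fault.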
🛠️ Steps to reproduce
$ python test_fp8_no_exclusion.py --model-name ibm-granite/granite-4.0-h-small --output granite-4.0-h-small-fp8
$ cd granite-4.0-h-small-fp8
$ vllm serve . --port 8080