Description
System Info
GPU: H200
TensorRT-LLM version: 1.1.0rc4
CUDA version: 12.8
Who can help?
I ran the following command to run the gpt-oss model:

```
python examples/llm-api/quickstart_advanced.py \
    --model_dir="//models/gpt-oss/gpt-oss-20b" \
    --prompt="What is your name?" \
    --max_num_tokens=100 \
    --moe_backend="TRITON"
```

The run fails with an "unsupported quant mode" error (see the traceback under "Actual behavior"). Could someone help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```
python examples/llm-api/quickstart_advanced.py \
    --model_dir="//models/gpt-oss/gpt-oss-20b" \
    --prompt="What is your name?" \
    --max_num_tokens=100 \
    --moe_backend="TRITON"
```
Expected behavior
The script should run inference with the gpt-oss model.
Actual behavior

```
  File "/x/tensorrt_llm/_torch/modules/linear.py", line 1850, in get_quant_method
    return get_quant_method(quant_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/x/tensorrt_llm/_torch/modules/linear.py", line 1770, in get_quant_method
    raise ValueError(f'unsupported quant mode: {quant_config.quant_mode}')
ValueError: unsupported quant mode: [262144]
```
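Assuming the reported value is an integer bit flag (TensorRT-LLM's `QuantMode` is, as far as I know, an `IntFlag`), it can help to see which bit is set before matching it against the flag definitions. This minimal sketch uses only the standard library; `set_bits` is a hypothetical helper, not part of TensorRT-LLM.

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions (0-based) of the bits that are set in a non-negative integer."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    positions = []
    bit = 0
    while mask:
        if mask & 1:  # lowest bit is set
            positions.append(bit)
        mask >>= 1  # shift to inspect the next bit
        bit += 1
    return positions


# Value copied from the error message: "unsupported quant mode: [262144]"
reported_modes = [262144]
for mode in reported_modes:
    print(mode, "->", set_bits(mode))  # prints: 262144 -> [18]
```

A single set bit (bit 18, i.e. `1 << 18`) suggests exactly one quantization flag is active. If TensorRT-LLM is importable, `QuantMode(262144)` from `tensorrt_llm.quantization` should print the flag's name, assuming that class and import path match the installed version.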
Additional notes
Nothing special.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.