
[Bug]: gpt-oss fails to run with error "unsupported quant mode: [262144]" #8730

@BayRanger

Description


System Info

GPU: H200
TensorRT-LLM version: 1.1.0rc4
CUDA version: 12.8

Who can help?

Run the following command to run the gpt-oss model:

```shell
python examples/llm-api/quickstart_advanced.py \
    --model_dir="//models/gpt-oss/gpt-oss-20b" \
    --prompt="What is your name?" \
    --max_num_tokens=100 \
    --moe_backend="TRITON"
```
The error reports that the quant mode is unsupported (see the traceback below). Could someone provide some help?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```shell
python examples/llm-api/quickstart_advanced.py \
    --model_dir="//models/gpt-oss/gpt-oss-20b" \
    --prompt="What is your name?" \
    --max_num_tokens=100 \
    --moe_backend="TRITON"
```

Expected behavior

It should perform inference with the gpt-oss model.

Actual behavior

```
  File "/x/tensorrt_llm/_torch/modules/linear.py", line 1850, in get_quant_method
    return get_quant_method(quant_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/x/tensorrt_llm/_torch/modules/linear.py", line 1770, in get_quant_method
    raise ValueError(f'unsupported quant mode: {quant_config.quant_mode}')
ValueError: unsupported quant mode: [262144]
```
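For context on why this kind of error appears: 262144 is `1 << 18`, i.e. a single high bit, which is consistent with `quant_mode` being a bitmask whose newer bits an older dispatch function does not recognize. The sketch below is a hypothetical illustration of that failure pattern; the `QuantMode` flag names here are assumptions for demonstration, not the real `tensorrt_llm` definitions.

```python
# Hypothetical sketch of a quant-mode bitmask dispatch falling through.
# QuantMode and its flag names are illustrative assumptions, NOT the real
# tensorrt_llm enum; only the failure pattern is the point.
from enum import IntFlag


class QuantMode(IntFlag):
    INT8_WEIGHTS = 1 << 0  # assumed flag name
    FP8 = 1 << 1           # assumed flag name
    # ...the real library defines many more bits...


def get_quant_method(quant_mode: QuantMode) -> str:
    """Dispatch on known bits; any unrecognized bit falls through."""
    if quant_mode & QuantMode.FP8:
        return "fp8_linear"
    if quant_mode & QuantMode.INT8_WEIGHTS:
        return "int8_weight_only_linear"
    # A mode introduced by a newer checkpoint format (e.g. bit 18, which
    # is the value 262144) matches none of the branches above.
    raise ValueError(f"unsupported quant mode: [{int(quant_mode)}]")


print(1 << 18)  # → 262144, the value in the reported error
```

If this is what is happening, the usual fix is to upgrade to a TensorRT-LLM build whose `get_quant_method` knows about the quantization format of the checkpoint being loaded.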

Additional notes

Nothing special.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.


Labels: Low Precision, bug, waiting for feedback
