Support for q4_0 format - google/gemma-3-27b-it-qat-q4_0-unquantized #9523

@albertnanda

System Info

System Information:

  • OS: Ubuntu 24.0
  • Python version: 3.12
  • CUDA version: 12.8
  • GPU model(s): L40S
  • Driver version: 580.95.05
  • TensorRT-LLM version: 1.0.0

How would you like to use TensorRT-LLM

I want to run inference with google/gemma-3-27b-it-qat-q4_0-unquantized (https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-unquantized), but I don't know how to integrate it with TensorRT-LLM or optimize it for my use case.

Specific questions:

  • Model: google/gemma-3-27b-it-qat-q4_0-unquantized
  • Use case: chatbot
  • Expected throughput/latency requirements: Latency
  • Multi-GPU setup needed: No

Is the q4_0 format supported? I am trying to convert google/gemma-3-27b-it-qat-q4_0-unquantized (https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-unquantized) to a TensorRT engine. I first converted the model to a CausalLM checkpoint using HF, then ran:

    python3 convert_checkpoint.py \
        --model-dir ${MODEL_PATH} \
        --output-model-dir ${TRT_CHECKPOINT_PATH} \
        --ckpt-type hf \
        --dtype bfloat16 \
        --use_weight_only \
        --weight_only_precision int4

    trtllm-build \
        --checkpoint_dir ${TRT_OUT_PATH} \
        --output_dir ${TRT_ENGINE} \
        --gemm_plugin auto \
        --gpt_attention_plugin auto \
        --remove_input_padding enable \
        --use_paged_context_fmha enable \
        --max_input_len 8192 \
        --max_seq_len 16384 \
        --max_num_tokens 32768 \
        --max_beam_width 1 \
        --max_batch_size 16

The resulting engine generates garbage output.
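
For reference, here is a minimal NumPy sketch of what "q4_0" usually means. It assumes the GGML convention: blocks of 32 weights, one fp16 scale per block, and 4-bit codes stored with an offset of 8. The function names are hypothetical, and this is not TensorRT-LLM code. Whether the int4 weight-only path in convert_checkpoint.py uses a compatible scale layout is not established here.

```python
import numpy as np

BLOCK = 32  # assumed q4_0 block size (GGML convention)


def q4_0_quantize(x):
    """Quantize a 1-D float array (length a multiple of BLOCK) to q4_0-style blocks.

    Returns (d, q): one fp16 scale per block and 4-bit codes in [0, 15].
    """
    blocks = np.asarray(x, dtype=np.float32).reshape(-1, BLOCK)
    # Pick the signed value with the largest magnitude in each block.
    idx = np.abs(blocks).argmax(axis=1)
    vmax = blocks[np.arange(blocks.shape[0]), idx]
    d = vmax / -8.0  # per-block scale; the extreme value maps to code 0
    # Reciprocal scale, with all-zero blocks handled explicitly.
    inv = np.where(d != 0, 1.0 / np.where(d != 0, d, 1.0), 0.0)
    # Round to the nearest code and clamp to the 4-bit range.
    q = np.clip(np.floor(blocks * inv[:, None] + 8.5), 0, 15).astype(np.uint8)
    return d.astype(np.float16), q


def q4_0_dequantize(d, q):
    """Reconstruct float32 weights: (code - 8) * scale, block by block."""
    scale = d.astype(np.float32)[:, None]
    return ((q.astype(np.float32) - 8.0) * scale).reshape(-1)
```

A round trip such as `q4_0_dequantize(*q4_0_quantize(w))` shows the grid that QAT weights are trained to tolerate. A quantizer that uses a different grouping or scale rule would round the same weights differently.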

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Low Precision: Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).
  • question: Further information is requested
