activation_scaling_factor missing in TensorRT-LLM Checkpoint FP8 Export #500

@shivghai

Description

Describe the bug

I am quantizing a Llama checkpoint to FP8 (see the code in the repro section) and exporting it with export_tensorrt_llm_checkpoint. The quantization itself completes successfully.

However, when I then try to build a TensorRT-LLM engine from the exported checkpoint, the build errors out because it cannot find the per-layer activation_scaling_factor tensors.
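
One way to confirm the tensors are actually absent is to list the keys of the exported checkpoint. This is a sketch, not from the original report; the file name rank0.safetensors assumes the usual single-rank TensorRT-LLM checkpoint layout.

from safetensors import safe_open

# List every key that carries an activation scale; "output_dir" is the
# export directory from the repro below.
with safe_open("output_dir/rank0.safetensors", framework="pt") as f:
    scale_keys = [k for k in f.keys() if "activation_scaling_factor" in k]

print(scale_keys)  # an empty list means no activation scales were exported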

Steps/Code to reproduce bug

I used the following code to quantize a Llama checkpoint:

import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint
from transformers import LlamaForCausalLM


def quantize_to_fp8_trt_checkpoint(
    model_dir: str, output_dir: str, quantization_config=mtq.FP8_DEFAULT_CFG
):
    model = LlamaForCausalLM.from_pretrained(model_dir)
    # No calibration data is run through the model here.
    model = mtq.quantize(model, quantization_config, forward_loop=None)

    with torch.inference_mode():
        export_tensorrt_llm_checkpoint(
            model,
            "Llama",        # decoder_type
            torch.float16,  # dtype
            output_dir,     # export_dir
            1,              # inference_tensor_parallel
            1,              # inference_pipeline_parallel
        )
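
Note that FP8_DEFAULT_CFG calibrates activation ranges from data run through a forward loop, and the repro passes forward_loop=None, so the quantizers never observe any activations; that may be exactly why the exporter has no activation scales to write. For comparison, a minimal calibration sketch (the model path and calibration texts below are illustrative, not from the repro):

import torch
import modelopt.torch.quantization as mtq
from transformers import AutoTokenizer, LlamaForCausalLM

model_dir = "meta-llama/Llama-2-7b-hf"  # illustrative Llama checkpoint
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# A handful of representative texts stand in for a real calibration set.
calib_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "TensorRT-LLM builds optimized engines for LLM inference.",
]

def forward_loop(m):
    # Each forward pass lets the quantizers record activation ranges (amax),
    # from which the export later derives activation_scaling_factor.
    for text in calib_texts:
        inputs = tokenizer(text, return_tensors="pt").to(m.device)
        with torch.no_grad():
            m(**inputs)

model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)

If the factors appear after calibrating like this, the remaining question is arguably whether the export should raise an error instead of silently writing an unusable checkpoint when calibration was skipped.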

Expected behavior

The exported checkpoint should contain an activation_scaling_factor for every quantized layer, so that the TensorRT-LLM engine build succeeds.

Who can help?

  • ?

System information

- Container used (if applicable): ?
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Debian GNU/Linux 12 (bookworm)
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA H100 80GB HBM3
- GPU memory size: 79.6 GB
- Number of GPUs: 1
- Library versions (if applicable):
  - Python: 3.10.15
  - ModelOpt version or commit hash: 0.29.0
  - CUDA: 12.9
  - PyTorch: 2.7.0+cu126
  - Transformers: 4.51.3
  - TensorRT-LLM: 0.20.0
  - ONNXRuntime: 1.22.1
  - TensorRT: 10.10.0.31
