activation_scaling_factor missing in TensorRT-LLM Checkpoint FP8 Export #500

@shivghai

Description

Describe the bug

I am quantizing a Llama checkpoint to FP8 (see the code in the repro section) and exporting it with export_tensorrt_llm_checkpoint. The quantization itself completes successfully.

However, when I then try to build a TensorRT-LLM engine from the exported checkpoint, the build errors out because it cannot find the per-layer activation_scaling_factor tensors.
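
One way to confirm the tensors are actually absent is to list the keys of the exported checkpoint. This is a sketch, not from the original report; the file name rank0.safetensors assumes the usual single-rank TensorRT-LLM checkpoint layout.

from safetensors import safe_open

# List every key that carries an activation scale; "output_dir" is the
# export directory from the repro below.
with safe_open("output_dir/rank0.safetensors", framework="pt") as f:
    scale_keys = [k for k in f.keys() if "activation_scaling_factor" in k]

print(scale_keys)  # an empty list means no activation scales were exported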

Steps/Code to reproduce bug

I used the following code to quantize a Llama checkpoint:

import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint
from transformers import LlamaForCausalLM


def quantize_to_fp8_trt_checkpoint(
    model_dir: str, output_dir: str, quantization_config=mtq.FP8_DEFAULT_CFG
):
    model = LlamaForCausalLM.from_pretrained(model_dir)
    # No calibration data is run through the model here.
    model = mtq.quantize(model, quantization_config, forward_loop=None)

    with torch.inference_mode():
        export_tensorrt_llm_checkpoint(
            model,
            "Llama",        # decoder_type
            torch.float16,  # dtype
            output_dir,     # export_dir
            1,              # inference_tensor_parallel
            1,              # inference_pipeline_parallel
        )
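
Note that FP8_DEFAULT_CFG calibrates activation ranges from data run through a forward loop, and the repro passes forward_loop=None, so the quantizers never observe any activations; that may be exactly why the exporter has no activation scales to write. For comparison, a minimal calibration sketch (the model path and calibration texts below are illustrative, not from the repro):

import torch
import modelopt.torch.quantization as mtq
from transformers import AutoTokenizer, LlamaForCausalLM

model_dir = "meta-llama/Llama-2-7b-hf"  # illustrative Llama checkpoint
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# A handful of representative texts stand in for a real calibration set.
calib_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "TensorRT-LLM builds optimized engines for LLM inference.",
]

def forward_loop(m):
    # Each forward pass lets the quantizers record activation ranges (amax),
    # from which the export later derives activation_scaling_factor.
    for text in calib_texts:
        inputs = tokenizer(text, return_tensors="pt").to(m.device)
        with torch.no_grad():
            m(**inputs)

model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)

If the factors appear after calibrating like this, the remaining question is arguably whether the export should raise an error instead of silently writing an unusable checkpoint when calibration was skipped.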

Expected behavior

The exported checkpoint should contain an activation_scaling_factor for every quantized layer, so that the TensorRT-LLM engine build succeeds.

Who can help?

  • ?

System information

- Container used (if applicable): ?
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Debian GNU/Linux 12 (bookworm)
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA H100 80GB HBM3
- GPU memory size: 79.6 GB
- Number of GPUs: 1
- Library versions (if applicable):
  - Python: 3.10.15
  - ModelOpt version or commit hash: 0.29.0
  - CUDA: 12.9
  - PyTorch: 2.7.0+cu126
  - Transformers: 4.51.3
  - TensorRT-LLM: 0.20.0
  - ONNXRuntime: 1.22.1
  - TensorRT: 10.10.0.31
