[Bug]: Llama -> FP8 / NVFP4 conversion broken in convert_checkpoint.py #10034

@Mashrien

Description

System Info

x86_64
Intel Core i5-11600K
32 GB DDR4-3200
RTX 5070 (12 GB)
TensorRT-LLM 1.0.0 - 1.2.0 (rc0-rc5)

Using official docker container images from: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release

When running examples/models/core/llama/convert_checkpoint.py with either --use_fp8 or --use_nvfp4, I get 'NoneType' errors regardless of the model tried (I've tried nearly a dozen models from HF).

Is this code path just not fully implemented yet, or is this a bug?

E.g., download MythoMax-L2-13B, Kimiko, Wayfarer, Codex, or any other model from HF, then run convert_checkpoint.py with the --use_fp8 or --use_nvfp4 flag and watch it die (I've tried both with and without --calib_data arguments and datasets).

This is the typical error:

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 538, in execute
    future.result()
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 504, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/llama/model.py", line 505, in from_hugging_face
    loader.generate_tllm_weights(model, arg_dict)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 400, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 311, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/layers.py", line 1503, in postprocess
    new_amax = max(weight_scaling_factors).reshape(1, ).to(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 591, in <module>
    main()
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 583, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 524, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 542, in execute
    assert len(
           ^^^^
AssertionError: Checkpoint conversion failed, please check error log.
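
My reading of the traceback (an assumption on my part, not confirmed): these HF checkpoints are plain FP16/BF16 and carry no per-layer weight scale tensors, so the weight_scaling_factors list that reaches postprocess() in tensorrt_llm/quantization/layers.py is all None, and Python's max() cannot compare None values. A minimal sketch of just that comparison failure:

    # Hypothetical illustration only: if no per-layer scales are found in the
    # source checkpoint, the collected factors are all None and max() raises
    # exactly the TypeError shown above.
    weight_scaling_factors = [None, None]  # assumed: scales absent from the HF checkpoint
    new_amax = max(weight_scaling_factors)
    # TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'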

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Download any Llama model from HF and run examples/models/core/llama/convert_checkpoint.py with either the --use_fp8 or the --use_nvfp4 flag; it dies complaining about 'NoneType'.
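
For reference, this is roughly the invocation (model and output paths are placeholders; --use_nvfp4 in place of --use_fp8 fails the same way):

    python examples/models/core/llama/convert_checkpoint.py \
        --model_dir ./MythoMax-L2-13B \
        --output_dir ./tllm_ckpt_fp8 \
        --dtype float16 \
        --use_fp8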

Expected behavior

Should produce an FP8 (or NVFP4) TensorRT-LLM checkpoint

Actual behavior

Fails for every model tried, with the same traceback shown in the description: the TypeError from postprocess() in tensorrt_llm/quantization/layers.py, followed by "AssertionError: Checkpoint conversion failed, please check error log."

Additional notes

Is this expected behavior, e.g. not yet implemented?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Labels

    Low Precision (Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ)), bug (Something isn't working)
