[Bug]: Llama -> FP8 / NVFP4 conversion broken in convert_checkpoint.py #10034

@Mashrien

Description

System Info

x86_64
Intel Core i5-11600K
32 GB DDR4-3200
RTX 5070 (12 GB)
TensorRT-LLM 1.0.0 - 1.2.0 (rc0-rc5)

Using official docker container images from: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release

When running examples/models/core/llama/convert_checkpoint.py with either --use_fp8 or --use_nvfp4, I get 'NoneType' errors regardless of the model tried (I've tried nearly a dozen models from HF).

Is this code path just not fully implemented yet, or is this a bug?

E.g., download MythoMax-L2-13B, Kimiko, Wayfarer, Codex, or any other model from HF, then run convert_checkpoint.py with the --use_fp8 or --use_nvfp4 flag and watch it die (I've tried both with and without --calib_data arguments and datasets).

This is the typical error:

Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 538, in execute
    future.result()
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 504, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/llama/model.py", line 505, in from_hugging_face
    loader.generate_tllm_weights(model, arg_dict)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 400, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 311, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/layers.py", line 1503, in postprocess
    new_amax = max(weight_scaling_factors).reshape(1, ).to(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 591, in <module>
    main()
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 583, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 524, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 542, in execute
    assert len(
           ^^^^
AssertionError: Checkpoint conversion failed, please check error log.
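
My reading of the traceback (an assumption on my part, not confirmed): these HF checkpoints are plain FP16/BF16 and carry no per-layer weight scale tensors, so the weight_scaling_factors list that reaches postprocess() in tensorrt_llm/quantization/layers.py is all None, and Python's max() cannot compare None values. A minimal sketch of just that comparison failure:

    # Hypothetical illustration only: if no per-layer scales are found in the
    # source checkpoint, the collected factors are all None and max() raises
    # exactly the TypeError shown above.
    weight_scaling_factors = [None, None]  # assumed: scales absent from the HF checkpoint
    new_amax = max(weight_scaling_factors)
    # TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'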

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Download any Llama model from HF and run examples/models/core/llama/convert_checkpoint.py with either the --use_fp8 or the --use_nvfp4 flag; it dies complaining about 'NoneType'.
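
For reference, this is roughly the invocation (model and output paths are placeholders; --use_nvfp4 in place of --use_fp8 fails the same way):

    python examples/models/core/llama/convert_checkpoint.py \
        --model_dir ./MythoMax-L2-13B \
        --output_dir ./tllm_ckpt_fp8 \
        --dtype float16 \
        --use_fp8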

Expected behavior

Should produce an FP8 (or NVFP4) TensorRT-LLM checkpoint

Actual behavior

Fails for every model tried, with the same traceback shown in the description: the TypeError from postprocess() in tensorrt_llm/quantization/layers.py, followed by "AssertionError: Checkpoint conversion failed, please check error log."

Additional notes

Is this expected behavior, e.g. not yet implemented?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Labels

    Low Precision (Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ)), bug (Something isn't working)
