Description
System Info
Arch: x86_64
CPU: Intel Core i5-11600K
RAM: 32 GB DDR4-3200
GPU: RTX 5070 12 GB
TensorRT-LLM: 1.0.0 - 1.2.0 (rc0-rc5)
Using the official Docker container images from: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release
When running the script examples/models/core/llama/convert_checkpoint.py with either --use_fp8 or --use_nvfp4, I get 'NoneType' errors regardless of the model (I've tried nearly a dozen models from HF).
Is this code path just not fully implemented yet, or is this a bug?
E.g., download MythoMax-L2-13B, Kimiko, Wayfarer, Codex, or any other model from HF, then run convert_checkpoint.py with the --use_fp8 or --use_nvfp4 flag and watch it die (I've tried both with and without --calib_data arguments and datasets).
This is the typical error:
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 538, in execute
future.result()
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 504, in convert_and_save_rank
llama = LLaMAForCausalLM.from_hugging_face(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/llama/model.py", line 505, in from_hugging_face
loader.generate_tllm_weights(model, arg_dict)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 400, in generate_tllm_weights
self.load(tllm_key,
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 311, in load
v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/quantization/layers.py", line 1503, in postprocess
new_amax = max(weight_scaling_factors).reshape(1, ).to(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 591, in <module>
main()
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 583, in main
convert_and_save_hf(args)
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 524, in convert_and_save_hf
execute(args.workers, [convert_and_save_rank] * world_size, args)
File "/app/tensorrt_llm/examples/models/core/llama/./convert_checkpoint.py", line 542, in execute
assert len(
^^^^
AssertionError: Checkpoint conversion failed, please check error log.
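For what it's worth, this TypeError is exactly what Python raises when max() is applied to a sequence whose elements are all None, which suggests the per-weight scaling factors are never populated on this code path. A minimal sketch of the failing comparison (contents are hypothetical):

# If every entry in weight_scaling_factors is None, max() compares
# elements with '>', which NoneType does not support; this yields the
# exact TypeError above.
weight_scaling_factors = [None, None]  # hypothetical contents at the failure point
new_amax = max(weight_scaling_factors)
# TypeError: '>' not supported between instances of 'NoneType' and 'NoneType'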
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Download any Llama model from HF and run examples/models/core/llama/convert_checkpoint.py with either the --use_fp8 or --use_nvfp4 flag; it dies with the 'NoneType' error shown above. A representative invocation follows.
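This sketch assumes the script's standard --model_dir, --output_dir, and --dtype arguments; the paths are placeholders, and only --use_fp8/--use_nvfp4 come from this report:

# Hypothetical repro command; swapping --use_fp8 for --use_nvfp4 hits the same error.
python examples/models/core/llama/convert_checkpoint.py \
    --model_dir ./MythoMax-L2-13B \
    --output_dir ./tllm_ckpt_fp8 \
    --dtype float16 \
    --use_fp8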
Expected behavior
Should produce an FP8 (or NVFP4) quantized TensorRT-LLM checkpoint.
actual behavior
Fails with a 'NoneType' TypeError during weight postprocessing.
(Traceback is identical to the one in the Description above.)
additional notes
Is this expected behavior, e.g. a not-yet-implemented code path?
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.