Description
System Info
H100
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```shell
python3 /opt/tensorrt-llm/examples/auto_deploy/build_and_run_ad.py --model nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 --args.yaml-extra /opt/tensorrt-llm/examples/auto_deploy/model_registry/configs/dashboard_default.yaml --args.yaml-extra /opt/tensorrt-llm/examples/auto_deploy/model_registry/configs/world_size_2.yaml
```
Expected behavior
The model should build successfully.
Actual behavior
```
0:   File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/utils/node_utils.py", line 490, in get_all_layer_subgraphs
0:     layer_subgraph = get_layer_after_linear_node(linear_nodes, terminating_indices)
0:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0:   File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/utils/node_utils.py", line 871, in get_layer_after_linear_node
0:     assert len(ssm_nodes) == 1, "SSM layer must have exactly one SSM node"
0:            ^^^^^^^^^^^^^^^^^^^
0: AssertionError: SSM layer must have exactly one SSM node
```
Additional notes
Running nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 through Auto Deploy fails during the sharding transform with an assertion error about multiple SSM nodes being found in a single detected layer.
Root Cause Analysis
The `get_layer_after_linear_node` function uses a BFS traversal to detect layer boundaries, based on linear projections whose output size matches the embedding dimension. For NemotronH hybrid models with multiple consecutive Mamba blocks, if those blocks have compatible linear projection shapes, they are grouped into a single "layer", so the detected layer contains more than one SSM node and the assertion fires.
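The merging behavior described above can be illustrated with a toy sketch. The dictionary-based node representation, the `split_into_layers` helper, and all shapes below are hypothetical simplifications, not TensorRT-LLM's actual graph IR or boundary-detection code:

```python
# Hypothetical sketch: close a "layer" at each linear projection whose
# output size matches the embedding dimension, as the boundary heuristic
# described above does.
def split_into_layers(nodes, embed_dim):
    layers, current = [], []
    for node in nodes:
        current.append(node)
        if node["op"] == "linear" and node["out_features"] == embed_dim:
            layers.append(current)
            current = []
    if current:
        layers.append(current)
    return layers

# Two consecutive Mamba blocks. The out_proj of the first block does NOT
# match the embedding dimension in this toy graph, so no boundary is
# detected between the blocks and both SSM nodes land in one "layer".
nodes = [
    {"op": "linear", "out_features": 8192},  # in_proj, block 1
    {"op": "ssm"},                           # SSM node, block 1
    {"op": "linear", "out_features": 8192},  # out_proj, block 1 (no match)
    {"op": "linear", "out_features": 8192},  # in_proj, block 2
    {"op": "ssm"},                           # SSM node, block 2
    {"op": "linear", "out_features": 4096},  # out_proj, block 2 (matches)
]

layers = split_into_layers(nodes, embed_dim=4096)
ssm_in_first_layer = [n for n in layers[0] if n["op"] == "ssm"]
print(len(ssm_in_first_layer))  # 2 -> would trip "SSM layer must have exactly one SSM node"
```

A shape-only boundary heuristic cannot distinguish where one block ends and the next begins when intermediate projection shapes happen to be compatible, which is exactly the failure mode the assertion surfaces.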
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.