Description
System Info
H100
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```shell
python3 /opt/tensorrt-llm/examples/auto_deploy/build_and_run_ad.py --model nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 --args.yaml-extra /opt/tensorrt-llm/examples/auto_deploy/model_registry/configs/dashboard_default.yaml --args.yaml-extra /opt/tensorrt-llm/examples/auto_deploy/model_registry/configs/world_size_2.yaml
```
Expected behavior
The model should build successfully.
Actual behavior
```
0:   File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/utils/node_utils.py", line 490, in get_all_layer_subgraphs
0:     layer_subgraph = get_layer_after_linear_node(linear_nodes, terminating_indices)
0:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0:   File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/utils/node_utils.py", line 871, in get_layer_after_linear_node
0:     assert len(ssm_nodes) == 1, "SSM layer must have exactly one SSM node"
0:            ^^^^^^^^^^^^^^^^^^^
0: AssertionError: SSM layer must have exactly one SSM node
```
Additional notes
Running nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 through Auto Deploy fails during the sharding transform with an assertion error about multiple SSM nodes being found in a single detected layer.
Root Cause Analysis
The `get_layer_after_linear_node` function uses a BFS traversal to detect layer boundaries, based on linear projections whose output size matches the embedding dimension. For NemotronH hybrid models with multiple consecutive Mamba blocks, if those blocks have compatible linear projection shapes, they are grouped into a single "layer", so the detected layer contains more than one SSM node and the assertion fires.
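The merging behavior described above can be illustrated with a toy sketch. The dictionary-based node representation, the `split_into_layers` helper, and all shapes below are hypothetical simplifications, not TensorRT-LLM's actual graph IR or boundary-detection code:

```python
# Hypothetical sketch: close a "layer" at each linear projection whose
# output size matches the embedding dimension, as the boundary heuristic
# described above does.
def split_into_layers(nodes, embed_dim):
    layers, current = [], []
    for node in nodes:
        current.append(node)
        if node["op"] == "linear" and node["out_features"] == embed_dim:
            layers.append(current)
            current = []
    if current:
        layers.append(current)
    return layers

# Two consecutive Mamba blocks. The out_proj of the first block does NOT
# match the embedding dimension in this toy graph, so no boundary is
# detected between the blocks and both SSM nodes land in one "layer".
nodes = [
    {"op": "linear", "out_features": 8192},  # in_proj, block 1
    {"op": "ssm"},                           # SSM node, block 1
    {"op": "linear", "out_features": 8192},  # out_proj, block 1 (no match)
    {"op": "linear", "out_features": 8192},  # in_proj, block 2
    {"op": "ssm"},                           # SSM node, block 2
    {"op": "linear", "out_features": 4096},  # out_proj, block 2 (matches)
]

layers = split_into_layers(nodes, embed_dim=4096)
ssm_in_first_layer = [n for n in layers[0] if n["op"] == "ssm"]
print(len(ssm_in_first_layer))  # 2 -> would trip "SSM layer must have exactly one SSM node"
```

A shape-only boundary heuristic cannot distinguish where one block ends and the next begins when intermediate projection shapes happen to be compatible, which is exactly the failure mode the assertion surfaces.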
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.