[Bug][AutoDeploy] Execution hangs when enabling cudagraphs and AllReduceStrategy.AUTO #8781

@suyoggupta

Description

System Info

CW-DFW cluster; TensorRT-LLM main branch.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Token generation hangs when running the build_and_run_ad.py script with the torch-cudagraph compile backend and AllReduceStrategy.AUTO.

Repro steps:
In https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/distributed/trtllm.py#L21, change AllReduceStrategy.NCCL to AllReduceStrategy.AUTO, then run:

MODEL=meta-llama/Llama-3.1-8B-Instruct
python examples/auto_deploy/build_and_run_ad.py --model $MODEL --args.world-size 4 --args.compile_backend=torch-cudagraph
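For reference, the edit described above can be sketched as follows. This is a minimal stand-in, not the actual TensorRT-LLM source: the AllReduceStrategy enum here is a hypothetical two-member stub, and the surrounding code at trtllm.py#L21 is assumed rather than reproduced.

```python
from enum import Enum

# Hypothetical stub of tensorrt_llm's AllReduceStrategy enum; the real enum
# has more members and different values.
class AllReduceStrategy(Enum):
    NCCL = "nccl"
    AUTO = "auto"

# Before the edit (works with the torch-cudagraph compile backend):
# strategy = AllReduceStrategy.NCCL

# After the edit (reproduces the hang with torch-cudagraph):
strategy = AllReduceStrategy.AUTO
```

The one-line switch of the strategy enum value is the only change needed before re-running the command above.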

Expected behavior

Generation completes and produces legible output, as it does with the torch-simple compile backend.

Actual behavior

Token generation hangs.

Additional notes

n/a

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

AutoDeploy: <NV> AutoDeploy Backend
Scale-out: <NV> Multi-GPU and distributed inference scaling issues, tensor/pipeline/data parallelism
bug: Something isn't working

Status

Done
