Closed
Labels
AutoDeploy, AutoDeploy Backend, Scale-out (Multi-GPU and distributed inference scaling issues, tensor/pipeline/data parallelism), bug (Something isn't working)
Description
System Info
CW-DFW. TensorRT-LLM main branch.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Token generation hangs when running the build_and_run_ad.py script with the torch-cudagraph compile backend and AllReduceStrategy.AUTO.
Repro steps:
1. In https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/auto_deploy/distributed/trtllm.py#L21, change the AllReduce strategy from NCCL to AUTO.
2. Run:
MODEL=meta-llama/Llama-3.1-8B-Instruct
python examples/auto_deploy/build_and_run_ad.py --model $MODEL --args.world-size 4 --args.compile_backend=torch-cudagraph
Expected behavior
The torch-cudagraph run should complete and produce legible outputs, as the torch-simple backend does under the same settings.
Actual behavior
Token generation hangs.
Additional notes
n/a
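Not part of the original report, but as a generic debugging aid: a silent hang like this can be turned into an explicit, loggable failure by running the blocking call under a deadline. This is a minimal sketch using only the Python standard library; `run_with_timeout` and the stand-in workloads are hypothetical helpers, not TensorRT-LLM APIs.

```python
import concurrent.futures
import time

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread; raise concurrent.futures.TimeoutError if it
    does not finish within timeout_s seconds. Note: the worker thread cannot be
    killed, so this only converts a silent hang into a visible error."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout_s)
    finally:
        # Don't block on the (possibly hung) worker when cleaning up.
        pool.shutdown(wait=False)

# Stand-in for a generation call that returns promptly:
print(run_with_timeout(lambda: "tokens", timeout_s=5.0))

# Stand-in for a hung call: times out after 0.1 s instead of blocking forever.
try:
    run_with_timeout(time.sleep, 0.1, 0.5)
except concurrent.futures.TimeoutError:
    print("generation hung past the deadline")
```

Wrapping the repro's generation step this way makes the AUTO-vs-NCCL comparison scriptable: the NCCL run finishes, the AUTO run raises instead of hanging the harness.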
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.