
benchmark_inference: Add CLI option to enable thunder CUDAGraph Transform#2697

Merged
kshitij12345 merged 9 commits into main from ksh/bench-inf-cudagraph
Nov 4, 2025
Conversation

@kshitij12345
Collaborator

@kshitij12345 kshitij12345 commented Oct 27, 2025

Command

python thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 3 --mode thunder --num-iterations 10 --enable-thunder-cudagraph

NOTE: Need to revert 13f7171

Running the above command leads to the following error (it appears to fail during FusionDefinition execution):

Traceback (most recent call last):
  File "/opt/pytorch/lightning-thunder/thunder/transforms/cudagraph.py", line 130, in build_cuda_graph
    static_outputs = fn(*static_inputs)
                     ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 121, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/lightning-thunder/thunder/executors/torchex.py", line 169, in no_autocast_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "thunder.CUDAGraph5_39", line 116, in CUDAGraph5
  File "/opt/pytorch/lightning-thunder/thunder/executors/nvfuserex_impl.py", line 566, in __call__
    return fd.execute(
           ^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/python/nvfuser_direct/__init__.py", line 318, in execute
    return self.fec.execute(
           ^^^^^^^^^^^^^^^^^
RuntimeError: 
Error from segmentation group 3: CUDA error: operation not permitted when stream is capturing
Search for `cudaErrorStreamCaptureUnsupported' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from memcpy_and_sync at /opt/pytorch/pytorch/c10/cuda/CUDAFunctions.h:106 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x88 (0x74e88646d008 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5cb0a (0x74e8bc9b2b0a in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x1c8 (0x74e8bc9b27c8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)

@kshitij12345 kshitij12345 marked this pull request as ready for review October 27, 2025 13:54
Collaborator

@mattteochen mattteochen left a comment


Thank you @kshitij12345

@wujingyue
Collaborator

cc @mdavis36

@tbqh

tbqh commented Oct 28, 2025

This seems to be broken with different parameters:
python thunder/benchmarks/benchmark_inference.py --input-length 4096 --output-length 4 --mode thunder --enable-nv-linear --warmup-iterations 2 --num-iterations 2 --enable-thunder-cudagraph

Causes:

  File "/opt/pytorch/nvfuser/lightning-thunder/thunder/benchmarks/benchmark_inference.py", line 737, in <module>
    main()
  File "/opt/pytorch/nvfuser/lightning-thunder/thunder/benchmarks/benchmark_inference.py", line 719, in main
    benchmark = InferenceBenchmark(config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/lightning-thunder/thunder/benchmarks/benchmark_inference.py", line 279, in __init__
    self.model = self._compile_model(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/lightning-thunder/thunder/benchmarks/benchmark_inference.py", line 308, in _compile_model
    return thunderfx(model, **self._thunder_jit_options)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/lightning-thunder/thunder/benchmarks/benchmark_inference.py", line 298, in _thunder_jit_options
    res["transforms"].append(CUDAGraphTransform())
    ~~~^^^^^^^^^^^^^^
KeyError: 'transforms'

Edit:
Looks like it's just the --enable-nv-linear flag. Is that flag expected to be incompatible with --enable-thunder-cudagraph?
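The traceback above shows the options builder doing `res["transforms"].append(CUDAGraphTransform())` against a dict that has no "transforms" entry yet when --enable-nv-linear builds the options first. A minimal sketch of the kind of guard that avoids the KeyError (names and structure here are illustrative, not the actual patch in this PR):

```python
# Hypothetical reconstruction of the options-building logic in
# benchmark_inference.py. The stand-in class avoids importing thunder.
class CUDAGraphTransform:  # stand-in for thunder.transforms.cudagraph.CUDAGraphTransform
    pass

def build_thunder_jit_options(enable_nv_linear: bool, enable_cudagraph: bool) -> dict:
    res: dict = {}
    if enable_nv_linear:
        # Assumed: this path populates other options but never creates
        # res["transforms"], which is what triggered the KeyError.
        res["nv_enable_linear"] = True
    if enable_cudagraph:
        # res["transforms"].append(...) would raise KeyError here;
        # setdefault creates the list on first use instead.
        res.setdefault("transforms", []).append(CUDAGraphTransform())
    return res

opts = build_thunder_jit_options(enable_nv_linear=True, enable_cudagraph=True)
print(len(opts["transforms"]))  # 1
```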

@kshitij12345
Collaborator Author

@tbqh Thanks for reporting; I have pushed a patch to fix the issue.

@mattteochen
Collaborator

Just an FYI for anyone testing this: the transform + NVIDIA/Fuser#5434 (comment) are expected to work on Blackwell.

@kshitij12345 kshitij12345 enabled auto-merge (squash) October 30, 2025 11:10
@kshitij12345
Collaborator Author

Ping @KaelanDt for review

Collaborator

@KaelanDt KaelanDt left a comment


thank you @kshitij12345

@tbqh

tbqh commented Nov 4, 2025

Thanks for the --enable-nv-linear fix, this PR is working well.

Before:

Time Between Output Tokens (TBOT): 7.25 ms
Prefill Time: 15.68 ms
Decode Time: 7.25 ms

After:

Time Between Output Tokens (TBOT): 4.47 ms
Prefill Time: 15.27 ms
Decode Time: 4.47 ms
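Working out the improvement from the numbers above: decode time (and TBOT) drops from 7.25 ms to 4.47 ms per token with the CUDAGraph transform enabled, while prefill is essentially unchanged.

```python
# Quick arithmetic on the reported benchmark numbers.
before_ms, after_ms = 7.25, 4.47
speedup = before_ms / after_ms
print(f"decode speedup: {speedup:.2f}x")  # decode speedup: 1.62x
```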

@kshitij12345 kshitij12345 merged commit 77261d1 into main Nov 4, 2025
51 checks passed
@kshitij12345 kshitij12345 deleted the ksh/bench-inf-cudagraph branch November 4, 2025 10:29