Skip to content

Conversation

@agorenstein-nvidia
Copy link

This augments the optimize_module definition to also to LLVM_ENABLE_TIMING=1, bringing it to parity with translate_to_asm. Note that optimize_module uses the new pass manager, while translate_to_asm uses the legacy pass manager. This, plus effort to maintain reasonable inter-operability with the existing LLVM_IR_ENABLE_DUMP behavior, lead to more complicated control flow than expected.

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD. Yes, I did this, and everything is "Skipped" or "Passed".

  • Select one of the following.

    • [ ] I have added tests.
    • This PR does not need a test because this is augmenting debugging information, and searching for the environment variables involved it's not under test.
  • Select one of the following.

    • I have not added any lit tests.
    • [ ] The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

This augments the `optimize_module` definition to also to `LLVM_ENABLE_TIMING=1`, bringing it to parity with `translate_to_asm`. Note that `otpimize_module` uses the new pass manager, while `translate_to_asm` uses the legacy pass manager. This, plus effort to maintain reasonable inter-operability with the existing `LLVM_IR_ENABLE_DUMP` behavior, lead to more complicated control flow than expected.
@masahi masahi requested review from ThomasRaoux and removed request for ptillet October 21, 2025 21:23
PassInstrumentationCallbacks *instrCbPtr = nullptr;
PassInstrumentationCallbacks passInstrCb;
StandardInstrumentations standardInstr(mod->getContext(),
/*DebugLogging*/ true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changing this? Does it improve compile time? If so do you have numbers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look.

This overall change is expected to improve the logging of compile-time: mechanically, without this change LLVM_ENABLE_TIMING=1 ignores the entire pre-lower optimization pipeline. No change here is expected to improve compile-time itself, and I don't believe there's a substantive risk of regression, so I did not measure this. Maybe my PR message/title is confusing and made it sound like I expected this to improve compile time :), please let me know if I can express it better.

To the code-change highlighted: The third highlighted line (333) must be moved in order to let the second parameter /*DebuggingLogging*/ true be parameterized by which debugging flags are set. I parameterized it because the debugging information is largely orthogonal from timing, but I didn't want to change the behavior of the previously-used-flag. For the other lines, I suppose it's largely stylistic, so I'd be happy to change it if there's a more "Triton-y" way you see.

Do note, however, it seems required that standardInstr be constructed before the writes to llvm::TimePasses*---this is because those globals are read by the constructor.

- llvm/lib/Passes/StandardInstrumentations.cpp -> invokes the default constructor of its field `TimePassesHandler TimePasses;`
- llvm/lib/IR/PassTimingInfo.cpp -> defines the default constructor of TimePassesHandler as reading those globals.

@agorenstein-nvidia
Copy link
Author

Regarding the pipeline failure, the reported error (https://github.com/triton-lang/triton/actions/runs/18697072695/job/53326475256?pr=8506#step:8:212) indicates a (transient?) network issue:

...
    adding license file 'LICENSE'
    writing manifest file 'python/triton.egg-info/SOURCES.txt'
    running build_ext
    error: <urlopen error [Errno -3] Temporary failure in name resolution>
    downloading and extracting https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-12.8.93-archive.tar.xz ...
    copy /home/runner/.triton/nvidia/nvcc/cuda_nvcc-linux-x86_64-12.8.93-archive/bin/ptxas to /home/runner/_work/triton/triton/third_party/nvidia/backend/bin/ptxas ...
...

I looked to see if this was merely a knock-on effect of an error from my change, but I didn't see anything in the logs indicating that. Building locally (with pip install -e .) works, is all I can tell on my own machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants