Cache NvidiaTool.from_path (#7569)

robieta · web-flow · commit 167ed2866c56 · 2025-07-21T15:57:41.000Z
We're currently updating the internal Triton version at Meta, and one of
the things that process has flushed out is that there has been some
regression in compile time since 3.2. One of the main culprits is
`NvidiaTool.from_path`; we call it all the time under the hood, and each
time it spawns a subprocess just to check the version. The benchmark I'm
using is a `torch.compile`'d model with ~300 kernels, and this change
cuts overall compile time by about 10%.
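
As a rough illustration of the effect (a minimal sketch, not the Triton code itself), the snippet below memoizes a version probe with `functools.lru_cache`. `tool_version` and `spawn_count` are hypothetical names, and `sys.executable` stands in for an NVIDIA tool so the example runs anywhere Python does.

```python
import functools
import subprocess
import sys
from typing import Optional

spawn_count = 0  # counts real subprocess launches, for illustration only


@functools.lru_cache
def tool_version(path: str) -> Optional[str]:
    """Hypothetical stand-in for NvidiaTool.from_path: probe `path --version`."""
    global spawn_count
    spawn_count += 1
    try:
        out = subprocess.check_output([path, "--version"], stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError):
        return None
    return out.decode("utf-8", errors="replace").strip()


# Simulate ~300 kernels each asking for the same tool.
for _ in range(300):
    tool_version(sys.executable)

print(spawn_count, tool_version.cache_info())
```

Only the first call launches a subprocess; the rest are cache hits keyed on the `path` string. Note that `lru_cache` also remembers a `None` result, so a path that fails once is not re-probed until the cache is cleared.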

I don't have strong feelings on how defensive to be here. I picked a
middle-of-the-road level where I guard `PATH`, but don't go crazy
guarding against every conceivable level of magic. (I think the existing
knobs unit tests are pretty representative of what one might expect of
"reasonable" behavior, but let me know if you'd like me to tweak the
level of defensiveness.)
diff --git a/python/triton/knobs.py b/python/triton/knobs.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import functools
 import importlib
 import os
 import re
@@ -170,6 +171,7 @@ class NvidiaTool:
     version: str
 
     @staticmethod
+    @functools.lru_cache
     def from_path(path: str) -> Optional[NvidiaTool]:
         try:
             result = subprocess.check_output([path, "--version"], stderr=subprocess.STDOUT)