Commit 03a0321

[Frontend] Avoid inspect.getclosurevars (triton-lang#8831)
This takes `attention_kernel.get_capture_scope()` from 0.5 ms to 47 ns, a little over a 10,000x speedup. I see a combined 250 ms compile-time improvement in the gluon attention example benchmarks.
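
As a rough way to reproduce the kind of comparison described above, the following micro-benchmark sketch times both approaches on a small closure. `make_kernel`, `scope_via_inspect`, and `scope_via_cells` are hypothetical names invented for this illustration, not part of Triton, and the absolute numbers will differ from those quoted in the commit message depending on the function and the machine.

```python
import inspect
import timeit


def make_kernel():
    # Hypothetical stand-in for a JIT-decorated function that closes over values.
    block_m, block_n = 128, 64

    def kernel(x):
        return x * block_m + block_n

    return kernel


def scope_via_inspect(fn):
    # Old approach: getclosurevars builds nonlocals, globals, builtins and
    # unbound names, even though only the nonlocals are used here.
    return inspect.getclosurevars(fn).nonlocals


def scope_via_cells(fn):
    # New approach: pair free-variable names with their closure cells directly.
    if fn.__closure__ is None:
        return {}
    return {name: cell.cell_contents for name, cell in zip(fn.__code__.co_freevars, fn.__closure__)}


kernel = make_kernel()
n = 10_000
t_inspect = timeit.timeit(lambda: scope_via_inspect(kernel), number=n) / n
t_cells = timeit.timeit(lambda: scope_via_cells(kernel), number=n) / n
print(f"inspect.getclosurevars: {t_inspect * 1e9:.0f} ns per call")
print(f"closure cells:          {t_cells * 1e9:.0f} ns per call")
```

The gap grows with the number of names the function body references, since `inspect.getclosurevars` classifies each of them, while the cell-based version only touches the free variables.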
1 parent 9b27dff commit 03a0321

1 file changed: +5 -1 lines changed

python/triton/runtime/jit.py

Lines changed: 5 additions & 1 deletion
@@ -489,7 +489,11 @@ def __init__(self, fn):
         self.__module__ = fn.__module__
 
     def get_capture_scope(self):
-        return self.__globals__ | inspect.getclosurevars(self.fn).nonlocals
+        fn = self.fn
+        if fn.__closure__ is None:
+            return self.__globals__
+        nonlocals = {name: cell.cell_contents for name, cell in zip(fn.__code__.co_freevars, fn.__closure__)}
+        return self.__globals__ | nonlocals
 
     @property
     def cache_key(self) -> str:
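
To illustrate what the patched method computes, here is a minimal standalone sketch. `capture_scope` is a hypothetical free-function version written for this example; it assumes `fn.__globals__` plays the role of `self.__globals__` in the real method. It shows the two branches (no closure, closure present) and that closure variables take precedence over globals in the merged dict.

```python
import inspect

SCALE = 2  # module-level global


def capture_scope(fn):
    # Hypothetical standalone version of the patched method; uses fn.__globals__
    # where the real method uses self.__globals__ (an assumption of this sketch).
    if fn.__closure__ is None:
        return fn.__globals__
    nonlocals = {name: cell.cell_contents for name, cell in zip(fn.__code__.co_freevars, fn.__closure__)}
    # With dict union (Python 3.9+), keys from the right operand win.
    return fn.__globals__ | nonlocals


def plain(x):
    # References only a global, so plain.__closure__ is None.
    return x * SCALE


def make_closure():
    offset = 3
    SCALE = 10  # shadows the module-level SCALE inside the closure

    def inner(x):
        return x * SCALE + offset

    return inner


closure = make_closure()
plain_scope = capture_scope(plain)
closure_scope = capture_scope(closure)
```

In the no-closure branch the function returns the globals dict itself rather than a merged copy, so callers should treat the result as read-only.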
