Commit 07a5b9b

fix benchmark hot path
1 parent c336f77 commit 07a5b9b

2 files changed: +13 -16 lines changed

benchmarks/run.py

Lines changed: 10 additions & 13 deletions
@@ -24,6 +24,7 @@
 import sys
 from typing import Any
 from typing import Callable
+import time
 
 # Maps tritonbench op names to Helion kernel examples
 # Can map to a single kernel or a list of kernel variants
@@ -349,21 +350,17 @@ def helion_method(
     # so that each input size can go through its own autotuning.
     from helion.runtime.kernel import Kernel
 
-    for attr_name in dir(mod):
-        attr = getattr(mod, attr_name)
-        if isinstance(attr, Kernel):
-            attr.reset()
+    # Force autotuning unless HELION_USE_DEFAULT_CONFIG=1 is set
+    # This ensures we run autotuning even if the kernel has pre-specified configs
+    if os.environ.get("HELION_USE_DEFAULT_CONFIG", "0") != "1":
+        # Find all Kernel objects in the module and force autotuning
+        for attr_name in dir(mod):
+            attr = getattr(mod, attr_name)
+            if isinstance(attr, Kernel):
+                attr.reset()
+                attr.settings.force_autotune = True
 
     def _inner() -> Callable[..., Any] | object:
-        # Force autotuning unless HELION_USE_DEFAULT_CONFIG=1 is set
-        # This ensures we run autotuning even if the kernel has pre-specified configs
-        if os.environ.get("HELION_USE_DEFAULT_CONFIG", "0") != "1":
-            # Find all Kernel objects in the module and force autotuning
-            for attr_name in dir(mod):
-                attr = getattr(mod, attr_name)
-                if isinstance(attr, Kernel):
-                    attr.settings.force_autotune = True
-
         result = kfunc(*args)
         if callable(result):
             return result()
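
The effect of this hunk can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `FakeKernel` class standing in for `helion.runtime.kernel.Kernel` and a hypothetical `make_benchmark_fn` helper; neither is part of Helion, and the real code sets `attr.settings.force_autotune` rather than a flat attribute. The point is only that the `dir(mod)` scan runs once during setup, so the closure that gets timed contains nothing but the kernel call.

```python
import os
from types import SimpleNamespace


class FakeKernel:
    """Hypothetical stand-in for helion.runtime.kernel.Kernel (not the real class)."""

    def __init__(self):
        self.force_autotune = False  # the real code sets attr.settings.force_autotune
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1

    def __call__(self, x):
        return x * 2


def make_benchmark_fn(mod, kfunc, *args, environ=None):
    """Do per-input setup once, then return a closure that only calls the kernel."""
    environ = os.environ if environ is None else environ
    # Setup: mirrors the added lines in the hunk, run once per input.
    if environ.get("HELION_USE_DEFAULT_CONFIG", "0") != "1":
        for attr_name in dir(mod):
            attr = getattr(mod, attr_name)
            if isinstance(attr, FakeKernel):
                attr.reset()
                attr.force_autotune = True

    def _inner():
        # Hot path: only the kernel call, so repeated invocations do not
        # pay for a dir()/getattr() scan of the module each time.
        return kfunc(*args)

    return _inner


mod = SimpleNamespace(double=FakeKernel())
bench = make_benchmark_fn(mod, mod.double, 21, environ={})
results = [bench() for _ in range(3)]
```

Note that in the new code `reset()` sits inside the `HELION_USE_DEFAULT_CONFIG` check, so with `HELION_USE_DEFAULT_CONFIG=1` the kernels are neither reset nor forced to autotune; the sketch follows that behavior.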

benchmarks/run_input_shard.sh

Lines changed: 3 additions & 3 deletions
@@ -11,9 +11,9 @@ attempt=0
 while true; do
     attempt=$((attempt + 1))
     echo "Attempt $attempt: Running benchmark for shard $((SHARD+1))/${WORLD_SIZE}..."
-
-    HELION_FORCE_DISK_CACHE=1 CUDA_VISIBLE_DEVICES=$((RANK_OFFSET+SHARD)) python benchmarks/run.py --input-shard $((SHARD+1))/${WORLD_SIZE} --metrics tflops,gbps,speedup >"$OUTPUT_FILE" 2>&1
-
+
+    CUDA_VISIBLE_DEVICES=$((RANK_OFFSET+SHARD)) python benchmarks/run.py --input-shard $((SHARD+1))/${WORLD_SIZE} --metrics tflops,gbps,speedup >"$OUTPUT_FILE" 2>&1
+
     exit_code=$?
     if [ $exit_code -eq 0 ]; then
         echo "Success! Benchmark completed for shard $((SHARD+1))/${WORLD_SIZE}"
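
For readers more comfortable with Python than shell arithmetic, the mapping from shard index to GPU and `--input-shard` argument in the launch line above can be sketched as follows. The `shard_launch` helper is hypothetical and exists only for illustration; the script path, flags, and metric names are copied from the shell line, and `HELION_FORCE_DISK_CACHE` is deliberately not set, matching the post-commit version.

```python
import os


def shard_launch(shard, world_size, rank_offset=0, base_env=None):
    """Return (env, argv) for one 0-based shard, mirroring the shell launch line."""
    env = dict(os.environ if base_env is None else base_env)
    # Each shard is pinned to a single GPU: RANK_OFFSET + SHARD.
    env["CUDA_VISIBLE_DEVICES"] = str(rank_offset + shard)
    argv = [
        "python",
        "benchmarks/run.py",
        "--input-shard",
        f"{shard + 1}/{world_size}",  # the CLI argument is 1-based
        "--metrics",
        "tflops,gbps,speedup",
    ]
    return env, argv


env, argv = shard_launch(shard=2, world_size=8, rank_offset=4, base_env={})
```

To actually launch the process, the pair could be passed to `subprocess.run(argv, env=env)`, with stdout and stderr redirected to the output file as the shell script does.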
