Commit 5dd08e1

Rename instruction_sched_hint to schedule_hint (#768)
1 parent 8e42af9 commit 5dd08e1

5 files changed, +86 -86 lines changed

python/perf-kernels/tools/rocm-triton-prof/README.md

Lines changed: 6 additions & 6 deletions
@@ -36,12 +36,12 @@ The `flash-attention.py` kernel comes with auto-tuning. In this example, we want
 
 ```bash
 $ TRITON_PRINT_AUTOTUNING=1 python3 ./flash-attention.py -b 2 -hq 16 -hk 16 -sq 8192 -sk 8192 -d 128 -causal -layout thd
-Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 128, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
-Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
-Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 3, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
-Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 1, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
-Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 32, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
-Triton autotuning for function attn_fwd finished after 15.06s; best config selected: BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, instruction_sched_variant: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
+Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 128, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
+Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
+Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 3, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
+Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 1, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
+Autotuning kernel attn_fwd with config BLOCK_M: 128, BLOCK_N: 32, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None
+Triton autotuning for function attn_fwd finished after 15.06s; best config selected: BLOCK_M: 128, BLOCK_N: 64, waves_per_eu: 2, PRE_LOAD_V: False, GRID_CU_MULTIP: 2, schedule_hint: none, num_warps: 4, num_ctas: 1, num_stages: 1, maxnreg: None;
 fused-attention-fwd-d128-layoutthd:
     BATCH    HQ    HK  N_CTX_Q  N_CTX_K      triton      torch
 0     2.0  16.0  16.0   8192.0   8192.0  221.869662  17.140226
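
For context, the change is a pure rename of the tuning knob; the surrounding config keys are untouched. Below is a minimal sketch of how a config would be written after this commit, assuming the ROCm Triton fork at this revision accepts `schedule_hint` (formerly `instruction_sched_variant` in the log above) as a per-config option alongside `waves_per_eu`. The kernel and tuning values are hypothetical placeholders, not code from this repository.

```python
# Hypothetical sketch: passing the renamed key through @triton.autotune.
# Assumes the ROCm Triton fork at this commit, where the AMD backend reads
# `schedule_hint` (previously `instruction_sched_variant`) from the config
# kwargs, as the autotune log above suggests.
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config(
            {'BLOCK_M': 128, 'BLOCK_N': 64,  # tile sizes (placeholder values)
             'waves_per_eu': 2,              # AMD occupancy hint
             'schedule_hint': 'none'},       # renamed by this commit
            num_warps=4, num_stages=1),
    ],
    key=['n_elements'],  # re-tune when the problem size changes
)
@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements,
                BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Trivial 1D copy; BLOCK_N is unused here and only mirrors the log above.
    pid = tl.program_id(0)
    offs = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    mask = offs < n_elements
    tl.store(dst_ptr + offs, tl.load(src_ptr + offs, mask=mask), mask=mask)
```

With `TRITON_PRINT_AUTOTUNING=1` set, a config like this would be expected to print with `schedule_hint: none`, matching the updated log lines above.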
