Commit ada4e6e

[BENCHMARK] Enable two Deepseek-v3 cases which couldn't run previously. (#4722)
The Deepseek cases can run successfully now since the RemoveLayout pass has been improved for flex attn.

Signed-off-by: Lu,Chengjun <[email protected]>
1 parent 428e2b1 commit ada4e6e

1 file changed: +4 -5 lines changed

benchmarks/triton_kernels_benchmark/flex_attention_benchmark_causal_mask.py

Lines changed: 4 additions & 5 deletions
@@ -88,10 +88,7 @@ def causal_mask(_, __, q_idx, kv_idx):
 
 # Multi-query attention. H_kv equals 1.
 # Append shapes of Deepseek-v3 (Nope)
-[
-# RuntimeError: No valid triton configs. OutOfResources: out of resource: shared memory, Required: 133120, Hardware limit: 131072.
-# [z, 128, 1, 512, 1024 + 128 + 512, 64, 512, 'fwd'] for z in batch_sizes
-] +
+[[z, 128, 1, 512, 1024 + 128 + 512, 64, 512, 'fwd'] for z in batch_sizes] +
 # Append shapes of Deepseek-v3 (Rope)
 [] +
 
@@ -121,7 +118,9 @@ def causal_mask(_, __, q_idx, kv_idx):
 ] +
 # Decode shapes of Deepseek-v3 (Nope)
 [
-# RuntimeError: No valid triton configs. OutOfResources: out of resource: shared memory, Required: 264192, Hardware limit: 131072.
+# There is an known issue in IGC for kernel with extreme register pressure.
+# Enable this case later with new IGC.
+# RuntimeError: ZE_RESULT_ERROR_INVALID_KERNEL_NAME
 # [z, 128, 1, 1, 1024, 64, 512, 'fwd'] for z in batch_sizes
 ] +
 # Decode shapes of Deepseek-v3 (Rope)
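
The effect of the two hunks can be illustrated with a small standalone sketch. It is not the benchmark's code: `batch_sizes` here holds made-up values, the names `append_nope`, `decode_nope`, and `shared_memory_overshoot` are hypothetical, and the rows are treated as opaque lists because the diff does not name their columns. It shows that a list literal containing only comments contributes no cases to the concatenated shape list, and it works out how far the removed error messages were over the reported shared-memory limit.

```python
# Hypothetical batch sizes; the real values are defined elsewhere in the benchmark file.
batch_sizes = [1, 4, 16]

# Enabled by this commit: one row per batch size.
append_nope = [[z, 128, 1, 512, 1024 + 128 + 512, 64, 512, 'fwd'] for z in batch_sizes]

# Still disabled: a list literal that contains only comments is an empty list.
decode_nope = [
    # [z, 128, 1, 1, 1024, 64, 512, 'fwd'] for z in batch_sizes
]

# Concatenation with `+` mirrors how the diff chains the per-model lists.
shapes = append_nope + [] + decode_nope


def shared_memory_overshoot(required: int, limit: int) -> int:
    """Return how many bytes `required` exceeds `limit` by, or 0 if it fits."""
    return max(0, required - limit)


# Values copied from the error messages removed in the diff.
HARDWARE_LIMIT = 131072  # 128 KiB
print(len(shapes))
print(shared_memory_overshoot(133120, HARDWARE_LIMIT))
print(shared_memory_overshoot(264192, HARDWARE_LIMIT))
```

Under these assumptions, only the enabled "Append ... (Nope)" entry adds rows. The removed error messages reported requests 2048 bytes and 133120 bytes over the 131072-byte limit, respectively.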