Commit 45afaf3

ikawrakow and Iwan Kawrakow authored
Fix #772 (#790)
Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent 8cd2d7c commit 45afaf3

1 file changed: +1 -1 lines changed

ggml/src/ggml-cuda/fattn-mma-f16.cuh

Lines changed: 1 addition & 1 deletion
@@ -1408,7 +1408,7 @@ void launch_fattn_mma(

 //const bool use_stream_k = cc >= CC_ADA_LOVELACE || tiles_efficiency_percent < 75;
 // On my RTX-4080 the above is slightly slower for PP. It would be useful to try and see what happens on Blackwell
-const bool use_stream_k = tiles_efficiency_percent < 75;
+const bool use_stream_k = tiles_efficiency_percent < 75 || Q->ne[1] > 2048;

 blocks_num.x = use_stream_k ? nblocks_stream_k : ntiles_total;
 blocks_num.y = 1;
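
For readers who want to see how the changed condition behaves at its boundaries, here is a minimal sketch that isolates it as a standalone predicate. The helper name `should_use_stream_k` and the parameter name `n_q_rows` are hypothetical and do not appear in the commit, which computes the value inline in `launch_fattn_mma`. `n_q_rows` stands in for `Q->ne[1]`, and `tiles_efficiency_percent` is taken as a given input because its computation lies outside the diff.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the commit): mirrors the new condition
//     tiles_efficiency_percent < 75 || Q->ne[1] > 2048
// tiles_efficiency_percent: assumed to be precomputed by the caller.
// n_q_rows: stands in for Q->ne[1].
constexpr bool should_use_stream_k(int tiles_efficiency_percent, int64_t n_q_rows) {
    return tiles_efficiency_percent < 75 || n_q_rows > 2048;
}

// Before the change, only the efficiency test could enable stream-k.
static_assert(!should_use_stream_k(90, 512),  "efficient tiling, short Q: no stream-k");
static_assert( should_use_stream_k(60, 512),  "inefficient tiling: stream-k");

// After the change, a large Q->ne[1] enables it regardless of efficiency.
static_assert( should_use_stream_k(90, 4096), "Q->ne[1] above 2048: stream-k");

// Both comparisons are strict, so the exact thresholds do not trigger it.
static_assert(!should_use_stream_k(75, 2048), "75 and 2048 are exclusive bounds");
```

In the diff, the result then selects the grid size: `blocks_num.x` becomes `nblocks_stream_k` when the predicate is true and `ntiles_total` otherwise.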
