Commit ad688e1

ikawrakow and Iwan Kawrakow authored
Use fused gemv+add only for TG (#933)
Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent db3bed2 commit ad688e1

1 file changed, +1 -1 lines changed

ggml/src/ggml-cuda.cu

Lines changed: 1 addition & 1 deletion
@@ -2067,7 +2067,7 @@ static int ggml_cuda_mul_mat_q(ggml_backend_cuda_context & ctx, const ggml_tenso
 
     auto stream = ctx.stream();
 
-    auto fusion = ctx.fusion;
+    auto fusion = ctx.fusion && src1->ne[1] == 1;
 
     auto ne10_padded = GGML_PAD(src1->ne[0], MATRIX_ROW_PADDING);
     auto nb10_padded = ne10_padded*sizeof(block_q8_1)/QK8_1;
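
The effect of the changed line can be sketched in isolation. This is a minimal illustration, not code from the repository: `tensor_shape` and `use_fused_gemv_add` are hypothetical names, and it assumes that "TG" in the commit title means token generation, where `src1` has a single column (`ne[1] == 1`) and the multiplication is a matrix-vector product (gemv). Under that reading, larger batches such as prompt processing no longer take the fused gemv+add path even when fusion is enabled in the context.

```cpp
#include <cstdint>

// Hypothetical stand-in for ggml_tensor: only the shape field used here.
struct tensor_shape {
    int64_t ne[4];  // ne[0] = row length, ne[1] = number of columns in src1
};

// Hypothetical helper mirroring the changed line:
//   auto fusion = ctx.fusion && src1->ne[1] == 1;
// Fusion is used only when it is enabled AND src1 is a single column.
constexpr bool use_fused_gemv_add(bool fusion_enabled, const tensor_shape & src1) {
    return fusion_enabled && src1.ne[1] == 1;
}
```

With fusion enabled, a `src1` of shape `{4096, 1, 1, 1}` selects the fused path, while `{4096, 512, 1, 1}` does not. Before this commit the decision depended on `ctx.fusion` alone.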
