Commit 3aec5ed

Kcpp triage for rowsplit: revert ggml-org#16715 until ggml-org#16799 is resolved
revert ggml-org#16715 (+2 squashed commit)

Squashed commit:

[289af2ee2] Revert "Hide latency of bias and gate-loading (ggml-org#16847)"

This reverts commit 8b11dee.

[a3e5c1e95] Revert "CUDA: add unused vars to mmvf and mmvq (ggml-org#16807)"

This reverts commit 463bbf2.

1 parent 2649618 commit 3aec5ed

10 files changed: +166 -959 lines changed
ggml/src/ggml-cuda/common.cuh

Lines changed: 0 additions & 13 deletions
@@ -1013,16 +1013,3 @@ struct ggml_backend_cuda_context {
         return pool(device);
     }
 };
-
-struct ggml_cuda_mm_fusion_args_host {
-    const ggml_tensor * x_bias = nullptr;
-    const ggml_tensor * gate = nullptr;
-    const ggml_tensor * gate_bias = nullptr;
-    ggml_glu_op glu_op;
-};
-struct ggml_cuda_mm_fusion_args_device {
-    const void * x_bias = nullptr;
-    const void * gate = nullptr;
-    const void * gate_bias = nullptr;
-    ggml_glu_op glu_op;
-};

ggml/src/ggml-cuda/convert.cuh

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-#pragma once
 #include "common.cuh"
 
 #define CUDA_DEQUANTIZE_BLOCK_SIZE 256

ggml/src/ggml-cuda/ggml-cuda.cu

Lines changed: 1 addition & 352 deletions
Large diffs are not rendered by default.
