Commit 7038e8b

[Kernel] Support running GPTQ 8-bit models in Marlin (#4533)
1 parent 2a85f93 commit 7038e8b

7 files changed: +553, -324 lines

csrc/ops.h

Lines changed: 3 additions & 1 deletion
@@ -132,6 +132,7 @@ torch::Tensor gptq_marlin_gemm(
     torch::Tensor &g_idx,
     torch::Tensor &perm,
     torch::Tensor &workspace,
+    int64_t num_bits,
     int64_t size_m,
     int64_t size_n,
     int64_t size_k,
@@ -141,7 +142,8 @@ torch::Tensor gptq_marlin_repack(
     torch::Tensor &b_q_weight,
     torch::Tensor &perm,
     int64_t size_k,
-    int64_t size_n);
+    int64_t size_n,
+    int64_t num_bits);
 #endif
 
 void squeezellm_gemm(
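The new num_bits argument is what lets the same gptq_marlin_gemm and gptq_marlin_repack entry points serve both 4-bit and 8-bit GPTQ checkpoints. Below is a minimal, hypothetical C++ sketch of the kind of validation and packing arithmetic a caller or wrapper around these ops might perform; the helper names (marlin_pack_factor, check_marlin_num_bits) are illustrative only and are not part of this commit.

#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: GPTQ weights are packed into 32-bit words, so the
// pack factor is 32 / num_bits (8 values per word at 4-bit, 4 at 8-bit).
inline int64_t marlin_pack_factor(int64_t num_bits) {
  return 32 / num_bits;
}

// Hypothetical check a wrapper might run before forwarding num_bits to
// gptq_marlin_gemm / gptq_marlin_repack, which now both take the width.
inline void check_marlin_num_bits(int64_t num_bits) {
  if (num_bits != 4 && num_bits != 8) {
    throw std::runtime_error("Unsupported num_bits = " +
                             std::to_string(num_bits) +
                             "; Marlin GPTQ kernels expect 4 or 8");
  }
}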
