Commit adefa98

Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output
1 parent 8960efd commit adefa98

1 file changed: +4 -0 lines changed

src/llama-graph.cpp

Lines changed: 4 additions & 0 deletions
@@ -1488,6 +1488,10 @@ ggml_tensor * llm_graph_context::build_attn(
 
     if (wo) {
         cur = build_lora_mm(wo, cur);
+        if (arch == LLM_ARCH_GLM4) {
+            // GLM4 seems to have numerical issues with half-precision accumulators
+            ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
+        }
     }
 
     if (wo_b) {
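
The commit message attributes the infinities to half-precision accumulation in the attention output matmul. The sketch below is a rough standalone illustration of that failure mode, not ggml code: `clamp_to_half_range`, `dot_half_accum`, and `dot_f32_accum` are hypothetical helpers, and the fp16 model is an assumption that only reproduces overflow past the largest finite half value (65504) while ignoring mantissa rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest finite value representable in IEEE 754 binary16 (half precision).
constexpr float kHalfMax = 65504.0f;

// Hypothetical helper: crude model of storing a value in fp16.
// Only overflow is modelled; mantissa rounding is deliberately ignored.
float clamp_to_half_range(float x) {
    if (std::fabs(x) > kHalfMax) {
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    }
    return x;
}

// Dot product whose running sum is forced through the fp16 range after
// every step, standing in for a half-precision accumulator.
float dot_half_accum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc = clamp_to_half_range(acc + a[i] * b[i]);
    }
    return acc;
}

// Same dot product with a plain 32-bit float accumulator.
float dot_f32_accum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

With two 128-element vectors filled with 32.0f, each product is 1024, so the modelled half accumulator passes 65504 on the 64th term and stays at infinity, while the float accumulator reaches the exact sum 131072. The diff above requests the wider accumulator for GLM4 by calling `ggml_mul_mat_set_prec(cur, GGML_PREC_F32)` on the result of the output projection.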