Commit 220860a

graph : use F32 accumulators for gpt-oss

ggml-ci

1 parent d32e03f

File tree

1 file changed (+5, -0)


src/llama-graph.cpp

Lines changed: 5 additions & 0 deletions
@@ -1566,6 +1566,11 @@ ggml_tensor * llm_graph_context::build_attn_with_sinks(

     if (wo) {
         cur = build_lora_mm(wo, cur);
+        if (arch == LLM_ARCH_OPENAI_MOE) {
+            // similar to the original build_attn
+            // TODO: this is tmp until we refactor and remove the build_attn_with_sinks() path
+            ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
+        }
     }

     if (wo_b) {
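For context: `ggml_mul_mat_set_prec(cur, GGML_PREC_F32)` asks backends to accumulate this matrix multiplication in float32 rather than the default (possibly half-precision) accumulator. The sketch below is not part of the commit; it is a small numpy illustration, under the assumption that the motivation is the usual one, of how accumulating many half-precision products in a half-precision accumulator drifts away from the float32-accumulated result:

```python
import numpy as np

# Hypothetical illustration (not from the commit): dot product of fp16 data,
# once with an fp16 accumulator and once with an fp32 accumulator.
rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float16)
b = rng.standard_normal(4096).astype(np.float16)

# fp16 accumulation: every partial sum is rounded back to half precision
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x * y))

# fp32 accumulation of the same fp16 inputs (what GGML_PREC_F32 requests)
acc32 = np.float32(0.0)
for x, y in zip(a, b):
    acc32 += np.float32(x) * np.float32(y)

err = abs(float(acc16) - float(acc32))
print(f"fp16 accumulator: {float(acc16):.6f}")
print(f"fp32 accumulator: {float(acc32):.6f}")
print(f"difference:       {err:.6f}")
```

The fp32 path stays close to a float64 reference, while the fp16 accumulator picks up rounding error on every add; attention output projections are exactly this kind of long reduction, which is why the precision is forced here.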
