Commit c0389db

CANN: Disable acl_graph for prefill stage (ggml-org#15933)
Since the prefill length is not fixed, graphs constructed for the prefill stage cannot be reused. For this reason, ACL graph execution is disabled by default during prefill.
1 parent 00681df commit c0389db

2 files changed: +19 -0 lines changed

docs/backend/CANN.md

Lines changed: 4 additions & 0 deletions
@@ -318,3 +318,7 @@ Operators are executed using ACL graph execution, rather than in op-by-op (eager
 ### GGML_CANN_GRAPH_CACHE_CAPACITY
 
 Maximum number of compiled CANN graphs kept in the LRU cache, default is 12. When the number of cached graphs exceeds this capacity, the least recently used graph will be evicted.
+
+### GGML_CANN_PREFILL_USE_GRAPH
+
+Enable ACL graph execution during the prefill stage, default is false. This option is only effective when FA is enabled.
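
To illustrate how an unset variable ends up as the documented default of false, here is a minimal sketch of the lookup. `read_env` and `parse_flag` are hypothetical stand-ins for the backend's `get_env` and `parse_bool` helpers, and the set of accepted spellings is an assumption, not the backend's actual list.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for get_env: returns the variable's value,
// or std::nullopt when the variable is not set.
static std::optional<std::string> read_env(const std::string & name) {
    const char * val = std::getenv(name.c_str());
    if (val == nullptr) {
        return std::nullopt;
    }
    return std::string(val);
}

// Hypothetical stand-in for parse_bool. The accepted spellings below are
// an assumption made for this sketch.
static bool parse_flag(const std::string & value) {
    static const std::unordered_set<std::string> truthy = {"1", "on", "yes", "true"};
    return truthy.count(value) > 0;
}

// Unset variable -> empty string -> false, which matches the documented default.
static bool prefill_graph_enabled() {
    return parse_flag(read_env("GGML_CANN_PREFILL_USE_GRAPH").value_or(""));
}
```

Under these assumptions, exporting `GGML_CANN_PREFILL_USE_GRAPH=1` before launching the process would switch the option on. The commit stores the result in a function-local `static`, so the variable is read only once per process.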

ggml/src/ggml-cann/ggml-cann.cpp

Lines changed: 15 additions & 0 deletions
@@ -2360,6 +2360,21 @@ static enum ggml_status ggml_backend_cann_graph_compute(
     bool use_cann_graph = true;
     bool cann_graph_update_required = false;
 
+    static bool prefill_use_graph = parse_bool(get_env("GGML_CANN_PREFILL_USE_GRAPH").value_or(""));
+    if (!prefill_use_graph) {
+        // Do not use acl_graph for prefill.
+        for (int i = 0; i < cgraph->n_nodes; i++) {
+            ggml_tensor * node = cgraph->nodes[i];
+            // TODO: Optimize here. Currently, we can only
+            // get seq_len by FA's input.
+            if (node->op == GGML_OP_FLASH_ATTN_EXT) {
+                // Q -> src[0], shape: [B, S, N, D]
+                use_cann_graph = (node->src[0]->ne[1] == 1);
+                break;
+            }
+        }
+    }
+
     if (!cann_ctx->acl_graph_mode) {
         use_cann_graph = false;
     }
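
The decision made by this hunk can be modeled in isolation. The sketch below uses simplified, hypothetical types (`Op`, `Node`, `should_use_graph`) in place of the real ggml structures; `q_seq_len` plays the role of `node->src[0]->ne[1]`. It is only meant to show which inputs keep graph execution on, not how the backend is implemented.

```cpp
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-ins for ggml types; not the real ggml API.
enum class Op { MulMat, FlashAttnExt, Add };

struct Node {
    Op      op;
    int64_t q_seq_len;  // stands in for node->src[0]->ne[1] (Q sequence length)
};

// Returns true when graph execution should stay enabled for this compute graph.
static bool should_use_graph(const std::vector<Node> & nodes, bool prefill_use_graph) {
    bool use_graph = true;
    if (!prefill_use_graph) {
        for (const Node & node : nodes) {
            // Only the first flash-attention node is inspected.
            if (node.op == Op::FlashAttnExt) {
                // A Q sequence length of 1 is treated as decode; anything
                // longer is treated as prefill and falls back to eager mode.
                use_graph = (node.q_seq_len == 1);
                break;
            }
        }
    }
    return use_graph;
}
```

In this model, a graph with no flash-attention node leaves `use_graph` at true, which appears to be why the documentation says the option only takes effect when FA is enabled.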
