Commit 97df697

[Fix] Update number of available pages after prefix cache free (#2409)
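This PR fixes an issue that caused CanPrefill to return inconsistent results across different models. After the prefix cache frees memory, the number of available pages is now re-read from the model before CanPrefill is evaluated again.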
1 parent d770270 commit 97df697

1 file changed: +2 -0 lines changed

cpp/serve/engine_actions/batch_prefill_base.cc

Lines changed: 2 additions & 0 deletions
@@ -109,6 +109,8 @@ BatchPrefillBaseActionObj::GetRequestStateEntriesToPrefill(EngineState estate) {
                          current_total_seq_len, num_running_rsentries, kv_state_kind,
                          sliding_window_enabled)) {
         if (!estate->prefix_cache->TryFreeMemory()) break;
+        // Update number of available pages after memory free.
+        num_available_pages = models_[i]->GetNumAvailablePages();
       }
       if (CanPrefill(estate, num_prefill_rsentries + 1 + num_child_to_activate,
                      total_input_length, total_required_pages, num_available_pages,
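
The effect of the two added lines can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and heavily simplified: ToyModel, ToyPrefixCache, CanPrefillToy and TryAdmit are stand-ins invented for illustration and are not the MLC LLM API. The sketch assumes only what the diff shows, namely that a page count is cached in a local variable, that TryFreeMemory may release pages, and that the admission check reads the cached value. The `refresh` flag toggles the behavior introduced by this commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a model's paged KV cache (not the real API).
struct ToyModel {
  int free_pages;
  int GetNumAvailablePages() const { return free_pages; }
};

// Hypothetical stand-in for the prefix cache: each call releases one cached
// sequence and returns its pages to the model, or returns false when empty.
struct ToyPrefixCache {
  std::vector<int> cached_seq_pages;
  ToyModel* model;
  bool TryFreeMemory() {
    if (cached_seq_pages.empty()) return false;
    model->free_pages += cached_seq_pages.back();
    cached_seq_pages.pop_back();
    return true;
  }
};

// Simplified admission check: the request fits if enough pages are free.
inline bool CanPrefillToy(int required_pages, int num_available_pages) {
  return required_pages <= num_available_pages;
}

// Mirrors the shape of the patched loop. With refresh == false the local
// page count goes stale after TryFreeMemory, as before the commit.
inline bool TryAdmit(ToyModel& model, ToyPrefixCache& cache,
                     int required_pages, bool refresh) {
  int num_available_pages = model.GetNumAvailablePages();
  while (!CanPrefillToy(required_pages, num_available_pages)) {
    if (!cache.TryFreeMemory()) break;
    // The fix: re-read the page count after memory was freed.
    if (refresh) num_available_pages = model.GetNumAvailablePages();
  }
  return CanPrefillToy(required_pages, num_available_pages);
}
```

In this toy setup, a model with 2 free pages and a prefix cache holding two 3-page sequences rejects a 5-page request when `refresh` is false, even though it drains the entire cache, because the loop keeps comparing against the stale count. With `refresh` set to true, the request is admitted after a single free and one cached sequence survives.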
