
Commit 42cb7bd

fix(llama-cpp): populate tensor_buft_override buffer so llama-cpp properly performs fit calculations (#8560)
fix auto-fit for llama-cpp
1 parent 2fb9940 commit 42cb7bd

File tree: 1 file changed (+6, -0)


backend/cpp/llama-cpp/grpc-server.cpp

Lines changed: 6 additions & 0 deletions
@@ -417,6 +417,12 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
     // n_ctx_checkpoints: max context checkpoints per slot (default: 8)
     params.n_ctx_checkpoints = 8;
 
+    // llama memory fit fails if we don't provide a buffer for tensor overrides
+    const size_t ntbo = llama_max_tensor_buft_overrides();
+    while (params.tensor_buft_overrides.size() < ntbo) {
+        params.tensor_buft_overrides.push_back({nullptr, nullptr});
+    }
+
     // decode options. Options are in form optname:optvale, or if booleans only optname.
     for (int i = 0; i < request->options_size(); i++) {
         std::string opt = request->options(i);
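
For reference, here is a minimal standalone sketch of the same padding pattern. The override struct below is a simplified stand-in (the real type and the capacity returned by llama_max_tensor_buft_overrides() come from llama.cpp, as used in the diff above); the point is that the unused tail of the override list is filled with {nullptr, nullptr} sentinels so the memory-fit calculation sees a fully populated, terminated buffer instead of an empty or short one.

    // Sketch only: simplified stand-in for llama.cpp's tensor buffer-type override entry.
    #include <cstddef>
    #include <vector>

    struct tensor_buft_override_t {
        const char* pattern;  // tensor-name pattern; nullptr marks an unused/terminating slot
        void*       buft;     // target buffer type; nullptr for sentinel entries
    };

    // Pad the override list up to the expected capacity with {nullptr, nullptr}
    // sentinels so downstream consumers never read missing or uninitialized entries.
    static void pad_overrides(std::vector<tensor_buft_override_t>& overrides, size_t capacity) {
        while (overrides.size() < capacity) {
            overrides.push_back({nullptr, nullptr});
        }
    }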
