Skip to content

Commit c1b9116

Browse files
authored
server: fix router mode deadlock on child crash and TOCTOU race in models_max (#20763)
Two bugs in `server_models::load()` that affect router mode reliability: **Bug 1: Deadlock when child process crashes** When a child process is killed (e.g., SIGKILL from OS code signature validation), the monitoring thread deadlocks on `stopping_thread.join()` because the stopping_thread's wait predicate (`is_stopping`) is never satisfied — the model name was never inserted into `stopping_models`. `update_status()` is never reached and the model stays stuck in LOADING state permanently. Fix: extend the stopping_thread's wait predicate to also wake when the child process is no longer alive (`!subprocess_alive()`). When woken by a dead child, the thread skips the shutdown sequence and returns immediately. The original `stopping_models.erase()` logic is preserved for normal unloads. **Bug 2: TOCTOU race bypasses `--models-max` (ref #20137)** `unload_lru()` is called outside the mutex, then `load()` acquires the lock afterward. Under concurrent requests, multiple threads observe capacity and all proceed to load, exceeding the limit. Fix: re-check capacity under the lock after `unload_lru()` returns. If another thread filled the slot in the window between `unload_lru()` and the lock acquisition, reject with an error instead of silently exceeding the limit.
1 parent b739738 commit c1b9116

File tree

1 file changed

+25
-2
lines changed

1 file changed

+25
-2
lines changed

tools/server/server-models.cpp

Lines changed: 25 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -539,6 +539,22 @@ void server_models::load(const std::string & name) {
539539
return;
540540
}
541541

542+
// Re-check capacity under the lock to prevent concurrent loads from
543+
// exceeding models_max. Without this, the window between unload_lru()
544+
// releasing its lock and this lock_guard acquiring allows multiple
545+
// threads to each observe capacity and all proceed to load.
546+
if (base_params.models_max > 0) {
547+
size_t count_active = 0;
548+
for (const auto & m : mapping) {
549+
if (m.second.meta.is_active()) {
550+
count_active++;
551+
}
552+
}
553+
if (count_active >= (size_t)base_params.models_max) {
554+
throw std::runtime_error("model limit reached, try again later");
555+
}
556+
}
557+
542558
// prepare new instance info
543559
instance_t inst;
544560
inst.meta = meta;
@@ -606,13 +622,20 @@ void server_models::load(const std::string & name) {
606622
});
607623

608624
std::thread stopping_thread([&]() {
609-
// thread to monitor stopping signal
625+
// thread to monitor stopping signal OR child crash
610626
auto is_stopping = [this, &name]() {
611627
return this->stopping_models.find(name) != this->stopping_models.end();
612628
};
629+
auto should_wake = [&]() {
630+
return is_stopping() || !subprocess_alive(child_proc.get());
631+
};
613632
{
614633
std::unique_lock<std::mutex> lk(this->mutex);
615-
this->cv_stop.wait(lk, is_stopping);
634+
this->cv_stop.wait(lk, should_wake);
635+
}
636+
// child may have already exited (e.g. crashed) — skip shutdown sequence
637+
if (!subprocess_alive(child_proc.get())) {
638+
return;
616639
}
617640
SRV_INF("stopping model instance name=%s\n", name.c_str());
618641
// send interrupt to child process

0 commit comments

Comments
 (0)