Commit 814e08c
fix(scheduling): query "/" to check if a runner is ready
The llama.cpp server returns an error while the model is still loading: https://github.com/ggml-org/llama.cpp/blob/459c0c2c1a400f960d7b8e8d94d31a8426f80986/tools/server/server.cpp#L4220. Wait for the model to be loaded by probing the correct endpoint, since /v1/models does not return 503 during loading.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
1 parent 9f27104 commit 814e08c

File tree: 1 file changed (+1, -1 lines)

pkg/inference/scheduling/runner.go

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ func (r *runner) wait(ctx context.Context) error {
 	default:
 	}
 	// Create and execute a request targeting a known-valid endpoint.
-	readyRequest, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://localhost/v1/models", http.NoBody)
+	readyRequest, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://localhost/", http.NoBody)
 	if err != nil {
 		return fmt.Errorf("readiness request creation failed: %w", err)
 	}

0 commit comments