Revert "Revert "fix(scheduling): query "/" to check if a runner is ready"" #174
Conversation
Reviewer's Guide (Sourcery)

This PR reverts a previous revert to restore the readiness probe by querying the root endpoint ("/") instead of "/v1/models", adjusting the HTTP request in runner.go to correctly detect when the llama.cpp server is still loading.
Pull Request Overview
This pull request reverts a previous revert, restoring a fix that changes the health check endpoint from /v1/models to / when determining if a runner is ready. The change addresses an issue where the llama.cpp server returns different HTTP status codes depending on the endpoint when the model is still loading.
- Changes the readiness check endpoint from `/v1/models` to `/` to properly detect when a runner is ready
- Ensures the health check receives a 503 status code (instead of 404) when the model is still loading
- Restores functionality that was working correctly but was temporarily reverted for testing
Code Review
This pull request reverts a previous revert, effectively re-introducing a change that uses the "/" endpoint for the HTTP readiness probe instead of "/v1/models". The change aims to correctly detect when a runner is loading a model, since the "/v1/models" endpoint does not return a 503 status during the loading process. The review focuses on ensuring the correctness of the endpoint used for the readiness probe.
Reverts #173 which reverted #170.
I reverted it because it suddenly stopped working: it was returning 404 instead of 503, and I wanted to make sure I test it correctly and not leave it on main like that. I was getting 404 because I was testing it with DD's latest-tagged llama.cpp, which is expected to return 404.

From #170:
In order to test this, run it, send a request to a big model so it gets loaded, and look for

`level=info msg="srv log_server_r: request: GET / 503" component=llama.cpp`

in the logs.

Summary by Sourcery
Bug Fixes:

- Query "/" instead of "/v1/models" when checking if a runner is ready, so a still-loading llama.cpp server is detected via its 503 response.