fix(BA-3800): Endpoint status does not consider DEGRADED routes when determining DEGRADED status
#7870
```diff
 if healthy_service_count == 0:
     return EndpointStatus.UNHEALTHY
-if unhealthy_service_count > 0:
+problematic_service_count = len([
```
I think `unreachable_service_count` sounds better than `problematic_service_count`.
I think `TERMINATED` and `FAILED_TO_START` routes are also "unreachable", so that name seems ambiguous for this variable 🤔
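To make the naming question concrete, here is a minimal sketch of the counter as this PR describes it: only `UNHEALTHY` and `DEGRADED` routes are counted, while `TERMINATED` and `FAILED_TO_START` routes, which are also unreachable, are not. The `RouteStatus` enum, `COUNTED_STATUSES`, and `count_problematic_routes` are hypothetical stand-ins for illustration, not the Backend.AI definitions.

```python
from collections.abc import Iterable
from enum import Enum


class RouteStatus(Enum):
    # Hypothetical stand-in: only the member names come from the PR discussion.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DEGRADED = "degraded"
    TERMINATED = "terminated"
    FAILED_TO_START = "failed_to_start"


# Statuses the renamed counter includes after the fix (per the PR overview).
COUNTED_STATUSES = frozenset({RouteStatus.UNHEALTHY, RouteStatus.DEGRADED})


def count_problematic_routes(statuses: Iterable[RouteStatus]) -> int:
    """Count routes whose status is UNHEALTHY or DEGRADED."""
    return sum(1 for s in statuses if s in COUNTED_STATUSES)


routes = [
    RouteStatus.HEALTHY,
    RouteStatus.DEGRADED,
    RouteStatus.TERMINATED,
    RouteStatus.FAILED_TO_START,
]
print(count_problematic_routes(routes))  # prints 1: only the DEGRADED route is counted
```

Both the `TERMINATED` and `FAILED_TO_START` routes are unreachable yet fall outside the count, which is why a name like `unreachable_service_count` would overstate what the variable measures.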
Pull request overview
This PR fixes a bug where endpoint status resolution did not consider routes with DEGRADED status when determining if an endpoint should have DEGRADED status. Previously, only UNHEALTHY routes were checked, causing endpoints with DEGRADED routes to incorrectly show PROVISIONING status instead of DEGRADED.
Key changes:
- Updated endpoint status resolution logic to check for both `UNHEALTHY` and `DEGRADED` routes
- Renamed `unhealthy_service_count` to `problematic_service_count` for better clarity
- Added test coverage for the scenario with mixed healthy and degraded routes
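The key changes above can be sketched as a small resolver. This is a simplified model, not the code in `src/ai/backend/manager/api/gql_legacy/endpoint.py`: the `RouteStatus` and `EndpointStatus` enums, `resolve_endpoint_status`, and the `PROBLEMATIC_BEFORE`/`PROBLEMATIC_AFTER` sets are hypothetical, and the final `HEALTHY`/`PROVISIONING` fall-through is an assumption chosen to reproduce the bug described in the overview. Passing the pre-fix set shows a mix of healthy and degraded routes falling through to `PROVISIONING`; the post-fix set yields `DEGRADED`.

```python
from collections.abc import Sequence
from enum import Enum


class RouteStatus(Enum):
    # Hypothetical subset of route states, for illustration only.
    PROVISIONING = "provisioning"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DEGRADED = "degraded"


class EndpointStatus(Enum):
    # Hypothetical mirror of the endpoint states named in this PR.
    PROVISIONING = "provisioning"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DEGRADED = "degraded"


# Route states that make a serving endpoint DEGRADED, before and after the fix.
PROBLEMATIC_BEFORE = frozenset({RouteStatus.UNHEALTHY})
PROBLEMATIC_AFTER = frozenset({RouteStatus.UNHEALTHY, RouteStatus.DEGRADED})


def resolve_endpoint_status(
    routes: Sequence[RouteStatus],
    problematic: frozenset[RouteStatus] = PROBLEMATIC_AFTER,
) -> EndpointStatus:
    healthy_service_count = sum(1 for r in routes if r is RouteStatus.HEALTHY)
    if healthy_service_count == 0:
        return EndpointStatus.UNHEALTHY
    # The fix: DEGRADED routes now count alongside UNHEALTHY ones.
    problematic_service_count = sum(1 for r in routes if r in problematic)
    if problematic_service_count > 0:
        return EndpointStatus.DEGRADED
    # Assumed fall-through: all routes healthy -> HEALTHY, otherwise PROVISIONING.
    if healthy_service_count == len(routes):
        return EndpointStatus.HEALTHY
    return EndpointStatus.PROVISIONING


mixed = [RouteStatus.HEALTHY, RouteStatus.DEGRADED]
print(resolve_endpoint_status(mixed, PROBLEMATIC_BEFORE))  # EndpointStatus.PROVISIONING
print(resolve_endpoint_status(mixed))  # EndpointStatus.DEGRADED
```

Keeping the counted statuses in one named set makes the rule explicit in a single place, which also sidesteps part of the naming debate: the variable only needs to say "count of routes in this set".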
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/ai/backend/manager/api/gql_legacy/endpoint.py | Updated resolve_status method to include DEGRADED routes in the check for problematic routes, and renamed the counter variable for clarity |
| tests/unit/manager/api/endpoint/test_types.py | Added test case to verify that endpoints with both healthy and degraded routes correctly return DEGRADED status |
| changes/7870.fix.md | Added changelog entry for the fix |
`changes/7870.fix.md`:

```diff
@@ -0,0 +1 @@
+Endpoint status does not consider `DEGRADED` routes when determining `DEGRADED` status
```
Copilot (AI), Jan 8, 2026
The changelog filename is 7870.fix.md but the PR description mentions resolving issue #7869. The filename should match the issue number being resolved. Please rename this file to 7869.fix.md to maintain consistency.
resolves #7869 (BA-3800)