

@jopemachine (Member) commented Jan 8, 2026

resolves #7869 (BA-3800)

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention of the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@github-actions github-actions bot added size:S 10~30 LoC comp:manager Related to Manager component labels Jan 8, 2026
@jopemachine jopemachine added this to the 26.1 milestone Jan 8, 2026
@jopemachine jopemachine marked this pull request as ready for review January 8, 2026 06:04
Copilot AI review requested due to automatic review settings January 8, 2026 06:04
@github-actions github-actions bot added size:M 30~100 LoC and removed size:S 10~30 LoC labels Jan 8, 2026
if healthy_service_count == 0:
    return EndpointStatus.UNHEALTHY
if unhealthy_service_count > 0:
problematic_service_count = len([
Collaborator

I think unreachable_service_count sounds better than problematic_service_count.

Member Author

I think TERMINATED and FAILED_TO_START routes are also "unreachable", so that name seems ambiguous for this variable 🤔

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where endpoint status resolution ignored routes in DEGRADED status when deciding whether the endpoint itself should be DEGRADED. Previously only UNHEALTHY routes were checked, so an endpoint that had DEGRADED routes incorrectly reported PROVISIONING instead of DEGRADED.

Key changes:

  • Updated endpoint status resolution logic to check for both UNHEALTHY and DEGRADED routes
  • Renamed unhealthy_service_count to problematic_service_count for clarity
  • Added test coverage for the scenario with mixed healthy and degraded routes
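The change in the resolution order can be sketched as below. This is a minimal, hypothetical simplification, not the actual `resolve_status` code in `endpoint.py`: the `RouteStatus` and `EndpointStatus` enums are reduced stand-ins, the `_resolve`, `resolve_status_before`, and `resolve_status_after` names are invented for illustration, and the fallback to PROVISIONING when some routes are not yet healthy is an assumption inferred from the bug description above.

```python
from enum import Enum


class RouteStatus(Enum):
    # Hypothetical subset of route states, for illustration only.
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    DEGRADED = "DEGRADED"
    PROVISIONING = "PROVISIONING"


class EndpointStatus(Enum):
    # Hypothetical subset of endpoint states, for illustration only.
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    DEGRADED = "DEGRADED"
    PROVISIONING = "PROVISIONING"


def _resolve(routes: list[RouteStatus], problem_states: set[RouteStatus]) -> EndpointStatus:
    healthy_service_count = sum(1 for r in routes if r == RouteStatus.HEALTHY)
    if healthy_service_count == 0:
        return EndpointStatus.UNHEALTHY
    # Count routes that are serving but in a problematic state.
    problematic_service_count = len([r for r in routes if r in problem_states])
    if problematic_service_count > 0:
        return EndpointStatus.DEGRADED
    if healthy_service_count < len(routes):
        # Assumption: any remaining non-healthy routes are still coming up.
        return EndpointStatus.PROVISIONING
    return EndpointStatus.HEALTHY


def resolve_status_before(routes: list[RouteStatus]) -> EndpointStatus:
    # Before the fix: only UNHEALTHY routes count as a problem.
    return _resolve(routes, {RouteStatus.UNHEALTHY})


def resolve_status_after(routes: list[RouteStatus]) -> EndpointStatus:
    # After the fix: DEGRADED routes count as a problem as well.
    return _resolve(routes, {RouteStatus.UNHEALTHY, RouteStatus.DEGRADED})


mixed = [RouteStatus.HEALTHY, RouteStatus.DEGRADED]
print(resolve_status_before(mixed))  # EndpointStatus.PROVISIONING
print(resolve_status_after(mixed))   # EndpointStatus.DEGRADED
```

With one healthy and one degraded route, the old check finds no UNHEALTHY route and falls through to PROVISIONING, while the updated check counts the DEGRADED route and reports DEGRADED, which is the scenario the new test case covers.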

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/ai/backend/manager/api/gql_legacy/endpoint.py | Updated the resolve_status method to include DEGRADED routes in the check for problematic routes, and renamed the counter variable for clarity |
| tests/unit/manager/api/endpoint/test_types.py | Added a test case verifying that an endpoint with both healthy and degraded routes returns DEGRADED status |
| changes/7870.fix.md | Added a changelog entry for the fix |

@@ -0,0 +1 @@
Endpoint status does not consider `DEGRADED` routes when determining `DEGRADED` status

Copilot AI Jan 8, 2026

The changelog filename is 7870.fix.md but the PR description mentions resolving issue #7869. The filename should match the issue number being resolved. Please rename this file to 7869.fix.md to maintain consistency.

@HyeockJinKim HyeockJinKim added this pull request to the merge queue Jan 8, 2026
@HyeockJinKim HyeockJinKim removed this pull request from the merge queue due to a manual request Jan 8, 2026
@HyeockJinKim HyeockJinKim added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit 4d689dd Jan 9, 2026
38 checks passed
@HyeockJinKim HyeockJinKim deleted the fix/BA-3800 branch January 9, 2026 04:30

Labels

comp:manager Related to Manager component size:M 30~100 LoC

Development

Successfully merging this pull request may close these issues.

Endpoint status does not consider DEGRADED routes when determining DEGRADED status

4 participants