feat: add metric for NNC init failures #3453
Signed-off-by: Evan Baker <[email protected]>
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
cns/service/metrics.go:9
- Typo in comment: 'monotic' should be 'monotonic'.
// managerStartFailures is a monotic counter which tracks the number of times the controller-runtime
/azp run Azure Container Networking PR
Azure Pipelines successfully started running 1 pipeline(s).
PR Overview
This pull request updates the CNS initialization process to retry until successful, tracking failures and success via new metrics. Key changes include:
- Replacing a finite retry count with an infinite (until succeeded) exponential backoff retrier in the main service.
- Incrementing a failure metric (nncInitFailure) on each NNC init failure and setting a success gauge (hasNNCInitialized) after successful reconciliation.
- Adding two new Prometheus metrics in metrics.go for NNC initialization tracking.
Reviewed Changes
| File | Description |
|---|---|
| cns/service/main.go | Updated retry logic and added metric instrumentation for CNS init state |
| cns/service/metrics.go | Added two new metrics (nncInitFailure and hasNNCInitialized) with registration |
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
cns/service/main.go:1462
- The variable 'initCNSInitalDelay' appears to have a typo; consider renaming it to 'initCNSInitialDelay' for clarity.
}, retry.Context(ctx), retry.Delay(initCNSInitalDelay), retry.MaxDelay(time.Minute), retry.UntilSucceeded())
cns/service/metrics.go:29
- The word 'monotic' in the comment appears to be a typo; consider changing it to 'monotonic'.
// nncInitFailure is a monotic counter which tracks the number of times the initial NNC reconcile has failed.
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days
Pull request closed due to inactivity.
Pull request was closed
Instead of crashing after 10 failed attempts to initialize the CNS state, this change retries until initialization succeeds and increments a metric on each failed attempt to count NNC init failures. It also adds a positive-signal metric, "hasNNCInitialized", to signal that this process has completed successfully.