NETOBSERV-2365: Add the ability to create recording rules instead of alerts for the Network Health feature #2112
base: main
@leandroberetta: This pull request references NETOBSERV-2365, which is a valid Jira issue. Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
Force-pushed from 0c4d303 to b1fae84.
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   73.24%   73.03%   -0.21%
==========================================
  Files          82       82
  Lines        9339     9431      +92
==========================================
+ Hits         6840     6888      +48
- Misses       2075     2115      +40
- Partials      424      428       +4
```

Flags with carried forward coverage won't be shown.
Force-pushed from 91777b1 to ee8bbc2.
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
+ Coverage   73.53%   73.66%   +0.12%
==========================================
  Files          88       88
  Lines        9841     9944     +103
==========================================
+ Hits         7237     7325      +88
- Misses       2162     2177      +15
  Partials      442      442
```

Flags with carried forward coverage won't be shown.
```go
// They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
// +required
Thresholds AlertThresholds `json:"thresholds,omitempty"`
// Required for alert mode, optional for recording mode.
```
Maybe something I don't get here: how can it be optional for recording mode, if we still want them to appear in the health dashboard, associated with a severity?
I originally thought of the recording rule as just a value, without any notion of severity, but now that you mention it, we can display the value with severities. You're right. Thanks for the feedback.
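Following this exchange, thresholds remain necessary in both modes. The Go sketch below shows one way such a check could look; it is not the operator's code. `HealthRuleMode`, `ModeAlert`, `ModeRecording`, and the simplified `Thresholds` map (severity name to percentage string) are hypothetical stand-ins, since the real `AlertThresholds` type is not shown here. It also enforces the "must be parsable as floats" rule from the field comment above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// HealthRuleMode is a hypothetical stand-in for the alert/recording switch
// discussed in this PR; the operator's real type and constant names may differ.
type HealthRuleMode string

const (
	ModeAlert     HealthRuleMode = "Alert"
	ModeRecording HealthRuleMode = "Recording"
)

// Thresholds is a hypothetical simplification of AlertThresholds:
// severity name -> percentage of errors, stored as a string.
type Thresholds map[string]string

// validateThresholds requires thresholds in both modes, because recording-mode
// values are still shown with a severity in the health dashboard, and checks
// that every value parses as a float.
func validateThresholds(mode HealthRuleMode, t Thresholds) error {
	if len(t) == 0 {
		return fmt.Errorf("thresholds are required in %s mode", mode)
	}
	// Iterate in sorted order so the reported error is deterministic.
	severities := make([]string, 0, len(t))
	for s := range t {
		severities = append(severities, s)
	}
	sort.Strings(severities)
	for _, s := range severities {
		if _, err := strconv.ParseFloat(t[s], 64); err != nil {
			return fmt.Errorf("threshold for severity %q is not a valid float: %q", s, t[s])
		}
	}
	return nil
}

func main() {
	fmt.Println(validateThresholds(ModeRecording, nil))                                        // thresholds are required in Recording mode
	fmt.Println(validateThresholds(ModeAlert, Thresholds{"critical": "10", "warning": "5.5"})) // <nil>
	fmt.Println(validateThresholds(ModeRecording, Thresholds{"info": "ten"}))
}
```

Treating both modes identically keeps the API simple: switching a rule from alert to recording mode never invalidates an existing configuration.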
```diff
 // More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
 // +optional
-DisableAlerts []AlertTemplate `json:"disableAlerts"`
+DisableHealthRules []HealthRuleTemplate `json:"disableHealthRules"`
```
I think for disabling, we should keep the existing API and continue to affect only alerts. Disabling is mostly there to help users who get too much alerting noise, and it's not really needed for recording rules.
Also, we should not rename this field: it already existed before the TP feature, so renaming it would be a breaking change.
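One way to read "keep the existing API and continue to affect only alerts" is sketched below in Go. It is an illustration, not the PR's implementation: `HealthRuleMode`, its constants, `shouldCreateRule`, and the template name `ExampleTemplate` are all hypothetical. A template listed in `disableAlerts` suppresses the alerting rule, while recording rules are generated regardless.

```go
package main

import "fmt"

// HealthRuleMode is hypothetical; the real operator API may name this differently.
type HealthRuleMode string

const (
	ModeAlert     HealthRuleMode = "Alert"
	ModeRecording HealthRuleMode = "Recording"
)

// shouldCreateRule keeps the existing disableAlerts semantics: listing a
// template there suppresses only the alerting rule. In recording mode the
// rule is still generated, so the health dashboard keeps receiving data.
func shouldCreateRule(template string, mode HealthRuleMode, disableAlerts []string) bool {
	if mode != ModeAlert {
		return true
	}
	for _, disabled := range disableAlerts {
		if disabled == template {
			return false
		}
	}
	return true
}

func main() {
	disabled := []string{"ExampleTemplate"} // hypothetical template name
	fmt.Println(shouldCreateRule("ExampleTemplate", ModeAlert, disabled))     // false
	fmt.Println(shouldCreateRule("ExampleTemplate", ModeRecording, disabled)) // true
	fmt.Println(shouldCreateRule("OtherTemplate", ModeAlert, disabled))       // true
}
```

Because the field keeps its name and meaning, existing FlowCollector resources that set `disableAlerts` behave exactly as before.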
```go
HealthRuleNoFlows   HealthRuleTemplate = "NetObservNoFlows"
HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
```
"NoFlows" and "LokiError" are different from the others: I think we should keep referring to them as alerts only. Unlike the others, they are not health items for the dashboard; they are only for alerting.
We should probably add a validation check that they are not used in "recording" mode.
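The suggested validation could look roughly like the Go sketch below. The two template strings come from the constants in this diff; `HealthRuleMode`, `ruleConfig`, and `validateModes` are hypothetical names used only for illustration, not the operator's actual types.

```go
package main

import "fmt"

// HealthRuleMode is hypothetical; the real operator API may differ.
type HealthRuleMode string

const (
	ModeAlert     HealthRuleMode = "Alert"
	ModeRecording HealthRuleMode = "Recording"
)

// alertOnlyTemplates are templates that only make sense as alerts: they are
// not health items for the dashboard. Names taken from the constants above.
var alertOnlyTemplates = map[string]bool{
	"NetObservNoFlows":   true,
	"NetObservLokiError": true,
}

// ruleConfig is a hypothetical, simplified view of one configured health rule.
type ruleConfig struct {
	Template string
	Mode     HealthRuleMode
}

// validateModes rejects alert-only templates configured in recording mode.
func validateModes(rules []ruleConfig) error {
	for _, r := range rules {
		if r.Mode == ModeRecording && alertOnlyTemplates[r.Template] {
			return fmt.Errorf("%s is alert-only and cannot be used in %s mode", r.Template, r.Mode)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateModes([]ruleConfig{{Template: "NetObservNoFlows", Mode: ModeAlert}}))
	fmt.Println(validateModes([]ruleConfig{{Template: "NetObservLokiError", Mode: ModeRecording}}))
}
```

In a real operator such a check would more likely live in the FlowCollector validating webhook, so the user gets the error at admission time rather than from reconcile logs.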
```diff
 func (rb *ruleBuilder) additionalDescription() string {
-	return fmt.Sprintf("You can turn off this alert by adding '%s' to spec.processor.metrics.disableAlerts in FlowCollector, or reconfigure it via spec.processor.metrics.alerts.", rb.template)
+	return fmt.Sprintf("You can turn off this health rule by adding '%s' to spec.processor.metrics.disableHealthRules in FlowCollector, or reconfigure it via spec.processor.metrics.healthRules.", rb.template)
```
(Just FYI: we will remove this added description, as it will be moved to the runbooks; I was about to say something about this message regarding disabled alerts, but it doesn't matter.)
```go
// buildRecordingRuleName builds recording rule name following the convention:
// netobserv:health:<template>:<groupby>:<side>:rate2m
func (rb *ruleBuilder) buildRecordingRuleName() string {
```
I think I would prefer that the recording rule name be passed to createRule directly from each alert (in alerts.go). It may be less clever, but it's more explicit, and it makes it easier to change a specific name if we ever need to.
Actually, this function can surely be kept; it's the acronyms and toSnakeCase that IMO could be replaced with an explicit string per alert.
I may follow up on this comment later.
Sorry, I thought I had addressed this one. I will take it into account in a follow-up PR.
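The reviewer's suggestion (keep the builder, but feed it an explicit per-alert string instead of deriving it with acronyms and toSnakeCase) could be sketched in Go as follows. This is a standalone illustration, not the PR's method: the function takes plain arguments instead of a `ruleBuilder`, skipping empty `<groupby>`/`<side>` segments is an assumption, and `recordingTemplateNames`, `ExampleTemplate`, `namespace`, and `src` are hypothetical values.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRecordingRuleName assembles a name following the convention
// netobserv:health:<template>:<groupby>:<side>:rate2m.
// Assumption: empty group-by or side segments are omitted.
func buildRecordingRuleName(template, groupBy, side string) string {
	parts := []string{"netobserv", "health", template}
	for _, p := range []string{groupBy, side} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	parts = append(parts, "rate2m")
	return strings.Join(parts, ":")
}

// recordingTemplateNames is a hypothetical explicit mapping, one entry per
// alert template, replacing name derivation via acronyms and toSnakeCase.
var recordingTemplateNames = map[string]string{
	"ExampleTemplate": "example_template",
}

func main() {
	name := buildRecordingRuleName(recordingTemplateNames["ExampleTemplate"], "namespace", "src")
	fmt.Println(name) // netobserv:health:example_template:namespace:src:rate2m
}
```

The explicit map costs one line per template but makes every generated metric name greppable and lets a single name change without touching the derivation logic.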
```go
recordName := rb.buildRecordingRuleName()
return &monitoringv1.Rule{
	Record: recordName,
	// Note: Recording rules cannot have annotations in Prometheus
```
Really? So we cannot pass any metadata?
Only labels: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
That would be an option, although labels are not as good as annotations for this kind of information.
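Since a Prometheus recording rule only accepts `record`, `expr`, and `labels`, any metadata has to travel as labels. The Go sketch below mirrors those three fields in a plain struct (deliberately not the prometheus-operator `monitoringv1.Rule` type, to stay self-contained). `newRecordingRule`, the PromQL expression, and the label keys `template` and `threshold_warning` are hypothetical examples.

```go
package main

import (
	"fmt"
	"sort"
)

// recordingRule mirrors the fields Prometheus accepts on a recording rule:
// record, expr and labels. There is no annotations field.
type recordingRule struct {
	Record string
	Expr   string
	Labels map[string]string
}

// newRecordingRule carries metadata that an alert would normally hold in
// annotations as labels instead. The map is copied so later changes by the
// caller do not leak into the rule.
func newRecordingRule(record, expr string, meta map[string]string) recordingRule {
	labels := make(map[string]string, len(meta))
	for k, v := range meta {
		labels[k] = v
	}
	return recordingRule{Record: record, Expr: expr, Labels: labels}
}

func main() {
	r := newRecordingRule(
		"netobserv:health:example_template:namespace:src:rate2m", // hypothetical name
		"sum(rate(example_metric_total[2m]))",                     // hypothetical PromQL
		map[string]string{"template": "ExampleTemplate", "threshold_warning": "5"},
	)
	// Print labels in sorted order for stable output.
	keys := make([]string, 0, len(r.Labels))
	for k := range r.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(r.Record)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, r.Labels[k])
	}
}
```

This illustrates why labels are a weaker fit than annotations: Prometheus adds rule labels to every output series, so they become part of the series identity. Short, stable values such as a template name or a threshold work; free-form text such as a description or runbook message does not.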
New images:
They will expire after two weeks.

To deploy this build:

```bash
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cd6dd59 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cd6dd59
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cd6dd59
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
Force-pushed from dd36b0d to 19b3143.
Force-pushed from c665679 to d21a242.
Force-pushed from bbb4d69 to a6ba669.
New images:
They will expire after two weeks.

To deploy this build:

```bash
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:44ff6a7 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-44ff6a7
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-44ff6a7
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
jpinsonneau left a comment:
LGTM, thanks @leandroberetta!
jotak left a comment:
LGTM
Force-pushed from 4047589 to c1a990d.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: … It needs approval from an approver in each of the affected files. Approvers can indicate their approval by writing …

Description
Add the ability to create recording rules instead of alerts for the Network Health feature.
Testing instructions are in the console PR.
Dependencies
netobserv/network-observability-console-plugin#1163
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.