-
Notifications
You must be signed in to change notification settings - Fork 8
add kcp operator custom metrics #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add kcp operator custom metrics #121
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
62caaa4 to
a282340
Compare
mjudeikis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| func (mc *MetricsCollector) updateRootShardCounts(ctx context.Context) { | ||
| var rootShards operatorv1alpha1.RootShardList | ||
| if err := mc.client.List(ctx, &rootShards); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use cache (all below too) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might not get the actual metric values if they're cached, and besides, I think the Controller-runtime already implements caching by default
| var ( | ||
| RootShardCount = prometheus.NewGaugeVec( | ||
| prometheus.GaugeOpts{ | ||
| Name: "kcp_operator_rootshard_count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this is just (I think this is open to discussion @embik )
kcp_operator_object_count{type=rootshard}
Status of these rootshard, shardl frontproxy cache server should be same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would save us a bit of cardinality:
currently:
4 metrics * 2 dimension = 8
suggested:
1 metrcic * 3 dimensions = 3.
Kubeconfig and certificates would stay separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am entirely unconvinced that a _count metric is even useful. Users could simple count() other metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have other metrics with one instance per object, then I'd say yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a bit verbose. One metrics with counts with component types would enough.
I am entirely unconvinced that a _count metric is even useful. Users could simple count() other metrics
Yes, but it's easier for the user than having an explicit metric. Otherwise, they need to go and "reverse engineer" what they need to count to get the desired result.
|
|
||
| ConditionStatus = prometheus.NewGaugeVec( | ||
| prometheus.GaugeOpts{ | ||
| Name: "kcp_operator_condition_status", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this metric is about. Need better docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows the status of objects installed.
For example, you can have something like this.
kcp_operator_condition_status{condition_type="Available",namespace="default",resource_name="frontproxy-sample",resource_type="frontproxy"} 1
kcp_operator_condition_status{condition_type="Available",namespace="default",resource_name="shard-sample",resource_type="rootshard"} 0
kcp_operator_condition_status{condition_type="RootShard",namespace="default",resource_name="frontproxy-sample",resource_type="frontproxy"} 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldbe fine with one metric for the conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldbe fine with one metric for the conditions.
This is not too clear. Does it mean you're okay with the current implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
9886ccb to
8c170fc
Compare
8c170fc to
d5ce156
Compare
|
/retest |
mjudeikis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still same few comments
| var ( | ||
| RootShardCount = prometheus.NewGaugeVec( | ||
| prometheus.GaugeOpts{ | ||
| Name: "kcp_operator_rootshard_count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a bit verbose. One metrics with counts with component types would enough.
I am entirely unconvinced that a _count metric is even useful. Users could simple count() other metrics
Yes, but it's easier for the user than having an explicit metric. Otherwise, they need to go and "reverse engineer" what they need to count to get the desired result.
Signed-off-by: olalekan odukoya <[email protected]>
d5ce156 to
903116d
Compare
Summary
What Type of PR Is This?
Related Issue(s)
Fixes 25
Release Notes