
Commit 5b4af61

fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! fixup! KEP-4785: CRDMetrics Controller
1 parent a1a9a7d commit 5b4af61


2 files changed (+91, -35 lines)


keps/sig-instrumentation/4785-resource-state-metrics/README.md

Lines changed: 84 additions & 29 deletions
@@ -386,39 +386,46 @@ proposal will be implemented, this is the place to discuss them.
 -->

 The controller offers a number of improvements over Kube State Metrics' Custom
-Resource State API, while maintaining a 3x faster round trip time for metric
+Resource State API, while maintaining a [3x faster] round trip time for metric
 generation.

 - At its core, the controller relies on its managed resource,
-`ResourceMetricsMonitor` to fetch the metric generation configuration. Parts of the
-configuration may be defined using different `resolver`s, such as `unstructured`
-or `CEL`.
+`ResourceMetricsMonitor` to fetch the metric generation configuration. Parts
+of the configuration may be defined using different `resolver`s, such as
+`unstructured` or `CEL`.
 - Once fetched, the controller `unmarshal`s the configuration YAML directly into
-`stores` which are a set of metric `families`, which in turn are a set of
-`metrics`.
-- Metric `stores` are created based on its respective GVKR (a type that embeds
-`schema.GroupVersionKind`, `schema.GroupVersionResource` to avoid
-[plural ambiguities]), and reflectors for the specified resource are
-initialized, and populate the stores on its update.
-- `/metrics` pings on `RSM_MAIN_PORT` trigger the server to write the
-raw metrics, combined with its appropriate header(s), in the response. All
-generated metrics are hardcoded to `gauge`s by design, as Prometheus lacks
-support for some OpenMetrics-specified metrics' types, such as `Info` and
-`StateSets`.
+`stores`, which are sets of metric `families`, which in turn are sets of
+`metrics`.
+- Metric `stores` are created based on their respective GVKR (a type that embeds
+`schema.GroupVersionKind` and `schema.GroupVersionResource` to avoid [plural
+ambiguities]), and reflectors initialized for the specified resource populate
+the stores whenever that resource is updated.
+- All generated metrics are hardcoded to `gauge`s by design, partly because
+Prometheus currently does not support some OpenMetrics-specified metric types,
+such as `Info` and `StateSet`, but more importantly because these metrics can
+be expressed using `gauge`s.
+- `/metrics` requests on `RSM_MAIN_PORT` trigger the server to write the raw
+metrics defined in the configuration, combined with their appropriate
+header(s), in the response.
+- `/external` requests on `RSM_MAIN_PORT` trigger the server to write the raw
+metrics defined in the `./external` directory, combined with their appropriate
+header(s), in the response.
+- `/metrics` requests on `RSM_SELF_PORT` trigger the server to write the raw
+metrics about the process itself, combined with their appropriate header(s),
+in the response.

 At the moment, the `spec` houses a single `configuration` field, which defines
 the metric generation configuration as follows (please note that the schema is
-fast-moving at this point and may be subject to change based on the [feedback
-obtained](https://github.com/rexagod/resource-state-metrics/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen)):
+fast-moving at this point and may be subject to change):

 ```yaml
 stores: # Set of metrics stores for each CR we want to generate metrics for.
-- g: "contoso.com" # CR's group.
-  v: "v1alpha1" # CR's version.
+- group: "contoso.com" # CR's group.
+  version: "v1alpha1" # CR's version.
   # Both kind and resource names are required to avoid plural ambiguities, see
   # https://github.com/kubernetes-sigs/kubebuilder/issues/3402.
-  k: "MyPlatform" # CR's kind.
-  r: "myplatforms" # CR's resource.
+  kind: "MyPlatform" # CR's kind.
+  resource: "myplatforms" # CR's resource.
   selectors: # Set of filters to narrow down the selected CRs, may be:
     field: "metadata.namespace=default" # field selector(s), and (/or),
     label: "app.kubernetes.io/part-of=sample-controller" # label selector(s).
@@ -476,6 +483,53 @@ stores: # Set of metrics stores for each CR we want to generate metrics for.
 # A non-cast-able `float64` will skip the
 # current metric generation and log an error.
 ```
+
+It's also worth mentioning that unlike Kube State Metrics' Custom Resource
+State, Resource State Metrics supports recursively generating samples from
+nested data structures, all from a single expression. Assuming we have a query,
+```
+o.spec
+```
+for the object,
+```yaml
+...
+spec:
+  appId: test-sample
+  language: csharp
+  os: linux
+  instanceSize: small
+  environmentType: dev
+  tags:
+    - frontend
+    - middleware
+    - backend
+  features:
+    - monitoring
+    - alerting
+  versions:
+    - "1.0"
+    - "2.0"
+    - "3.0"
+    - "4.0"
+  xProps:
+    nonComposite: "example-value"
+    compositeArray:
+      - "value1"
+      - "value2"
+    compositeMap:
+      key1: "value1"
+      key2: "value2"
+```
+the resulting metric would look like,
+```
+test_metric{os="linux", tags="backend", key_1="value1", key_2="value2", app_id="test-sample", features="alerting", language="csharp", versions="1.0", instance_size="small", non_composite="example-value", compositeArray="value1", environment_type="dev"} 2.000000
+test_metric{os="linux", tags="frontend", key_1="value1", key_2="value2", app_id="test-sample", features="monitoring", language="csharp", versions="2.0", instance_size="small", non_composite="example-value", compositeArray="value2", environment_type="dev"} 2.000000
+test_metric{os="linux", tags="middleware", key_1="value1", key_2="value2", app_id="test-sample", features="", language="csharp", versions="3.0", instance_size="small", non_composite="example-value", compositeArray="", environment_type="dev"} 2.000000
+test_metric{os="linux", tags="", key_1="value1", key_2="value2", app_id="test-sample", features="", language="csharp", versions="4.0", instance_size="small", non_composite="example-value", compositeArray="", environment_type="dev"} 2.000000
+```
+Note that the order of samples, as well as that of their labelsets, is not
+guaranteed to be stable across runs.
+
 The `status`, on the other hand, is a set of `metav1.Condition`s, like so:

 ```yaml
@@ -490,8 +544,7 @@ status:
 type: Processed
 ```

-Also, performance benchmarks are available in [tests/bench](https://github.com/rexagod/resource-state-metrics/tree/main/tests/bench).
-
+[3x faster]: https://github.com/rexagod/resource-state-metrics/blob/main/tests/bench/bench.sh
 [plural ambiguities]: https://github.com/kubernetes-sigs/kubebuilder/issues/3402

 ### Test Plan
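The first hunk argues that `Info` and `StateSet` metrics can be expressed using `gauge`s. The following Go sketch (not the controller's code; `sample`, `renderGauge`, `infoAsGauge`, `stateSetAsGauge` and the metric names are hypothetical) shows one common way to do that in the Prometheus text format. An info metric becomes a single gauge sample fixed at 1 whose labels carry the information, as kube-state-metrics does for its `*_info` metrics. A state set becomes one gauge sample per state, with 1 for the current state and 0 otherwise, as in kube-state-metrics' `kube_pod_status_phase`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one labelset/value pair of a gauge family (hypothetical type).
type sample struct {
	labels map[string]string
	value  float64
}

// labelEscaper applies the escaping the Prometheus text format requires in
// label values: backslash, double quote and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

func escapeLabel(v string) string { return labelEscaper.Replace(v) }

// renderGauge writes one metric family in the Prometheus text format. The
// TYPE line is always "gauge", mirroring the hardcoded choice in the KEP.
func renderGauge(name, help string, samples []sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", name, help, name)
	for _, s := range samples {
		keys := make([]string, 0, len(s.labels))
		for k := range s.labels {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic order, for readability only
		pairs := make([]string, len(keys))
		for i, k := range keys {
			pairs[i] = fmt.Sprintf(`%s="%s"`, k, escapeLabel(s.labels[k]))
		}
		b.WriteString(name)
		if len(pairs) > 0 {
			b.WriteString("{" + strings.Join(pairs, ",") + "}")
		}
		fmt.Fprintf(&b, " %g\n", s.value)
	}
	return b.String()
}

// infoAsGauge expresses an OpenMetrics Info metric as a gauge fixed at 1,
// carrying the information in its labels.
func infoAsGauge(labels map[string]string) []sample {
	return []sample{{labels: labels, value: 1}}
}

// stateSetAsGauge expresses an OpenMetrics StateSet as one gauge sample per
// state: 1 for the current state and 0 for every other state.
func stateSetAsGauge(label, current string, states []string) []sample {
	out := make([]sample, 0, len(states))
	for _, st := range states {
		v := 0.0
		if st == current {
			v = 1
		}
		out = append(out, sample{labels: map[string]string{label: st}, value: v})
	}
	return out
}

func main() {
	fmt.Print(renderGauge("myplatform_info", "Static information about a MyPlatform.",
		infoAsGauge(map[string]string{"language": "csharp", "os": "linux"})))
	fmt.Print(renderGauge("myplatform_phase", "Current phase of a MyPlatform.",
		stateSetAsGauge("phase", "Running", []string{"Pending", "Running", "Failed"})))
}
```

Running it prints one `myplatform_info` sample with value 1 and three `myplatform_phase` samples. Only `phase="Running"` is set to 1, so the state set can still be queried without a dedicated metric type.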
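The recursive sample generation described in the second hunk can be approximated with the following Go sketch. It is an illustration under stated assumptions, not the controller's actual algorithm. The resolver that evaluates `o.spec` is not modeled; the object is given directly as a Go map. The sketch follows these rules:

- Scalars become labels shared by every sample.
- The i-th element of each list (assumed to hold scalars) goes to the i-th sample.
- Nested maps contribute only their leaf keys.
- Keys are snake-cased.

For the `spec` shown in the diff it yields four labelsets, with empty strings where a shorter list runs out. However, it pairs list elements strictly by index and snake-cases every key, including `compositeArray`, so it does not necessarily reproduce the exact output listed there. `snakeCase`, `collect`, and `expand` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// snakeCase turns camelCase keys such as "instanceSize" or "key1" into
// label names such as "instance_size" or "key_1".
func snakeCase(s string) string {
	var b strings.Builder
	runes := []rune(s)
	for i, r := range runes {
		if i > 0 {
			prev := runes[i-1]
			if (unicode.IsUpper(r) && (unicode.IsLower(prev) || unicode.IsDigit(prev))) ||
				(unicode.IsDigit(r) && unicode.IsLetter(prev)) {
				b.WriteByte('_')
			}
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// collect walks obj recursively: scalars are shared labels, lists are kept
// per element, and nested maps only contribute their leaf keys (so leaf
// names that collide across maps overwrite each other in this sketch).
func collect(obj map[string]any, scalars map[string]string, lists map[string][]string) {
	for k, v := range obj {
		name := snakeCase(k)
		switch t := v.(type) {
		case map[string]any:
			collect(t, scalars, lists)
		case []any:
			items := make([]string, len(t))
			for i, e := range t {
				items[i] = fmt.Sprint(e)
			}
			lists[name] = items
		default:
			scalars[name] = fmt.Sprint(t)
		}
	}
}

// expand returns one labelset per sample: as many samples as the longest
// list, with "" wherever a shorter list has run out of elements.
func expand(obj map[string]any) []map[string]string {
	scalars, lists := map[string]string{}, map[string][]string{}
	collect(obj, scalars, lists)
	n := 1
	for _, l := range lists {
		if len(l) > n {
			n = len(l)
		}
	}
	samples := make([]map[string]string, n)
	for i := range samples {
		labels := make(map[string]string, len(scalars)+len(lists))
		for k, v := range scalars {
			labels[k] = v
		}
		for k, l := range lists {
			labels[k] = ""
			if i < len(l) {
				labels[k] = l[i]
			}
		}
		samples[i] = labels
	}
	return samples
}

func main() {
	spec := map[string]any{
		"appId": "test-sample", "language": "csharp", "os": "linux",
		"instanceSize": "small", "environmentType": "dev",
		"tags":     []any{"frontend", "middleware", "backend"},
		"features": []any{"monitoring", "alerting"},
		"versions": []any{"1.0", "2.0", "3.0", "4.0"},
		"xProps": map[string]any{
			"nonComposite":   "example-value",
			"compositeArray": []any{"value1", "value2"},
			"compositeMap":   map[string]any{"key1": "value1", "key2": "value2"},
		},
	}
	for _, labels := range expand(spec) {
		keys := make([]string, 0, len(labels))
		for k := range labels {
			keys = append(keys, k)
		}
		sort.Strings(keys) // sorted only to make the printout readable
		pairs := make([]string, len(keys))
		for i, k := range keys {
			pairs[i] = fmt.Sprintf("%s=%q", k, labels[k])
		}
		fmt.Printf("test_metric{%s}\n", strings.Join(pairs, ", "))
	}
}
```

The sample value is left out because the diff does not say how it is derived. Because nested maps only contribute leaf keys, two maps sharing a key name would collide here. A fuller design would need a policy for that, for example prefixing the parent key.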
@@ -905,7 +958,7 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->

-This is not nota workload feature, but an out-of-tree telemetry solution.
+This is not a workload feature, but an out-of-tree telemetry solution.

 ###### How can someone using this feature know that it is working for their instance?

@@ -918,7 +971,7 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->

-- [x] Events: Events are emitted in `EMIT_NAMESPACE` (defaults to ``), for e.g.,
+- [x] Events: Events are emitted in the controller's namespace, e.g.,
 `OwnerRefInvalidNamespace` in case of an owner reference being defined on
 `ResourceMetricsMonitor` to its controller.
 - [x] API .status: The status for a successfully processed `ResourceMetricsMonitor`
@@ -982,8 +1035,6 @@ TBD.
 This section must be completed when targeting beta to a release.
 -->

-TBD (when targeting beta).
-
 ###### Does this feature depend on any specific services running in the cluster?

 <!--
@@ -1202,7 +1253,11 @@ information to express the idea and why it was not acceptable.

 We considered refactoring the Kube State Metrics' Custom Resource State API, but
 that has actually been done multiple times in the past which often amounts to
-us ending up in the same position, owing to its limited scalability.
+us ending up in the same position, owing to its limited scalability. It's also
+worth mentioning [kubernetes/kube-state-metrics#1978] here, an in-house effort
+that had similar goals.
+
+[kubernetes/kube-state-metrics#1978]: https://github.com/kubernetes/kube-state-metrics/issues/1978

 ## Infrastructure Needed (Optional)

@@ -1212,5 +1267,5 @@ new subproject, repos requested, or GitHub details. Listing these here allows a
 SIG to get the process for these resources started right away.
 -->

-We request a repository (`kubernetes/resource-state-metrics`) to migrate
+We request a repository (`kubernetes-sigs/resource-state-metrics`) to migrate
 `rexagod/resource-state-metrics` to.

keps/sig-instrumentation/4785-resource-state-metrics/kep.yaml

Lines changed: 7 additions & 6 deletions
@@ -8,13 +8,14 @@ participating-sigs:
 status: provisional
 creation-date: 2024-08-27
 reviewers:
+- "@chrischdi"
 - "@dgrisonnet"
-- "@logicalhan"
 - "@mrueg"
 - "@richabanker"
+- "@sftim"
+- "@simonpasquier"
 approvers:
 - "@dgrisonnet"
-- "@logicalhan"
 - "@mrueg"
 - "@richabanker"

@@ -24,15 +25,15 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "" # This is completely external to the k/k tree.
+latest-milestone: "" # This is external to the k/k tree.

 # The milestone at which this feature was, or is targeted to be, at each stage.
-milestone: {} # This is completely external to the k/k tree.
+milestone: {} # This is external to the k/k tree.

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
-feature-gates: [] # This is completely external to the k/k tree.
+feature-gates: [] # This is external to the k/k tree.
 disable-supported: true

 # The following PRR answers are required at beta release
-metrics: [] # This is completely external to the k/k tree.
+metrics: [] # This is external to the k/k tree.
