Commit 5010c4a

Merge pull request #1851 from rexagod/1848
Allow optional VK in CR metrics
2 parents 5e4cd54 + 25a1d8d commit 5010c4a

20 files changed: +1305 -116 lines changed

docs/customresourcestate-metrics.md

Lines changed: 43 additions & 5 deletions
@@ -284,14 +284,14 @@ kube_customresource_uptime{customresource_group="myteam.io", customresource_kind
 ##### Type conversion and special handling
 
 Gauges produce values of type float64 but custom resources can be of all kinds of types.
-Kube-state-metrics performs implicity type conversions for a lot of type.
+Kube-state-metrics performs implicit type conversions for many types.
 Supported types are:
 
 * (u)int32/64, int, float32 and byte are cast to float64
-* `nil` is generally mapped to `0.0` if NilIsZero is `true`. Otherwise it yields an error
+* `nil` is generally mapped to `0.0` if NilIsZero is `true`, otherwise it will throw an error
 * for bool `true` is mapped to `1.0` and `false` is mapped to `0.0`
 * for string the following logic applies
-  * `"true"` and `"yes"` are mapped to `1.0` and `"false"` and `"no"` are mapped to `0.0` (all case insensitive)
+  * `"true"` and `"yes"` are mapped to `1.0` and `"false"` and `"no"` are mapped to `0.0` (all case-insensitive)
   * RFC3339 times are parsed to float timestamp
   * Quantities like "250m" or "512Gi" are parsed to float using https://github.com/kubernetes/apimachinery/blob/master/pkg/api/resource/quantity.go
   * Percentages ending with a "%" are parsed to float
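
The conversion rules in this hunk can be sketched in Go with only the standard library. This is an illustration, not the kube-state-metrics implementation: the helper names `toFloat64` and `stringToFloat64` are hypothetical, quantity parsing (which relies on the apimachinery `resource` package) is left out, and dividing percentages by 100 is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// toFloat64 is a hypothetical helper that mirrors the documented rules:
// numeric types are cast, nil depends on nilIsZero, bools map to 1/0,
// and strings go through stringToFloat64.
func toFloat64(value interface{}, nilIsZero bool) (float64, error) {
	switch v := value.(type) {
	case nil:
		if nilIsZero {
			return 0, nil
		}
		return 0, fmt.Errorf("expected a number but found nil")
	case bool:
		if v {
			return 1, nil
		}
		return 0, nil
	case int:
		return float64(v), nil
	case int32:
		return float64(v), nil
	case int64:
		return float64(v), nil
	case uint32:
		return float64(v), nil
	case uint64:
		return float64(v), nil
	case byte:
		return float64(v), nil
	case float32:
		return float64(v), nil
	case float64:
		return v, nil
	case string:
		return stringToFloat64(v)
	default:
		return 0, fmt.Errorf("unsupported type %T", value)
	}
}

// stringToFloat64 handles booleans-as-strings, RFC3339 timestamps,
// percentages, and finally plain numbers. Quantities are not handled here.
func stringToFloat64(s string) (float64, error) {
	switch strings.ToLower(s) {
	case "true", "yes":
		return 1, nil
	case "false", "no":
		return 0, nil
	}
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		return float64(t.Unix()), nil
	}
	if strings.HasSuffix(s, "%") {
		p, err := strconv.ParseFloat(strings.TrimSuffix(s, "%"), 64)
		if err != nil {
			return 0, err
		}
		// Assumption of this sketch: "25%" becomes 0.25.
		return p / 100, nil
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	inputs := []interface{}{int32(7), true, "Yes", "25%", "1970-01-01T00:01:00Z", nil}
	for _, in := range inputs {
		f, err := toFloat64(in, true)
		fmt.Println(in, f, err)
	}
}
```

The string branch checks the boolean spellings first so that `"yes"` is never handed to the number parser, and only falls back to `strconv.ParseFloat` when nothing more specific matches.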
@@ -416,7 +416,7 @@ spec:
       metricNamePrefix: myteam_foos
       metrics:
         - name: uptime
-          ...
+          # ...
 ```
 
 Produces:
@@ -434,7 +434,7 @@ spec:
       metricNamePrefix: ""
       metrics:
         - name: uptime
-          ...
+          # ...
 ```
 
 Produces:
@@ -481,3 +481,41 @@ Examples:
 # if the value to be matched is a number or boolean, the value is compared as a number or boolean
 [status, conditions, "[value=66]", name] # status.conditions[1].name = "b"
 ```
+
+### Wildcard matching of version and kind fields
+
+The Custom Resource State (CRS from here on) configuration also allows you to monitor all versions and/or kinds that come under a group. It watches
+the installed CRDs for this purpose. Taking the aforementioned `Foo` object as reference, the configuration below allows
+you to monitor all objects under all versions *and* all kinds that come under the `myteam.io` group.
+
+```yaml
+kind: CustomResourceStateMetrics
+spec:
+  resources:
+    - groupVersionKind:
+        group: "myteam.io"
+        version: "*" # Set to `v1` to monitor all kinds under `myteam.io/v1`. Wildcard matches all installed versions that come under this group.
+        kind: "*" # Set to `Foo` to monitor all `Foo` objects under the `myteam.io` group (under all versions). Wildcard matches all installed kinds that come under this group (and version, if specified).
+      metrics:
+        - name: "myobject_info"
+          help: "Foo Bar Baz"
+          each:
+            type: Info
+            info:
+              path: [metadata]
+              labelsFromPath:
+                object: [name]
+                namespace: [namespace]
+```
+
+The configuration above produces these metrics.
+
+```yaml
+kube_customresource_myobject_info{customresource_group="myteam.io",customresource_kind="Foo",customresource_version="v1",namespace="ns",object="foo"} 1
+kube_customresource_myobject_info{customresource_group="myteam.io",customresource_kind="Bar",customresource_version="v1",namespace="ns",object="bar"} 1
+```
+
+#### Note
+
+- For cases where the GVKs defined in a CRD have multiple versions under a single group for the same kind, as expected, the wildcard value will resolve to *all* versions, but a query for any specific version will return all resources under all versions, in that version's representation. This basically means that for two such versions `A` and `B`, if a resource exists under `B`, it will reflect in the metrics generated for `A` as well, in addition to any resources of itself, and vice-versa. This logic is based on the [current `list`ing behavior](https://github.com/kubernetes/client-go/issues/1251#issuecomment-1544083071) of the client-go library.
+- The introduction of this feature further discourages (and discontinues) the use of native objects in the CRS feature set, since these do not have an explicit CRD associated with them, and conflict with internal stores defined specifically for such native resources. Please consider opening an issue or raising a PR if you'd like to expand on the current metric labelsets for them. Also, any such configuration will be ignored, and no metrics will be generated for the same.
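
The wildcard semantics documented above can be illustrated with a small, self-contained Go sketch. The `gvk` type and the `resolve` and `isWildcard` functions are hypothetical names for illustration only. Unlike the real resolver in this commit, the sketch returns nothing when a fully specified GVK is not installed, which is a simplification.

```go
package main

import "fmt"

// gvk is a hypothetical stand-in for a group/version/kind triple.
type gvk struct{ Group, Version, Kind string }

// isWildcard reports whether a version or kind field matches everything.
// An empty field is treated like "*".
func isWildcard(s string) bool { return s == "" || s == "*" }

// resolve expands a possibly wildcarded pattern against the installed GVKs.
// The group is mandatory and can never be a wildcard.
func resolve(pattern gvk, installed []gvk) ([]gvk, error) {
	if isWildcard(pattern.Group) {
		return nil, fmt.Errorf("group is required in the defined GVK %v", pattern)
	}
	var out []gvk
	for _, c := range installed {
		if c.Group != pattern.Group {
			continue
		}
		if !isWildcard(pattern.Version) && c.Version != pattern.Version {
			continue
		}
		if !isWildcard(pattern.Kind) && c.Kind != pattern.Kind {
			continue
		}
		out = append(out, c)
	}
	return out, nil
}

func main() {
	installed := []gvk{
		{"myteam.io", "v1", "Foo"},
		{"myteam.io", "v1", "Bar"},
		{"myteam.io", "v2", "Foo"},
		{"other.io", "v1", "Baz"},
	}
	all, _ := resolve(gvk{"myteam.io", "*", "*"}, installed)
	foos, _ := resolve(gvk{"myteam.io", "*", "Foo"}, installed)
	fmt.Println(len(all), len(foos))
}
```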

internal/discovery/discovery.go

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
+/*
+Copyright 2023 The Kubernetes Authors All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+// Package discovery provides discovery and resolution logic for GVKs.
+package discovery
+
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
+	"k8s.io/apimachinery/pkg/runtime/schema"
+	"k8s.io/client-go/dynamic"
+	"k8s.io/client-go/dynamic/dynamicinformer"
+	"k8s.io/client-go/rest"
+	"k8s.io/client-go/tools/cache"
+	"k8s.io/klog/v2"
+
+	"k8s.io/kube-state-metrics/v2/internal/store"
+	"k8s.io/kube-state-metrics/v2/pkg/customresource"
+	"k8s.io/kube-state-metrics/v2/pkg/metricshandler"
+	"k8s.io/kube-state-metrics/v2/pkg/options"
+	"k8s.io/kube-state-metrics/v2/pkg/util"
+)
+
+// Interval is the time interval between two cache sync checks.
+const Interval = 3 * time.Second
+
+// StartDiscovery starts the discovery process, watching the CRDs installed on the apiserver and caching the GVKs they define.
+// ResolveGVKToGVKPs needs to be called after StartDiscovery to generate factories.
+func (r *CRDiscoverer) StartDiscovery(ctx context.Context, config *rest.Config) error {
+	client := dynamic.NewForConfigOrDie(config)
+	factory := dynamicinformer.NewFilteredDynamicInformer(client, schema.GroupVersionResource{
+		Group:    "apiextensions.k8s.io",
+		Version:  "v1",
+		Resource: "customresourcedefinitions",
+	}, "", 0, nil, nil)
+	informer := factory.Informer()
+	stopper := make(chan struct{})
+	_, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: func(obj interface{}) {
+			objSpec := obj.(*unstructured.Unstructured).Object["spec"].(map[string]interface{})
+			for _, version := range objSpec["versions"].([]interface{}) {
+				g := objSpec["group"].(string)
+				v := version.(map[string]interface{})["name"].(string)
+				k := objSpec["names"].(map[string]interface{})["kind"].(string)
+				p := objSpec["names"].(map[string]interface{})["plural"].(string)
+				gotGVKP := groupVersionKindPlural{
+					GroupVersionKind: schema.GroupVersionKind{
+						Group:   g,
+						Version: v,
+						Kind:    k,
+					},
+					Plural: p,
+				}
+				r.AppendToMap(gotGVKP)
+				r.SafeWrite(func() {
+					r.WasUpdated = true
+				})
+			}
+			r.SafeWrite(func() {
+				r.CRDsAddEventsCounter.Inc()
+				r.CRDsCacheCountGauge.Inc()
+			})
+		},
+		DeleteFunc: func(obj interface{}) {
+			objSpec := obj.(*unstructured.Unstructured).Object["spec"].(map[string]interface{})
+			for _, version := range objSpec["versions"].([]interface{}) {
+				g := objSpec["group"].(string)
+				v := version.(map[string]interface{})["name"].(string)
+				k := objSpec["names"].(map[string]interface{})["kind"].(string)
+				p := objSpec["names"].(map[string]interface{})["plural"].(string)
+				gotGVKP := groupVersionKindPlural{
+					GroupVersionKind: schema.GroupVersionKind{
+						Group:   g,
+						Version: v,
+						Kind:    k,
+					},
+					Plural: p,
+				}
+				r.RemoveFromMap(gotGVKP)
+				r.SafeWrite(func() {
+					r.WasUpdated = true
+				})
+			}
+			r.SafeWrite(func() {
+				r.CRDsDeleteEventsCounter.Inc()
+				r.CRDsCacheCountGauge.Dec()
+			})
+		},
+	})
+	if err != nil {
+		return err
+	}
+	// Respect context cancellation.
+	go func() {
+		// Done() is only ever closed, never sent on, so ranging over it would
+		// skip the loop body entirely. Block on a receive instead.
+		<-ctx.Done()
+		klog.InfoS("context cancelled, stopping discovery")
+		close(stopper)
+	}()
+	go informer.Run(stopper)
+	return nil
+}
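
The event handlers above read the CRD spec with unchecked type assertions, which panic if a field is missing or has another type. client-go may also hand `DeleteFunc` a `cache.DeletedFinalStateUnknown` tombstone rather than the object itself. A minimal sketch of the same extraction with checked assertions follows. It works on plain maps so that it needs only the standard library, and the `gvkp` type and `extractGVKPs` function are hypothetical names, not part of this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// gvkp is a hypothetical stand-in for groupVersionKindPlural.
type gvkp struct{ Group, Version, Kind, Plural string }

// extractGVKPs reads group, kind, plural and every version name from a CRD
// decoded into generic maps. It returns an error instead of panicking when
// a field is missing or has an unexpected type.
func extractGVKPs(obj map[string]interface{}) ([]gvkp, error) {
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return nil, errors.New("spec is missing or not an object")
	}
	group, ok := spec["group"].(string)
	if !ok {
		return nil, errors.New("spec.group is missing or not a string")
	}
	names, ok := spec["names"].(map[string]interface{})
	if !ok {
		return nil, errors.New("spec.names is missing or not an object")
	}
	kind, ok := names["kind"].(string)
	if !ok {
		return nil, errors.New("spec.names.kind is missing or not a string")
	}
	plural, ok := names["plural"].(string)
	if !ok {
		return nil, errors.New("spec.names.plural is missing or not a string")
	}
	versions, ok := spec["versions"].([]interface{})
	if !ok {
		return nil, errors.New("spec.versions is missing or not a list")
	}
	out := make([]gvkp, 0, len(versions))
	for i, raw := range versions {
		version, ok := raw.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("spec.versions[%d] is not an object", i)
		}
		name, ok := version["name"].(string)
		if !ok {
			return nil, fmt.Errorf("spec.versions[%d].name is missing or not a string", i)
		}
		out = append(out, gvkp{Group: group, Version: name, Kind: kind, Plural: plural})
	}
	return out, nil
}

func main() {
	crd := map[string]interface{}{
		"spec": map[string]interface{}{
			"group": "myteam.io",
			"names": map[string]interface{}{"kind": "Foo", "plural": "foos"},
			"versions": []interface{}{
				map[string]interface{}{"name": "v1"},
				map[string]interface{}{"name": "v2"},
			},
		},
	}
	fmt.Println(extractGVKPs(crd))
}
```

In a handler, an error from such a helper could be logged and the event skipped, so that a single malformed object does not take down the informer goroutine.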
+
+// ResolveGVKToGVKPs resolves the variable VKs to a GVK list, based on the current cache.
+func (r *CRDiscoverer) ResolveGVKToGVKPs(gvk schema.GroupVersionKind) (resolvedGVKPs []groupVersionKindPlural, err error) { // nolint:revive
+	g := gvk.Group
+	v := gvk.Version
+	k := gvk.Kind
+	if g == "" || g == "*" {
+		return nil, fmt.Errorf("group is required in the defined GVK %v", gvk)
+	}
+	hasVersion := v != "" && v != "*"
+	hasKind := k != "" && k != "*"
+	// No need to resolve, return.
+	if hasVersion && hasKind {
+		var p string
+		for _, el := range r.Map[g][v] {
+			if el.Kind == k {
+				p = el.Plural
+				break
+			}
+		}
+		return []groupVersionKindPlural{
+			{
+				GroupVersionKind: schema.GroupVersionKind{
+					Group:   g,
+					Version: v,
+					Kind:    k,
+				},
+				Plural: p,
+			},
+		}, nil
+	}
+	if hasVersion && !hasKind {
+		kinds := r.Map[g][v]
+		for _, el := range kinds {
+			resolvedGVKPs = append(resolvedGVKPs, groupVersionKindPlural{
+				GroupVersionKind: schema.GroupVersionKind{
+					Group:   g,
+					Version: v,
+					Kind:    el.Kind,
+				},
+				Plural: el.Plural,
+			})
+		}
+	}
+	if !hasVersion && hasKind {
+		versions := r.Map[g]
+		for version, kinds := range versions {
+			for _, el := range kinds {
+				if el.Kind == k {
+					resolvedGVKPs = append(resolvedGVKPs, groupVersionKindPlural{
+						GroupVersionKind: schema.GroupVersionKind{
+							Group:   g,
+							Version: version,
+							Kind:    k,
+						},
+						Plural: el.Plural,
+					})
+				}
+			}
+		}
+	}
+	if !hasVersion && !hasKind {
+		versions := r.Map[g]
+		for version, kinds := range versions {
+			for _, el := range kinds {
+				resolvedGVKPs = append(resolvedGVKPs, groupVersionKindPlural{
+					GroupVersionKind: schema.GroupVersionKind{
+						Group:   g,
+						Version: version,
+						Kind:    el.Kind,
+					},
+					Plural: el.Plural,
+				})
+			}
+		}
+	}
+	return
+}
+
+// PollForCacheUpdates polls the cache for updates and updates the stores accordingly.
+func (r *CRDiscoverer) PollForCacheUpdates(
+	ctx context.Context,
+	opts *options.Options,
+	storeBuilder *store.Builder,
+	m *metricshandler.MetricsHandler,
+	factoryGenerator func() ([]customresource.RegistryFactory, error),
+) {
+	// The interval at which we will check the cache for updates.
+	t := time.NewTicker(Interval)
+	// Track previous context to allow refreshing cache.
+	olderContext, olderCancel := context.WithCancel(ctx)
+	// Prevent context leak (kill the last metric handler instance).
+	defer olderCancel()
+	generateMetrics := func() {
+		// Get families for discovered factories.
+		customFactories, err := factoryGenerator()
+		if err != nil {
+			klog.ErrorS(err, "failed to update custom resource stores")
+		}
+		// Update the list of enabled custom resources.
+		var enabledCustomResources []string
+		for _, factory := range customFactories {
+			gvrString := util.GVRFromType(factory.Name(), factory.ExpectedType()).String()
+			enabledCustomResources = append(enabledCustomResources, gvrString)
+		}
+		// Create clients for discovered factories.
+		discoveredCustomResourceClients, err := util.CreateCustomResourceClients(opts.Apiserver, opts.Kubeconfig, customFactories...)
+		if err != nil {
+			klog.ErrorS(err, "failed to update custom resource stores")
+		}
+		// Update the store builder with the new clients.
+		storeBuilder.WithCustomResourceClients(discoveredCustomResourceClients)
+		// Inject families' constructors to the existing set of stores.
+		storeBuilder.WithCustomResourceStoreFactories(customFactories...)
+		// Update the store builder with the new custom resources.
+		if err := storeBuilder.WithEnabledResources(enabledCustomResources); err != nil {
+			klog.ErrorS(err, "failed to update custom resource stores")
+		}
+		// Configure the generation function for the custom resource stores.
+		storeBuilder.WithGenerateCustomResourceStoresFunc(storeBuilder.DefaultGenerateCustomResourceStoresFunc())
+		// Reset the flag so that the next tick does not regenerate the stores again.
+		// Note that the flag is reset even if errors were logged above, so failures are not retried until the next CRD event.
+		r.SafeWrite(func() {
+			r.WasUpdated = false
+		})
+		// Run the metrics handler with updated configs.
+		olderContext, olderCancel = context.WithCancel(ctx)
+		go func(runCtx context.Context) {
+			// Blocks until runCtx is cancelled, serving metrics in the meantime. runCtx and err are local to this goroutine, so a later refresh cannot replace them.
+			err := m.Run(runCtx)
+			if err != nil {
+				// Check if context was cancelled.
+				select {
+				case <-runCtx.Done():
+					// Context cancelled, don't really need to log this though.
+				default:
+					klog.ErrorS(err, "failed to run metrics handler")
+				}
+			}
+		}(olderContext)
+	}
+	go func() {
+		for range t.C {
+			select {
+			case <-ctx.Done():
+				klog.InfoS("context cancelled")
+				t.Stop()
+				return
+			default:
+				// Check if cache has been updated.
+				shouldGenerateMetrics := false
+				r.SafeRead(func() {
+					shouldGenerateMetrics = r.WasUpdated
+				})
+				if shouldGenerateMetrics {
+					olderCancel()
+					generateMetrics()
+					klog.InfoS("discovery finished, cache updated")
+				}
+			}
+		}
+	}()
+}
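
The polling loop above only notices cancellation on the next tick, because `ctx.Done()` is checked inside the `for range t.C` body. One possible alternative, sketched below under assumptions, selects on the ticker and the context together and clears the "was updated" flag in the same critical section that reads it. The `dirtyFlag` type and the `poll` and `runDemo` functions are hypothetical stand-ins and are not part of this commit.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// dirtyFlag is a hypothetical stand-in for the WasUpdated field together
// with its SafeRead/SafeWrite helpers.
type dirtyFlag struct {
	mu    sync.Mutex
	dirty bool
}

// Set marks the cache as updated.
func (d *dirtyFlag) Set() {
	d.mu.Lock()
	d.dirty = true
	d.mu.Unlock()
}

// TestAndClear reports whether the flag was set and clears it while still
// holding the lock, so an update cannot be lost between the read and the reset.
func (d *dirtyFlag) TestAndClear() bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	was := d.dirty
	d.dirty = false
	return was
}

// poll calls refresh whenever the flag is found set on a tick and returns
// as soon as ctx is cancelled, without waiting for the next tick.
func poll(ctx context.Context, interval time.Duration, flag *dirtyFlag, refresh func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if flag.TestAndClear() {
				refresh()
			}
		}
	}
}

// runDemo sets the flag once, waits for the first refresh, cancels the
// context, and reports how many refreshes ran and whether poll returned.
func runDemo() (refreshes int, stopped bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	flag := &dirtyFlag{}
	flag.Set()
	refreshed := make(chan struct{}, 1)
	done := make(chan struct{})
	go func() {
		poll(ctx, 10*time.Millisecond, flag, func() {
			refreshes++
			refreshed <- struct{}{}
		})
		close(done)
	}()
	<-refreshed
	cancel()
	select {
	case <-done:
		stopped = true
	case <-time.After(time.Second):
	}
	return refreshes, stopped
}

func main() {
	fmt.Println(runDemo())
}
```

Whether clearing the flag before or after a successful refresh is preferable depends on the desired retry behavior; this sketch clears it up front and does not retry.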
