You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
// Only api.MatchLabels is supported for now and the requirements
382
382
// are ANDed. MatchExpressions is not supported yet.
383
383
map<string, string> label_selector = 2;
@@ -511,18 +511,18 @@ The table above describes the various metrics that are in this endpoint.
511
511
Each compliant CRI implementation must:
512
512
- Have a location broadcasted about where these metrics can be gathered from. The endpoint name must not necessarily be `/metrics/cadvisor`, nor be gathererd from the same port as it was from cAdvisor
513
513
- Implement *all* metrics within the set of metrics that are decided on.
514
-
- **TODO** How will we decide this set? We could support all, or take polls from the community and come up with a set of sufficiently useful metrics.
514
+
-**TODO** How will we decide this set? We could support all, or take polls from the community and come up with a set of sufficiently useful metrics.
515
515
- Pass a set of tests in the critest suite that verify they report the correct values for *all* supported metrics labels (to ensure continued conformance and standardization).
516
516
517
517
Below is the proposed strategy for doing so:
518
518
519
519
1. The Alpha release will strictly cover research, performance testing and the creation of conformance tests.
520
-
- Initial research on the set of metrics required should be done. This will, possibly, allow the community to declare metrics that are not required to be moved to the CRI implementations.
521
-
- Testing on how performant cAdvisor+Kubelet are today should be done, to find a target, acceptable threshold of performance for the CRI implementations
522
-
- Creation of tests verifying the metrics are reported correctly should be created and verified with the existing cAdvisor implementation.
520
+
- Initial research on the set of metrics required should be done. This will, possibly, allow the community to declare metrics that are not required to be moved to the CRI implementations.
521
+
- Testing on how performant cAdvisor+Kubelet are today should be done, to find a target, acceptable threshold of performance for the CRI implementations
522
+
- Creation of tests verifying the metrics are reported correctly should be created and verified with the existing cAdvisor implementation.
523
523
2. For the Beta release, add initial support for CRI implementations to report these metrics
524
-
- This set of metrics will be based on the research done in alpha
525
-
- Each will be validated against the conformance and performance tests created in alpha.
524
+
- This set of metrics will be based on the research done in alpha
525
+
- Each will be validated against the conformance and performance tests created in alpha.
526
526
3. For the GA release, the CRI implementation should be the source of truth for all pod and container level metrics that external parties rely on (no matter how many endpoints the Kubelet advertises).
527
527
528
528
#### cAdvisor
@@ -568,7 +568,7 @@ As a requirement for the Beta stage, cAdvisor must support optionally collecting
568
568
### Version Skew Strategy
569
569
570
570
- Breaking changes between versions will be mitigated by the FeatureGate.
571
-
- By the time the FeatureGate is deprecated, it is expected the transition between CRI and cAdvisor is complete, and CRI has had at least one release to expose the required metrics (to allow for `n-1` CRI skew).
571
+
- By the time the FeatureGate is deprecated, it is expected the transition between CRI and cAdvisor is complete, and CRI has had at least one release to expose the required metrics (to allow for `n-1` CRI skew).
572
572
- In general, CRI should be updated in tandem with or before the Kubelet.
573
573
574
574
## Production Readiness Review Questionnaire
@@ -725,13 +725,13 @@ operations covered by [existing SLIs/SLOs]?**
725
725
Think about adding additional work or introducing new steps in between
726
726
(e.g. need to do X to start a container), etc. Please describe the details.
727
727
- The process of collecting and reporting the metrics should not differ too much between cAdvisor and the CRI implementation:
728
-
- At a high level, both need to watch the changes to the stats (from cgroups, disk and network stats)
729
-
- Once collected, the CRI implementation will need to report them (both through the CRI and eventually through the prometheus endpoint).
730
-
- Both of these steps are already done by cAdvisor, so the work is changing hands, but not fundamentally changing.
728
+
- At a high level, both need to watch the changes to the stats (from cgroups, disk and network stats)
729
+
- Once collected, the CRI implementation will need to report them (both through the CRI and eventually through the prometheus endpoint).
730
+
- Both of these steps are already done by cAdvisor, so the work is changing hands, but not fundamentally changing.
731
731
- It is possible the Alpha iteration of this KEP may affect CPU/memory usage on the node:
732
732
- This may come because cAdvisor's performance has been fine-tuned, and changing the location of work may loose some optimizations.
733
-
- However, it is explicitly stated that a requirement for transition from Alpha->Beta is little to no performance degradation.
734
-
- The existence of the feature gate will allow users to mitigate this potential blip in performance (by not opting-in).
733
+
- However, it is explicitly stated that a requirement for transition from Alpha->Beta is little to no performance degradation.
734
+
- The existence of the feature gate will allow users to mitigate this potential blip in performance (by not opting-in).
735
735
***Will enabling / using this feature result in non-negligible increase of
736
736
resource usage (CPU, RAM, disk, IO, ...) in any components?**
737
737
- It most likely will reduce resource utilization. Right now, there is duplicate work being done between CRI and cAdvisor.
@@ -768,6 +768,6 @@ Note: This is by design as this will enable to decouple runtime implementation d
768
768
## Alternatives
769
769
770
770
- Instead of teaching CRI how to do *everything* cAdvisor does, we could instead have cAdvisor not do the work the CRI stats end up doing (specifically when reporting disk stats, which are the most expensive operation to report).
771
-
- However, this doesn't address the anti-pattern of having multiple parties confusingly responsible for a wide array of metrics and other issues described.
771
+
- However, this doesn't address the anti-pattern of having multiple parties confusingly responsible for a wide array of metrics and other issues described.
772
772
- Have cAdvisor implement the summary API. A cAdvisor daemonset could be a drop-in replacement for the summary API.
773
773
- Don't keep supporting the summary API. Replace it with a "better" format, like prometheus. Or help users migrate to equivalent APIs that container runtimes already expose for monitoring.
0 commit comments