From 35c8fbeab512681ec37a4c1427c71fb212723c2a Mon Sep 17 00:00:00 2001 From: Omer Aplatony Date: Sat, 6 Sep 2025 08:15:01 +0000 Subject: [PATCH 1/4] update AEP Signed-off-by: Omer Aplatony --- .../README.md | 126 ++++++++++++++---- 1 file changed, 102 insertions(+), 24 deletions(-) diff --git a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md index 5375c9f532cb..3e7add2bae29 100644 --- a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md +++ b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md @@ -10,7 +10,11 @@ - [Container Policy Parameters](#container-policy-parameters) - [Update Policy Parameters](#update-policy-parameters) - [Design Details](#design-details) + - [Configuration Level Considerations](#configuration-level-considerations) + - [Parameter Level Analysis](#parameter-level-analysis) - [API Changes](#api-changes) + - [OOM Parameter Consolidation](#oom-parameter-consolidation) + - [Parameter Coexistence and Global Configuration](#parameter-coexistence-and-global-configuration) - [Phase 1 (Current Proposal)](#phase-1-current-proposal) - [Future Extensions](#future-extensions) - [Feature Enablement and Rollback](#feature-enablement-and-rollback) @@ -44,7 +48,6 @@ Currently, supporting these different needs requires running multiple VPA compon - Support different optimization strategies for different workloads in the same cluster - Maintain backward compatibility with existing global configuration - Initially support the following parameters: - - oomBumpUpRatio - oomMinBumpUp - memoryAggregationInterval - memoryAggregationIntervalCount @@ -64,20 +67,19 @@ The configuration will be split into two sections: container-specific recommenda apiVersion: autoscaling.k8s.io/v1 kind: VerticalPodAutoscaler metadata: - name: oom-test-vpa + name: simple-vpa spec: targetRef: apiVersion: apps/v1 kind: Deployment - name: oom-test + name: my-app updatePolicy: updateMode: Auto evictAfterOOMThreshold: "5m" resourcePolicy: containerPolicies: - containerName: "*" - oomBumpUpRatio: "1.5" - oomMinBumpUp: 104857600 + oomBumpUp: "1.5" memoryAggregationInterval: "12h" memoryAggregationIntervalCount: 5 ``` @@ -85,20 +87,14 @@ spec: ### Parameter Descriptions #### Container Policy Parameters -#### Container Policy Parameters -* `oomBumpUpRatio` (Quantity): - - Multiplier applied to memory recommendations after OOM events - - Represented as a Quantity (e.g., "1.5") - - Must be greater than or equal to 1 - - Setting to 1 effectively disables the OOM ratio-based increase - - Controls how aggressively memory is increased after container crashes - -* `oomMinBumpUp` (bytes): - - Minimum absolute memory increase after OOM events - - Setting to 0 effectively disables the OOM minimum increase - - When both `oomBumpUpRatio` = 1 and `oomMinBumpUp` = 0, OOM-based memory increases are completely disabled - - Ensures meaningful increases even for small containers - +* `oomBumpUp` (Quantity): + - Controls memory increase after OOM events + - Can be specified as either: + - A ratio (e.g., "1.5" for 50% increase) + - An absolute value (e.g., "100Mi" for fixed increase) + - When specified, overrides global OOM configuration + - Allows fine-tuning of OOM behavior per container + * `memoryAggregationInterval` (duration): - Time window for aggregating memory usage data - Affects how quickly VPA responds to memory usage changes @@ -117,13 +113,94 @@ Each parameter can be configured independently, falling back to global defaults ## Design Details +### Configuration Level Considerations + +When designing the configuration parameters, we analyzed each parameter to determine the most appropriate configuration level: + +#### Parameter Level Analysis + +1. **OOM-Related Parameters (`oomBumpUp`)** +- **Recommended Level**: Container-level +- **Rationale**: + - Different containers in the same pod may have different memory requirements and OOM patterns + - Memory recommendations in VPA are already handled at container level + - Consistent with how VPA handles other resource-related configurations + - While `oomBumpUp` is container-level, VPA-wide configuration can still be achieved using the wildcard container name "*" + +2. **Memory Aggregation Parameters (`memoryAggregationInterval`, `memoryAggregationIntervalCount`)** +- **Recommended Level**: Container-level +- **Rationale**: + - Different containers may have different memory usage patterns + - Some containers might need faster reaction times to memory changes + - Consistent with existing VPA container-level resource policies + - While `memoryAggregationInterval` and `memoryAggregationIntervalCount` are container-level, VPA-wide configuration can still be achieved using the wildcard container name "*" + +3. **Eviction Parameters (`evictAfterOOMThreshold`)** +- **Recommended Level**: VPA-level +- **Rationale**: + - Eviction decisions affect the entire pod + - Pod-level operation that shouldn't vary by container + - Simpler operational model for pod lifecycle management + - Consistent with how Kubernetes handles pod evictions + + ### API Changes +### OOM Parameter Consolidation + +In this proposal, we are consolidating `oomBumpUpRatio` and `oomMinBumpUp` into a single `oomBumpUp` parameter for several reasons: +1. **Simplified Configuration**: + - Instead of managing two separate parameters that affect the same behavior, users can specify their intent with a single parameter + - Makes configuration more intuitive and less error-prone + +2. **Flexible Expression**: + - `oomBumpUp` can express both ratio-based and absolute increases: + - Ratio format: "1.5" (50% increase) + - Absolute format: "100Mi" (100 MiB increase) + - This flexibility eliminates the need for separate parameters while maintaining all functionality + +3. **Consistent with Kubernetes Patterns**: + - Uses the standard Kubernetes Quantity type + - Similar to how other Kubernetes components handle resource specifications + +4. **Backward Compatibility**: + - The new parameter can express all use cases covered by the previous parameters + - Migration path is straightforward: + - For ratio-based increases: Use the same value in `oomBumpUp` + - For minimum increases: Convert to absolute quantity in `oomBumpUp` + +### Parameter Coexistence and Global Configuration + +The VPA recommender maintains global configuration values for OOM-related parameters. The new `oomBumpUp` parameter interacts with these global configurations as follows: + +1. **Priority Order**: + - If `oomBumpUp` is specified in the VPA object, it takes precedence over any global configuration + - If `oomBumpUp` is not specified, the global configuration values from the recommender will be used + - This allows for a smooth transition and maintains backward compatibility + +2. **Validation**: + - The admission controller can only validate the syntax and range of `oomBumpUp` when specified + - Cannot validate against global configuration as these values live in the recommender + - Values must be either: + - A ratio >= 1 (e.g., "1.5") + - An absolute memory value > 0 (e.g., "100Mi") + +3. **Example Scenarios**: + ```yaml + # Scenario 1: oomBumpUp specified - overrides global config + containerPolicies: + - containerName: "*" + oomBumpUp: "1.5" + + # Scenario 2: No oomBumpUp - uses global config from recommender + containerPolicies: + - containerName: "*" + + #### Phase 1 (Current Proposal) Extend `ContainerResourcePolicy` with: -* `oomBumpUpRatio` -* `oomMinBumpUp` +* `oomBumpUp` * `memoryAggregationInterval` * `memoryAggregationIntervalCount` @@ -175,15 +252,15 @@ The `PerVPAConfig` feature requires VPA version 1.5.0 or higher. The feature is ### Validation via CEL and Testing Initial validation rules (CEL): -* `oomMinBumpUp` >= 0 +* `oomBumpUP` >= 1 * `memoryAggregationInterval` > 0 -* `evictAfterOOMThreshold` > 0 * `memoryAggregationIntervalCount` > 0 +* `evictAfterOOMThreshold` > 0 Validation via Admission Controller: Some components cann't be validated using Common Expression Language (CEL). This validation is performed within the admission controller. -* `oomBumpUpRatio` – Using Kubernetes Quantity type for validation. The value must be greater than or equal to 1. +* `oomBumpUP` – Using Kubernetes Quantity type for validation. The value must be greater than or equal to 1. Additional validation rules will be added as new parameters are introduced. E2E tests will be included to verify: @@ -201,6 +278,7 @@ E2E tests will be included to verify: ## Implementation History - 2025-04-12: Initial proposal +- 2025-09-06: Consolidated `oomBumpUpRatio` and `oomMinBumpUp` into single `oomBumpUp` parameter - Future: Additional parameters will be added based on user feedback and requirements ## Future Work From a7bf2c85705c86ec66e43f10e31b8b1a7955bc97 Mon Sep 17 00:00:00 2001 From: Omer Aplatony Date: Sat, 6 Sep 2025 14:10:13 +0000 Subject: [PATCH 2/4] Revert to old configurations Signed-off-by: Omer Aplatony --- .../README.md | 88 +++++-------------- 1 file changed, 22 insertions(+), 66 deletions(-) diff --git a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md index 3e7add2bae29..fa9cc0882f4e 100644 --- a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md +++ b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md @@ -48,6 +48,7 @@ Currently, supporting these different needs requires running multiple VPA compon - Support different optimization strategies for different workloads in the same cluster - Maintain backward compatibility with existing global configuration - Initially support the following parameters: + - oomBumpUpRatio - oomMinBumpUp - memoryAggregationInterval - memoryAggregationIntervalCount @@ -79,7 +80,8 @@ spec: resourcePolicy: containerPolicies: - containerName: "*" - oomBumpUp: "1.5" + oomBumpUpRatio: "1.5" + oomMinBumpUp: 104857600 memoryAggregationInterval: "12h" memoryAggregationIntervalCount: 5 ``` @@ -87,14 +89,18 @@ spec: ### Parameter Descriptions #### Container Policy Parameters -* `oomBumpUp` (Quantity): - - Controls memory increase after OOM events - - Can be specified as either: - - A ratio (e.g., "1.5" for 50% increase) - - An absolute value (e.g., "100Mi" for fixed increase) - - When specified, overrides global OOM configuration - - Allows fine-tuning of OOM behavior per container - +* `oomBumpUpRatio` (Quantity): + - Multiplier applied to memory recommendations after OOM events + - Represented as a Quantity (e.g., "1.5") + - Must be greater than or equal to 1 + - Setting to 1 effectively disables the OOM ratio-based increase + - Controls how aggressively memory is increased after container crashes + +* `oomMinBumpUp` (bytes): + - Minimum absolute memory increase after OOM events + - Setting to 0 effectively disables the OOM minimum increase + - When both `oomBumpUpRatio` = 1 and `oomMinBumpUp` = 0, OOM-based memory increases are completely disabled + - Ensures meaningful increases even for small containers * `memoryAggregationInterval` (duration): - Time window for aggregating memory usage data - Affects how quickly VPA responds to memory usage changes @@ -119,13 +125,13 @@ When designing the configuration parameters, we analyzed each parameter to deter #### Parameter Level Analysis -1. **OOM-Related Parameters (`oomBumpUp`)** +1. **OOM-Related Parameters (`oomBumpUpRatio`, `oomMinBumpUp`)** - **Recommended Level**: Container-level - **Rationale**: - Different containers in the same pod may have different memory requirements and OOM patterns - Memory recommendations in VPA are already handled at container level - Consistent with how VPA handles other resource-related configurations - - While `oomBumpUp` is container-level, VPA-wide configuration can still be achieved using the wildcard container name "*" + - While these parameters are container-level, VPA-wide configuration can still be achieved using the wildcard container name "*" 2. **Memory Aggregation Parameters (`memoryAggregationInterval`, `memoryAggregationIntervalCount`)** - **Recommended Level**: Container-level @@ -146,61 +152,11 @@ When designing the configuration parameters, we analyzed each parameter to deter ### API Changes -### OOM Parameter Consolidation - -In this proposal, we are consolidating `oomBumpUpRatio` and `oomMinBumpUp` into a single `oomBumpUp` parameter for several reasons: -1. **Simplified Configuration**: - - Instead of managing two separate parameters that affect the same behavior, users can specify their intent with a single parameter - - Makes configuration more intuitive and less error-prone - -2. **Flexible Expression**: - - `oomBumpUp` can express both ratio-based and absolute increases: - - Ratio format: "1.5" (50% increase) - - Absolute format: "100Mi" (100 MiB increase) - - This flexibility eliminates the need for separate parameters while maintaining all functionality - -3. **Consistent with Kubernetes Patterns**: - - Uses the standard Kubernetes Quantity type - - Similar to how other Kubernetes components handle resource specifications - -4. **Backward Compatibility**: - - The new parameter can express all use cases covered by the previous parameters - - Migration path is straightforward: - - For ratio-based increases: Use the same value in `oomBumpUp` - - For minimum increases: Convert to absolute quantity in `oomBumpUp` - -### Parameter Coexistence and Global Configuration - -The VPA recommender maintains global configuration values for OOM-related parameters. The new `oomBumpUp` parameter interacts with these global configurations as follows: - -1. **Priority Order**: - - If `oomBumpUp` is specified in the VPA object, it takes precedence over any global configuration - - If `oomBumpUp` is not specified, the global configuration values from the recommender will be used - - This allows for a smooth transition and maintains backward compatibility - -2. **Validation**: - - The admission controller can only validate the syntax and range of `oomBumpUp` when specified - - Cannot validate against global configuration as these values live in the recommender - - Values must be either: - - A ratio >= 1 (e.g., "1.5") - - An absolute memory value > 0 (e.g., "100Mi") - -3. **Example Scenarios**: - ```yaml - # Scenario 1: oomBumpUp specified - overrides global config - containerPolicies: - - containerName: "*" - oomBumpUp: "1.5" - - # Scenario 2: No oomBumpUp - uses global config from recommender - containerPolicies: - - containerName: "*" - - #### Phase 1 (Current Proposal) Extend `ContainerResourcePolicy` with: -* `oomBumpUp` +* `oomBumpUpRatio` +* `oomMinBumpUp` * `memoryAggregationInterval` * `memoryAggregationIntervalCount` @@ -252,7 +208,7 @@ The `PerVPAConfig` feature requires VPA version 1.5.0 or higher. The feature is ### Validation via CEL and Testing Initial validation rules (CEL): -* `oomBumpUP` >= 1 +* `oomMinBumpUp` >= 0 * `memoryAggregationInterval` > 0 * `memoryAggregationIntervalCount` > 0 * `evictAfterOOMThreshold` > 0 @@ -260,7 +216,7 @@ Initial validation rules (CEL): Validation via Admission Controller: Some components cann't be validated using Common Expression Language (CEL). This validation is performed within the admission controller. -* `oomBumpUP` – Using Kubernetes Quantity type for validation. The value must be greater than or equal to 1. +* `oomBumpUpRatio` – Using Kubernetes Quantity type for validation. The value must be greater than or equal to 1. Additional validation rules will be added as new parameters are introduced. E2E tests will be included to verify: @@ -278,7 +234,7 @@ E2E tests will be included to verify: ## Implementation History - 2025-04-12: Initial proposal -- 2025-09-06: Consolidated `oomBumpUpRatio` and `oomMinBumpUp` into single `oomBumpUp` parameter +- 2025-09-06: Specify `oomBumpUpRatio` and `oomMinBumpUp` as container-level paramters - Future: Additional parameters will be added based on user feedback and requirements ## Future Work From 6a99eb410846ad6ec8b1cbcbc96d91760d74265a Mon Sep 17 00:00:00 2001 From: Omer Aplatony Date: Sat, 6 Sep 2025 14:16:39 +0000 Subject: [PATCH 3/4] fixed typo Signed-off-by: Omer Aplatony --- .../enhancements/8026-per-vpa-component-configuration/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md index fa9cc0882f4e..c14511a0d709 100644 --- a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md +++ b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md @@ -234,7 +234,7 @@ E2E tests will be included to verify: ## Implementation History - 2025-04-12: Initial proposal -- 2025-09-06: Specify `oomBumpUpRatio` and `oomMinBumpUp` as container-level paramters +- 2025-09-06: Specify `oomBumpUpRatio` and `oomMinBumpUp` as container-level parameters - Future: Additional parameters will be added based on user feedback and requirements ## Future Work From bad21386dadd91dffc6e3b3897e7baa03b3b1fef Mon Sep 17 00:00:00 2001 From: Omer Aplatony Date: Wed, 24 Sep 2025 10:16:26 +0000 Subject: [PATCH 4/4] Add more info Signed-off-by: Omer Aplatony --- .../README.md | 50 ++++++++++++++----- 1 file changed, 37 insertions(+), 13 deletions(-) diff --git a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md index c14511a0d709..260466eeceb8 100644 --- a/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md +++ b/vertical-pod-autoscaler/enhancements/8026-per-vpa-component-configuration/README.md @@ -31,7 +31,7 @@ ## Summary -Currently, VPA components (recommender, updater, admission controller) are configured through global flags. This makes it challenging to support different workloads with varying resource optimization needs within the same cluster. This proposal introduces the ability to specify configuration parameters at the individual VPA object level, allowing for workload-specific optimization strategies. +Currently, VPA components (recommender, updater, admission controller) are configured through global flags. This makes it challenging to support different workloads with varying resource optimization needs within the same cluster. This proposal introduces the ability to specify configuration parameters at the individual VPA object level, allowing for workload-specific optimization strategies. The goal is not to introduce new configuration options but rather to make existing internal configurations accessible and customizable per VPA object. ## Motivation @@ -89,18 +89,42 @@ spec: ### Parameter Descriptions #### Container Policy Parameters -* `oomBumpUpRatio` (Quantity): - - Multiplier applied to memory recommendations after OOM events - - Represented as a Quantity (e.g., "1.5") - - Must be greater than or equal to 1 - - Setting to 1 effectively disables the OOM ratio-based increase - - Controls how aggressively memory is increased after container crashes - -* `oomMinBumpUp` (bytes): - - Minimum absolute memory increase after OOM events - - Setting to 0 effectively disables the OOM minimum increase - - When both `oomBumpUpRatio` = 1 and `oomMinBumpUp` = 0, OOM-based memory increases are completely disabled - - Ensures meaningful increases even for small containers + +* `oomBumpUpRatio` and `oomMinBumpUp`: + These parameters work together to determine memory increases after OOM events. + The VPA selects the larger of: + - The absolute increase specified by `oomMinBumpUp` + - A relative increase calculated using `oomBumpUpRatio` + - If both are set to their neutral values (`oomBumpUpRatio = 1`, `oomMinBumpUp = 0`), OOM-based increases are disabled. + + This ensures: + - Small containers receive a guaranteed minimum bump (via `oomMinBumpUp`, e.g., +100MB). + - Larger containers receive proportional scaling (via `oomBumpUpRatio`, e.g., ×1.5). + + Implementation logic: + + ```golang + memoryNeeded := ResourceAmountMax( + memoryUsed + MemoryAmountFromBytes(GetAggregationsConfig().OOMMinBumpUp), + ScaleResource(memoryUsed, GetAggregationsConfig().OOMBumpUpRatio) + ) + ``` + Example: with oomBumpUpRatio: "1.5" and oomMinBumpUp: 104857600 (100MB): + - For a container using 50MB: max(50MB + 100MB, 50MB * 1.5) = 150MB + - For a container using 1GB: max(1GB + 100MB, 1GB * 1.5) = 1.5GB + + + - `oomBumpUpRatio` (Quantity): + - Multiplier applied to memory recommendations after OOM events + - Represented as a Quantity (e.g., "1.5") + - Must be greater than or equal to 1 + - Setting to 1 effectively disables the OOM ratio-based increase + - Controls how aggressively memory is increased after container crashes + + - `oomMinBumpUp` (bytes): + - Minimum absolute memory increase after OOM events + - Setting to 0 effectively disables the OOM minimum increase + * `memoryAggregationInterval` (duration): - Time window for aggregating memory usage data - Affects how quickly VPA responds to memory usage changes