Commit e39b45f

Merge pull request #459 from tiraboschi/CPUPSIPressureUtilization
OCPBUGS-52341: Add CPUPSIPressureUtilization ActualUtilizationProfile
2 parents 3443eca + 7e4697c commit e39b45f

3 files changed: 8 additions, 3 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -181,9 +181,10 @@ the `profileCustomizations` field:
 ## Prometheus query profiles
 The operator provides the following profiles:
 - `PrometheusCPUUsage`: `instance:node_cpu:rate:sum` (metric available in OpenShift by default)
-- `PrometheusCPUPSIPressure`: `rate(node_pressure_cpu_waiting_seconds_total[1m])` (`node_pressure_cpu_waiting_seconds_total` is a custom metric that needs to be provided)
-- `PrometheusMemoryPSIPressure`: `rate(node_pressure_memory_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is a custom metric that needs to be provided)
-- `PrometheusIOPSIPressure`: `rate(node_pressure_io_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is a custom metric that needs to be provided)
+- `PrometheusCPUPSIPressure`: `rate(node_pressure_cpu_waiting_seconds_total[1m])` (`node_pressure_cpu_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument)
+- `PrometheusCPUPSIPressureByUtilization`: `avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) and (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m]))) > 0.7 or avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) * 0` (`node_pressure_cpu_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument; the query is filtering out PSI pressure on nodes with average CPU utilization < 0.7 to filter out false positives pressure spikes due to self imposed CPU throttling)
+- `PrometheusMemoryPSIPressure`: `rate(node_pressure_memory_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument)
+- `PrometheusIOPSIPressure`: `rate(node_pressure_io_waiting_seconds_total[1m])` (`node_pressure_io_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument)
 
 ```yaml
 apiVersion: operator.openshift.io/v1
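
The `PrometheusCPUPSIPressureByUtilization` query in the hunk above relies on PromQL operator precedence: `*` binds tighter than `>`, which binds tighter than `and`, which binds tighter than `or`. Under that reading the expression groups as `(psi and (utilization > 0.7)) or (psi * 0)`, so each instance reports its PSI rate only when utilization exceeds 0.7 and reports 0 otherwise. The following Go sketch mirrors that evaluation on plain maps. It is an illustration only: `pressureByUtilization` and the sample node names are hypothetical and not part of the operator.

```go
package main

import "fmt"

// pressureByUtilization is a hypothetical helper (not operator code) that
// mirrors how the PromQL expression is expected to evaluate per instance.
//   psi:  instance -> avg rate of node_pressure_cpu_waiting_seconds_total
//   idle: instance -> avg rate of node_cpu_seconds_total{mode="idle"}
func pressureByUtilization(psi, idle map[string]float64, threshold float64) map[string]float64 {
	out := make(map[string]float64, len(psi))
	for instance, pressure := range psi {
		// Right-hand side of "or": psi * 0, the fallback for every instance.
		out[instance] = 0
		// Left-hand side: psi "and" ((1 - idle) > threshold).
		// An instance with no idle sample cannot match the "and" branch.
		idleRate, ok := idle[instance]
		if ok && 1-idleRate > threshold {
			out[instance] = pressure
		}
	}
	return out
}

func main() {
	// Sample values are made up for illustration.
	psi := map[string]float64{"node-a": 0.30, "node-b": 0.20, "node-c": 0.10}
	idle := map[string]float64{"node-a": 0.10, "node-b": 0.60}
	fmt.Println(pressureByUtilization(psi, idle, 0.7))
}
```

In this sketch `node-a` (utilization 0.9) keeps its PSI value, while `node-b` (utilization 0.4) and `node-c` (no idle sample) fall through to the `* 0` branch. The zero fallback keeps every node present in the result instead of dropping it from the series.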

pkg/apis/descheduler/v1/types_descheduler.go

Lines changed: 2 additions & 0 deletions
@@ -129,6 +129,8 @@ const (
     PrometheusCPUUsageProfile ActualUtilizationProfile = "PrometheusCPUUsage"
     // PrometheusCPUPSIPressureProfile sets rate(node_pressure_cpu_waiting_seconds_total[1m]) query
     PrometheusCPUPSIPressureProfile ActualUtilizationProfile = "PrometheusCPUPSIPressure"
+    // PrometheusCPUPSIPressureByUtilizationProfile sets a query based on a combination of PSI CPU pressure and average CPU utilization
+    PrometheusCPUPSIPressureByUtilizationProfile ActualUtilizationProfile = "PrometheusCPUPSIPressureByUtilization"
     // PrometheusMemoryPSIPressureProfile sets rate(node_pressure_memory_waiting_seconds_total[1m]) query
     PrometheusMemoryPSIPressureProfile ActualUtilizationProfile = "PrometheusMemoryPSIPressure"
     // PrometheusIOPSIPressureProfile sets rate(node_pressure_io_waiting_seconds_total[1m]) query

pkg/operator/target_config_reconciler.go

Lines changed: 2 additions & 0 deletions
@@ -672,6 +672,8 @@ func lifecycleAndUtilizationProfile(profileCustomizations *deschedulerv1.Profile
         query = "instance:node_cpu:rate:sum"
     case deschedulerv1.PrometheusCPUPSIPressureProfile:
         query = "rate(node_pressure_cpu_waiting_seconds_total[1m])"
+    case deschedulerv1.PrometheusCPUPSIPressureByUtilizationProfile:
+        query = "avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) and (1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[1m]))) > 0.7 or avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) * 0"
     case deschedulerv1.PrometheusMemoryPSIPressureProfile:
         query = "rate(node_pressure_memory_waiting_seconds_total[1m])"
     case deschedulerv1.PrometheusIOPSIPressureProfile:
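
The switch above resolves each `ActualUtilizationProfile` value to a fixed PromQL string. A minimal standalone sketch of that mapping follows, assuming a table lookup in place of the operator's switch. The type and constants are redeclared locally so the example compiles on its own; `profileQueries` and `queryForProfile` are hypothetical names, and the IO profile's string value is taken from the README entry.

```go
package main

import "fmt"

// ActualUtilizationProfile is redeclared here for illustration; the real
// type lives in pkg/apis/descheduler/v1.
type ActualUtilizationProfile string

const (
	PrometheusCPUUsageProfile                    ActualUtilizationProfile = "PrometheusCPUUsage"
	PrometheusCPUPSIPressureProfile              ActualUtilizationProfile = "PrometheusCPUPSIPressure"
	PrometheusCPUPSIPressureByUtilizationProfile ActualUtilizationProfile = "PrometheusCPUPSIPressureByUtilization"
	PrometheusMemoryPSIPressureProfile           ActualUtilizationProfile = "PrometheusMemoryPSIPressure"
	PrometheusIOPSIPressureProfile               ActualUtilizationProfile = "PrometheusIOPSIPressure"
)

// profileQueries is a hypothetical lookup table holding the queries listed
// in the README and in the reconciler switch.
var profileQueries = map[ActualUtilizationProfile]string{
	PrometheusCPUUsageProfile:       "instance:node_cpu:rate:sum",
	PrometheusCPUPSIPressureProfile: "rate(node_pressure_cpu_waiting_seconds_total[1m])",
	PrometheusCPUPSIPressureByUtilizationProfile: `avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) and (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m]))) > 0.7 or avg by (instance) ( rate(node_pressure_cpu_waiting_seconds_total[1m])) * 0`,
	PrometheusMemoryPSIPressureProfile: "rate(node_pressure_memory_waiting_seconds_total[1m])",
	PrometheusIOPSIPressureProfile:     "rate(node_pressure_io_waiting_seconds_total[1m])",
}

// queryForProfile returns the PromQL query for a profile, or an error when
// the profile name is not one of the known values.
func queryForProfile(p ActualUtilizationProfile) (string, error) {
	q, ok := profileQueries[p]
	if !ok {
		return "", fmt.Errorf("unknown actual utilization profile %q", p)
	}
	return q, nil
}

func main() {
	q, err := queryForProfile(PrometheusCPUPSIPressureByUtilizationProfile)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q)
}
```

A table keeps each profile's query on one line and makes the unknown-profile case explicit. How the operator itself handles an unrecognized profile is not shown in this diff.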
