Commit c8b3561

committed
Dockerfile and catalog
1 parent 23d57e1 commit c8b3561

6 files changed: 1048 additions & 246 deletions

.gitignore

Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-cluster-kube-descheduler-operator
-
 # Binaries for programs and plugins
 *.exe
 *.exe~

README.md

Lines changed: 10 additions & 244 deletions
@@ -1,252 +1,18 @@
-# Kube Descheduler Operator
+# README
 
-Run the descheduler in your OpenShift cluster to move pods based on specific strategies.
+## FBC catalog rendering
 
-## Releases
+To initiliaze catalog-template.json
 
-| kdo version | ocp version | k8s version | golang |
-| ----------- | ----------- | ----------- | ------ |
-| 5.0.0 | 4.15, 4.16 | 1.28 | 1.20 |
-| 5.0.1 | 4.15, 4.16 | 1.29 | 1.21 |
-| 5.0.2 | 4.15, 4.16 | 1.29 | 1.21 |
-| 5.1.0 | 4.17, 4.18 | 1.30 | 1.22 |
-| 5.1.1 | 4.17, 4.18 | 1.31 | 1.22 |
-
-## Deploy the operator
-
-### Quick Development
-
-1. Build and push the operator image to a registry:
-2. Ensure the `image` spec in `deploy/05_deployment.yaml` refers to the operator image you pushed
-3. Run `oc create -f deploy/.`
-
-### OperatorHub install with custom index image
-
-This process refers to building the operator in a way that it can be installed locally via the OperatorHub with a custom index image
-
-1. Build and push the operator image to a registry:
-```sh
-export QUAY_USER=${your_quay_user_id}
-export IMAGE_TAG=${your_image_tag}
-podman build -t quay.io/${QUAY_USER}/cluster-kube-descheduler-operator:${IMAGE_TAG} -f Dockerfile.rhel7
-podman login quay.io -u ${QUAY_USER}
-podman push quay.io/${QUAY_USER}/cluster-kube-descheduler-operator:${IMAGE_TAG}
-```
-
-1. Update the `.spec.install.spec.deployments[0].spec.template.spec.containers[0].image` field in the KDO CSV under `./manifests/cluster-kube-descheduler-operator.clusterserviceversion.yaml` to point to the newly built image.
-
-1. build and push the metadata image to a registry (e.g. https://quay.io):
-```sh
-podman build -t quay.io/${QUAY_USER}/cluster-kube-descheduler-operator-metadata:${IMAGE_TAG} -f Dockerfile.metadata .
-podman push quay.io/${QUAY_USER}/cluster-kube-descheduler-operator-metadata:${IMAGE_TAG}
-```
-
-1. build and push image index for operator-registry (pull and build https://github.com/operator-framework/operator-registry/ to get the `opm` binary)
-```sh
-opm index add --bundles quay.io/${QUAY_USER}/cluster-kube-descheduler-operator-metadata:${IMAGE_TAG} --tag quay.io/${QUAY_USER}/cluster-kube-descheduler-operator-index:${IMAGE_TAG}
-podman push quay.io/${QUAY_USER}/cluster-kube-descheduler-operator-index:${IMAGE_TAG}
-```
-
-Don't forget to increase the number of open files, .e.g. `ulimit -n 100000` in case the current limit is insufficient.
-
-1. create and apply catalogsource manifest (remember to change <<QUAY_USER>> and <<IMAGE_TAG>> to your own values):
-```yaml
-apiVersion: operators.coreos.com/v1alpha1
-kind: CatalogSource
-metadata:
-  name: cluster-kube-descheduler-operator
-  namespace: openshift-marketplace
-spec:
-  sourceType: grpc
-  image: quay.io/<<QUAY_USER>>/cluster-kube-descheduler-operator-index:<<IMAGE_TAG>>
-```
-
-1. create `openshift-kube-descheduler-operator` namespace:
-```
-$ oc create ns openshift-kube-descheduler-operator
-```
-
-1. open the console Operators -> OperatorHub, search for `descheduler operator` and install the operator
-
-
-## Sample CR
-
-A sample CR definition looks like below (the operator expects `cluster` CR under `openshift-kube-descheduler-operator` namespace):
-
-```yaml
-apiVersion: operator.openshift.io/v1
-kind: KubeDescheduler
-metadata:
-  name: cluster
-  namespace: openshift-kube-descheduler-operator
-spec:
-  deschedulingIntervalSeconds: 1800
-  profiles:
-  - AffinityAndTaints
-  - LifecycleAndUtilization
-  profileCustomizations:
-    podLifetime: 5m
-    namespaces:
-      included:
-      - ns1
-      - ns2
+```sh
+$ opm migrate registry.redhat.io/redhat/redhat-operator-index:v4.17 ./catalog-migrate
+$ mkdir -p v4.18/catalog/cluster-kube-descheduler-operator
+$ opm alpha convert-template basic ./catalog-migrate/cluster-kube-descheduler-operator/catalog.json > v4.18/catalog-template.json
 ```
 
-The operator spec provides a `profiles` field, which allows users to set one or more descheduling profiles to enable.
-
-These profiles map to preconfigured policy definitions, enabling several descheduler strategies grouped by intent, and
-any that are enabled will be merged.
-
-## Profiles
-
-The following profiles are currently provided:
-* [`AffinityAndTaints`](#AffinityAndTaints)
-* [`TopologyAndDuplicates`](#TopologyAndDuplicates)
-* [`SoftTopologyAndDuplicates`](#SoftTopologyAndDuplicates)
-* [`LifecycleAndUtilization`](#LifecycleAndUtilization)
-* [`LongLifecycle`](#LongLifecycle)
-* [`CompactAndScale`](#compactandscale-techpreview)
-* [`EvictPodsWithPVC`](#EvictPodsWithPVC)
-* [`EvictPodsWithLocalStorage`](#EvictPodsWithLocalStorage)
-
-Each of these enables cluster-wide descheduling (excluding openshift and kube-system namespaces) based on certain goals.
-
-### AffinityAndTaints
-This is the most basic descheduling profile and it removes running pods which violate node and pod affinity, and node
-taints.
-
-This profile enables the [`RemovePodsViolatingInterPodAntiAffinity`](https://github.com/kubernetes-sigs/descheduler/#removepodsviolatinginterpodantiaffinity),
-[`RemovePodsViolatingNodeAffinity`](https://github.com/kubernetes-sigs/descheduler/#removepodsviolatingnodeaffinity), and
-[`RemovePodsViolatingNodeTaints`](https://github.com/kubernetes-sigs/descheduler/#removepodsviolatingnodeaffinity) strategies.
+To update the catalog
 
-### TopologyAndDuplicates
-This profile attempts to balance pod distribution based on topology constraint definitions and evicting duplicate copies
-of the same pod running on the same node. It enables the [`RemovePodsViolatingTopologySpreadConstraints`](https://github.com/kubernetes-sigs/descheduler/#removepodsviolatingtopologyspreadconstraint)
-and [`RemoveDuplicates`](https://github.com/kubernetes-sigs/descheduler/#removeduplicates) strategies.
-
-### SoftTopologyAndDuplicates
-This profile is the same as `TopologyAndDuplicates`, however it will also consider pods with "soft" topology constraints
-for eviction (ie, `whenUnsatisfiable: ScheduleAnyway`)
-
-### LifecycleAndUtilization
-This profile focuses on pod lifecycles and node resource consumption. It will evict any running pod older than 24 hours
-and attempts to evict pods from "high utilization" nodes that can fit onto "low utilization" nodes. A high utilization
-node is any node consuming more than 50% its available cpu, memory, *or* pod capacity. A low utilization node is any node
-with less than 20% of its available cpu, memory, *and* pod capacity.
-
-This profile enables the [`LowNodeUtilizaition`](https://github.com/kubernetes-sigs/descheduler/#lownodeutilization),
-[`RemovePodsHavingTooManyRestarts`](https://github.com/kubernetes-sigs/descheduler/#removepodshavingtoomanyrestarts) and
-[`PodLifeTime`](https://github.com/kubernetes-sigs/descheduler/#podlifetime) strategies. In the future, more configuration
-may be made available through the operator for these strategies based on user feedback.
-
-### LongLifecycle
-This profile provides cluster resource balancing similar to [LifecycleAndUtilization](#LifecycleAndUtilization) for longer-running
-clusters. It does not evict pods based on the 24 hour lifetime used by LifecycleAndUtilization.
-
-### CompactAndScale
-This profile seeks to evict pods to enable the same workload to run on a smaller set of nodes.
-It will attempts to evict pods from "under utilized" nodes that can fit into fewer nodes.
-An under utilized node is any node consuming less than 20% of its available cpu, memory, *and* pod capacity.
-
-This profile enables the [`HighNodeUtilization`](https://github.com/kubernetes-sigs/descheduler/#highnodeutilization) strategy.
-In the future, more configuration may be made available through the operator based on user feedback.
-
-### EvictPodsWithPVC
-By default, the operator prevents pods with PVCs from being evicted. Enabling this
-profile in combination with any of the above profiles allows pods with PVCs to be
-eligible for eviction.
-
-### EvictPodsWithLocalStorage
-By default, pods with local storage are not eligible to be considered for eviction by any
-profile. Using this profile allows them to be evicted if necessary. A pod is defined as using
-local storage if any of its volumes have `HostPath` or `EmptyDir` set (note that a pod that only
-uses PVCs does not fit this definition, and will need the `EvictPodsWithPVC` profile instead. Pods
-that use both will need both profiles to be evicted).
-
-## Profile Customizations
-Some profiles expose options which may be used to configure the underlying Descheduler strategy parameters. These are available under
-the `profileCustomizations` field:
-
-|Name|Type|Description|
-|---|---|---|
-|`podLifetime`|`time.Duration`|Sets the lifetime value for pods evicted by the `LifecycleAndUtilization` profile|
-|`thresholdPriorityClassName`|`string`|Sets the priority class threshold by name for all strategies|
-|`thresholdPriority`|`string`|Sets the priority class threshold by value for all strategies|
-|`namespaces.included`, `namespaces.excluded`|`[]string`| Sets the included/excluded namespaces for all strategies (included namespaces are not allowed to include protected namespaces which consist of `kube-system`, `hypershift` and all `openshift-` prefixed namespaces)|
-| `devLowNodeUtilizationThresholds` | `string` | Sets experimental thresholds for the [LowNodeUtilization](https://github.com/kubernetes-sigs/descheduler#lownodeutilization) strategy of the `LifecycleAndUtilization` profile in the following ratios: `Low` for 10%:30%, `Medium` for 20%:50%, `High` for 40%:70%|
-|`devEnableEvictionsInBackground`|`bool`| Enables descheduler's EvictionsInBackground alpha feature. The EvictionsInBackground alpha feature is a subject to change. Currently provided as an experimental feature.|
-| `devHighNodeUtilizationThresholds` | `string` | Sets thresholds for the [HighNodeUtilization](https://github.com/kubernetes-sigs/descheduler#highnodeutilization) strategy of the `CompactAndScale` profile in the following ratios: `Minimal` for 10%, `Modest` for 20%, `Moderate` for 30%. Currently provided as an experimental feature.|
-|`devActualUtilizationProfile`|`string`| Sets a profile that gets translated into a predefined prometheus query |
-
-## Prometheus query profiles
-The operator provides the following profiles:
-- `PrometheusCPUUsage`: `instance:node_cpu:rate:sum` (metric available in OpenShift by default)
-- `PrometheusCPUPSIPressure`: `rate(node_pressure_cpu_waiting_seconds_total[1m])` (`node_pressure_cpu_waiting_seconds_total` is a custom metric that needs to be provided)
-- `PrometheusMemoryPSIPressure`: `rate(node_pressure_memory_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is a custom metric that needs to be provided)
-- `PrometheusIOPSIPressure`: `rate(node_pressure_io_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is a custom metric that needs to be provided)
-
-```yaml
-apiVersion: operator.openshift.io/v1
-kind: KubeDescheduler
-metadata:
-  name: cluster
-  namespace: openshift-kube-descheduler-operator
-spec:
-  managementState: Managed
-  deschedulingIntervalSeconds: 3600
-  profiles:
-  - LongLifecycle
-  profileCustomizations:
-    devActualUtilizationProfile: PrometheusCPUUsage
 ```
-
-## Descheduling modes
-The operator provides two modes of eviction:
-- `Predictive`: configures the descheduler to only simulate eviction
-- `Automatic`: configures the descheduler to evict pods
-
-The predictive mode is the default mode.
-The descheduler in either of the modes still produces metrics (unless the metrics are disabled).
-When the predictive mode is configured, the reported metrics can serve as an estimation
-of evicted pods in the cluster.
-
-
-## How does the descheduler operator work?
-
-Descheduler operator at a high level is responsible for watching the above CR
-- Create a configmap that could be used by descheduler.
-- Run descheduler as a deployment mounting the configmap as a policy file in the pod.
-
-The configmap created from above sample CR definition looks like this:
-
-```yaml
-apiVersion: descheduler/v1alpha1
-kind: DeschedulerPolicy
-strategies:
-  RemovePodsViolatingInterPodAntiAffinity:
-    enabled: true
-    ...
-  RemovePodsViolatingNodeAffinity:
-    enabled: true
-    params:
-      ...
-      nodeAffinityType:
-      - requiredDuringSchedulingIgnoredDuringExecution
-  RemovePodsViolatingNodeTaints:
-    enabled: true
-    ...
+$ cd v4.18
+$ opm alpha render-template basic catalog-template.json --migrate-level bundle-object-to-csv-metadata > catalog/cluster-kube-descheduler-operator/catalog.json
 ```
-(Some generated parameters omitted.)
-
-
-## Parameters
-The Descheduler operator exposes the following parameters in its CRD:
-
-|Name|Type|Description|
-|---|---|---|
-|`deschedulingIntervalSeconds`|`int32`|Sets the number of seconds between descheduler runs|
-|`profiles`|`[]string`|Sets which descheduler strategy profiles are enabled|
-|`profileCustomizations`|`map`|Contains various parameters for modifying the default behavior of certain profiles|
-|`mode`|`string`|Configures the descheduler to either evict pods or to simulate the eviction|
-|`evictionLimits`|`map`|Restrict the number of evictions during each descheduling run. Available fields are: `total`|
-|`evictionLimits.total`|`int32`|Restricts the maximum number of overall evictions|
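
Review note: the two catalog steps that the new README.md introduces (initialize `catalog-template.json`, then render the catalog) could be wrapped in one small script. The sketch below is not part of this commit. The function names, the `PACKAGE`, `OCP_VERSION` and `SRC_INDEX` variables, and the `DRY_RUN` switch are assumptions made for illustration. The `opm` invocations themselves are copied from the README.

```sh
#!/bin/sh
# Sketch only: wraps the README's catalog commands in two functions.
# PACKAGE, OCP_VERSION, SRC_INDEX and DRY_RUN are hypothetical knobs,
# defaulting to the values that appear in the README.
set -eu

PACKAGE="${PACKAGE:-cluster-kube-descheduler-operator}"
OCP_VERSION="${OCP_VERSION:-v4.18}"
SRC_INDEX="${SRC_INDEX:-registry.redhat.io/redhat/redhat-operator-index:v4.17}"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_to FILE CMD...: like run, but redirects stdout of CMD to FILE.
run_to() {
  _dest="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s > %s\n' "$*" "$_dest"
  else
    "$@" > "$_dest"
  fi
}

# Step 1 from the README: create catalog-template.json from an existing index.
init_template() {
  run opm migrate "$SRC_INDEX" ./catalog-migrate
  run mkdir -p "$OCP_VERSION/catalog/$PACKAGE"
  run_to "$OCP_VERSION/catalog-template.json" \
    opm alpha convert-template basic "./catalog-migrate/$PACKAGE/catalog.json"
}

# Step 2 from the README: render the template into the catalog.
# The subshell keeps the `cd` from leaking into the caller.
update_catalog() {
  (
    run cd "$OCP_VERSION"
    run_to "catalog/$PACKAGE/catalog.json" \
      opm alpha render-template basic catalog-template.json \
      --migrate-level bundle-object-to-csv-metadata
  )
}

# Dispatch on the first argument; with no argument nothing is executed.
case "${1:-}" in
  init)   init_template ;;
  update) update_catalog ;;
esac
```

Saved under a hypothetical name such as `catalog.sh`, running `DRY_RUN=1 sh catalog.sh init` prints the three initialization commands without executing them, which is a cheap way to check the paths before pulling the index image. As in the README, a failed `opm` call can leave a truncated output file behind, because the redirect opens the file before the command runs.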
