---
title: Pod Topology Spread Constraints
content_type: concept
weight: 40
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.18" state="beta" >}}

You can use _topology spread constraints_ to control how {{< glossary_tooltip text="Pods" term_id="Pod" >}} are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.



<!-- body -->

## Prerequisites

### Enable the Feature Gate

The `EvenPodsSpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled for the
{{< glossary_tooltip text="API Server" term_id="kube-apiserver" >}} **and** the
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}}.

### Node Labels

Topology spread constraints rely on node labels to identify the topology domain(s) each node is in. For example, a node might have labels: `node=node1,zone=us-east-1a,region=us-east-1`

Suppose you have a 4-node cluster with the following labels:

```
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4m26s   v1.16.0   node=node1,zone=zoneA
node2   Ready    <none>   3m58s   v1.16.0   node=node2,zone=zoneA
node3   Ready    <none>   3m17s   v1.16.0   node=node3,zone=zoneB
node4   Ready    <none>   2m43s   v1.16.0   node=node4,zone=zoneB
```

Then the cluster is logically viewed as follows:

```
+---------------+---------------+
|     zoneA     |     zoneB     |
+-------+-------+-------+-------+
| node1 | node2 | node3 | node4 |
+-------+-------+-------+-------+
```

Instead of applying labels manually, you can also reuse the [well-known labels](/docs/reference/kubernetes-api/labels-annotations-taints/) that are created and populated automatically on most clusters.
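
For illustration only, a node in such a cluster might carry well-known topology labels like the excerpt below. The exact label keys vary by Kubernetes version and cloud provider; the keys and values shown here are assumptions, not output from a real cluster.

```yaml
# Hypothetical excerpt of a Node object showing well-known topology labels
# that many cloud providers populate automatically.
apiVersion: v1
kind: Node
metadata:
  name: node1
  labels:
    kubernetes.io/hostname: node1
    failure-domain.beta.kubernetes.io/zone: us-east-1a
    failure-domain.beta.kubernetes.io/region: us-east-1
```

Any label key present on your nodes, including these well-known ones, can be used as the `topologyKey` of a spread constraint described below.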

## Spread Constraints for Pods

### API

The field `pod.spec.topologySpreadConstraints` was introduced in 1.16 as below:

```
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  topologySpreadConstraints:
  - maxSkew: <integer>
    topologyKey: <string>
    whenUnsatisfiable: <string>
    labelSelector: <object>
```

You can define one or multiple `topologySpreadConstraint` entries to instruct the kube-scheduler how to place each incoming Pod in relation to the existing Pods across your cluster. The fields are:

- **maxSkew** describes the degree to which Pods may be unevenly distributed. It is the maximum permitted difference between the number of matching Pods in any two topology domains of a given topology type. It must be greater than zero.
- **topologyKey** is the key of node labels. If two nodes are labelled with this key and have identical values for that label, the scheduler treats both nodes as being in the same topology. The scheduler tries to place a balanced number of Pods into each topology domain.
- **whenUnsatisfiable** indicates how to deal with a Pod if it doesn't satisfy the spread constraint:
  - `DoNotSchedule` (default) tells the scheduler not to schedule it.
  - `ScheduleAnyway` tells the scheduler to schedule it anyway, while prioritizing nodes that minimize the skew.
- **labelSelector** is used to find matching Pods. Pods that match this label selector are counted to determine the number of Pods in their corresponding topology domain. See [Label Selectors](/docs/concepts/overview/working-with-objects/labels/#label-selectors) for more details.

You can read more about this field by running `kubectl explain Pod.spec.topologySpreadConstraints`.

### Example: One TopologySpreadConstraint

Suppose you have a 4-node cluster where 3 Pods labeled `foo:bar` are located on node1, node2 and node3 respectively (`P` represents a Pod):

```
+---------------+---------------+
|     zoneA     |     zoneB     |
+-------+-------+-------+-------+
| node1 | node2 | node3 | node4 |
+-------+-------+-------+-------+
|   P   |   P   |   P   |       |
+-------+-------+-------+-------+
```

If we want an incoming Pod to be evenly spread with the existing Pods across zones, the spec can be:

{{< codenew file="pods/topology-spread-constraints/one-constraint.yaml" >}}
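
The referenced manifest is not reproduced inline here; a minimal sketch of such a single-constraint Pod spec, assuming a placeholder `pause` container, might look like:

```yaml
# Sketch of a Pod with a single topology spread constraint,
# matching the zone-based example discussed in this section.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1   # placeholder container; any image works
```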

`topologyKey: zone` implies the even distribution is only applied to nodes that have the label pair "zone:<any value>" present. `whenUnsatisfiable: DoNotSchedule` tells the scheduler to leave the incoming Pod pending if it can't satisfy the constraint.

If the scheduler placed this incoming Pod into "zoneA", the Pod distribution would become [3, 1], and the actual skew would be 2 (3 - 1), which violates `maxSkew: 1`. In this example, the incoming Pod can only be placed onto "zoneB":

```
+---------------+---------------+      +---------------+---------------+
|     zoneA     |     zoneB     |      |     zoneA     |     zoneB     |
+-------+-------+-------+-------+      +-------+-------+-------+-------+
| node1 | node2 | node3 | node4 |  OR  | node1 | node2 | node3 | node4 |
+-------+-------+-------+-------+      +-------+-------+-------+-------+
|   P   |   P   |   P   |   P   |      |   P   |   P   |  P P  |       |
+-------+-------+-------+-------+      +-------+-------+-------+-------+
```

You can tweak the Pod spec to meet various kinds of requirements:

- Change `maxSkew` to a bigger value like "2" so that the incoming Pod can be placed onto "zoneA" as well.
- Change `topologyKey` to "node" so as to distribute the Pods evenly across nodes instead of zones. In the above example, if `maxSkew` remains "1", the incoming Pod can only be placed onto "node4".
- Change `whenUnsatisfiable: DoNotSchedule` to `whenUnsatisfiable: ScheduleAnyway` to ensure the incoming Pod is always schedulable (assuming other scheduling APIs are satisfied). However, the scheduler still prefers to place it onto the topology domain that has fewer matching Pods, as sketched after this list. (Be aware that this preference is jointly normalized with other internal scheduling priorities, such as resource usage ratio.)
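
As a sketch, reusing the `foo: bar` selector from this example, the relaxed constraint from the last bullet could be written as the following fragment of the Pod spec (the rest of the manifest stays unchanged):

```yaml
# Fragment of the Pod spec: a soft version of the zone constraint.
# The scheduler still tries to minimize the skew, but will not leave
# the Pod pending if the constraint cannot be satisfied.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar
```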

### Example: Multiple TopologySpreadConstraints

This builds upon the previous example. Suppose you have a 4-node cluster where 3 Pods labeled `foo:bar` are located in node1, node2 and node3 respectively (`P` represents Pod):

```
+---------------+---------------+
|     zoneA     |     zoneB     |
+-------+-------+-------+-------+
| node1 | node2 | node3 | node4 |
+-------+-------+-------+-------+
|   P   |   P   |   P   |       |
+-------+-------+-------+-------+
```

You can use 2 TopologySpreadConstraints to control the Pods spreading on both zone and node:

{{< codenew file="pods/topology-spread-constraints/two-constraints.yaml" >}}
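
As before, the referenced file is not shown inline; a sketch of a Pod carrying both constraints, assuming the same `foo: bar` labels and a placeholder container, might look like:

```yaml
# Sketch of a Pod with two topology spread constraints:
# one spreading over zones, one spreading over individual nodes.
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  - maxSkew: 1
    topologyKey: node
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: k8s.gcr.io/pause:3.1   # placeholder container
```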

In this case, to match the first constraint, the incoming Pod can only be placed onto "zoneB"; while in terms of the second constraint, the incoming Pod can only be placed onto "node4". The results of the two constraints are ANDed, so the only viable option is to place it on "node4".

Multiple constraints can lead to conflicts. Suppose you have a 3-node cluster across 2 zones:

```
+---------------+-------+
|     zoneA     | zoneB |
+-------+-------+-------+
| node1 | node2 | node3 |
+-------+-------+-------+
|  P P  |   P   |  P P  |
+-------+-------+-------+
```

If you apply "two-constraints.yaml" to this cluster, you will notice that "mypod" stays in the `Pending` state. This is because: to satisfy the first constraint, "mypod" can only be put onto "zoneB"; while in terms of the second constraint, "mypod" can only be put onto "node2". The intersection of "zoneB" and "node2" is empty.

To overcome this situation, you can either increase the `maxSkew` or modify one of the constraints to use `whenUnsatisfiable: ScheduleAnyway`, for example as sketched below.
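
One possible relaxation, shown here only as a sketch of the constraints fragment, keeps the zone constraint hard but makes the per-node constraint a soft preference:

```yaml
# Fragment of the Pod spec: node-level spreading becomes best-effort,
# so "mypod" can still be scheduled (onto "node3" in "zoneB") when the
# strict combination of both constraints would otherwise return no node.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      foo: bar
- maxSkew: 1
  topologyKey: node
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar
```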

### Conventions

There are some implicit conventions worth noting here:

- Only Pods in the same namespace as the incoming Pod can be matching candidates.

- Nodes without the `topologySpreadConstraints[*].topologyKey` label present will be bypassed. This implies that:

  1. Pods located on those nodes do not impact the `maxSkew` calculation - in the above example, suppose "node1" does not have the label "zone"; then its 2 Pods will be disregarded, and hence the incoming Pod will be scheduled into "zoneA".
  2. the incoming Pod has no chance of being scheduled onto such nodes - in the above example, suppose a "node5" carrying the label `{zone-typo: zoneC}` joins the cluster; it will be bypassed because it lacks the label key "zone".

- Be aware of what happens if the incoming Pod's `topologySpreadConstraints[*].labelSelector` doesn't match its own labels. In the above example, if we remove the incoming Pod's labels, it can still be placed onto "zoneB" since the constraints are still satisfied. However, after that placement, the degree of imbalance in the cluster remains unchanged - zoneA still has 2 Pods holding the label {foo:bar}, and zoneB has 1 Pod holding the label {foo:bar}. So if this is not what you expect, we recommend that the workload's `topologySpreadConstraints[*].labelSelector` match its own labels.

- If the incoming Pod has `spec.nodeSelector` or `spec.affinity.nodeAffinity` defined, nodes not matching them will be bypassed.

  Suppose you have a 5-node cluster ranging from zoneA to zoneC:

  ```
  +---------------+---------------+-------+
  |     zoneA     |     zoneB     | zoneC |
  +-------+-------+-------+-------+-------+
  | node1 | node2 | node3 | node4 | node5 |
  +-------+-------+-------+-------+-------+
  |   P   |   P   |   P   |       |       |
  +-------+-------+-------+-------+-------+
  ```

  and you know that "zoneC" must be excluded. In this case, you can compose the yaml as below, so that "mypod" will be placed onto "zoneB" instead of "zoneC". Similarly, `spec.nodeSelector` is also respected.

  {{< codenew file="pods/topology-spread-constraints/one-constraint-with-nodeaffinity.yaml" >}}
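
  A sketch of such a manifest, combining the zone spread constraint with a `nodeAffinity` term that excludes "zoneC" (placeholder container assumed), might look like:

  ```yaml
  # Sketch: spread across zones, but never schedule into zoneC.
  apiVersion: v1
  kind: Pod
  metadata:
    name: mypod
    labels:
      foo: bar
  spec:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          foo: bar
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: zone
              operator: NotIn
              values:
              - zoneC
    containers:
    - name: pause
      image: k8s.gcr.io/pause:3.1   # placeholder container
  ```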

### Cluster-level default constraints

{{< feature-state for_k8s_version="v1.18" state="alpha" >}}

It is possible to set default topology spread constraints for a cluster. Default
topology spread constraints are applied to a Pod if, and only if:

- It doesn't define any constraints in its `.spec.topologySpreadConstraints`.
- It belongs to a service, replication controller, replica set or stateful set.

Default constraints can be set as part of the `PodTopologySpread` plugin args
in a [scheduling profile](/docs/reference/scheduling/profiles).
The constraints are specified with the same [API above](#api), except that
`labelSelector` must be empty. The selectors are calculated from the services,
replication controllers, replica sets or stateful sets that the Pod belongs to.

An example configuration might look as follows:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration

profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: failure-domain.beta.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
```

{{< note >}}
The score produced by default scheduling constraints might conflict with the
score produced by the
[`DefaultPodTopologySpread` plugin](/docs/reference/scheduling/profiles/#scheduling-plugins).
It is recommended that you disable this plugin in the scheduling profile when
using default constraints for `PodTopologySpread`.
{{< /note >}}
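
As a sketch of that recommendation, building on the example profile above, disabling the `DefaultPodTopologySpread` score plugin in the same profile could look like this:

```yaml
# Sketch: same profile as above, with the DefaultPodTopologySpread
# score plugin disabled so its score does not conflict with the
# default constraints configured for PodTopologySpread.
apiVersion: kubescheduler.config.k8s.io/v1alpha2
kind: KubeSchedulerConfiguration

profiles:
  - plugins:
      score:
        disabled:
          - name: DefaultPodTopologySpread
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: failure-domain.beta.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
```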

## Comparison with PodAffinity/PodAntiAffinity

In Kubernetes, directives related to "Affinity" control how Pods are
scheduled - more packed or more scattered.

- For `PodAffinity`, you can try to pack any number of Pods into qualifying
  topology domain(s).
- For `PodAntiAffinity`, only one Pod can be scheduled into a
  single topology domain.

The "EvenPodsSpread" feature provides flexible options to distribute Pods evenly across different
topology domains - to achieve high availability or cost savings. This can also help with rolling updates of
workloads and scaling out replicas smoothly. See [Motivation](https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/895-pod-topology-spread#motivation) for more details.

## Known Limitations

As of 1.18, in which this feature is Beta, there are some known limitations:

- Scaling down a Deployment may result in an imbalanced Pod distribution.
- Pods matched on tainted nodes are respected. See [Issue 80921](https://github.com/kubernetes/kubernetes/issues/80921).
