
Commit 1103044

cmd/k8s-operator,k8s-operator: add topology spread constraints to ProxyClass (tailscale#13959)
Now that we have HA for egress proxies, it makes sense to support topology spread constraints, which let users define more complex topologies for how proxy Pods are deployed relative to other Pods, across regions, etc.

Updates tailscale#13406

Signed-off-by: Irbe Krumina <[email protected]>
1 parent 856ea23 commit 1103044


7 files changed (+378, -0 lines)


cmd/k8s-operator/deploy/crds/tailscale.com_proxyclasses.yaml

Lines changed: 176 additions & 0 deletions
```diff
@@ -1896,6 +1896,182 @@ spec:
                                                   Value is the taint value the toleration matches to.
                                                   If the operator is Exists, the value should be empty, otherwise just a regular string.
                                                 type: string
+                        topologySpreadConstraints:
+                          description: |-
+                            Proxy Pod's topology spread constraints.
+                            By default Tailscale Kubernetes operator does not apply any topology spread constraints.
+                            https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
+                          type: array
+                          items:
+                            description: TopologySpreadConstraint specifies how to spread matching pods among the given topology.
+                            type: object
+                            required:
+                              - maxSkew
+                              - topologyKey
+                              - whenUnsatisfiable
+                            properties:
+                              labelSelector:
+                                description: |-
+                                  LabelSelector is used to find matching pods.
+                                  Pods that match this label selector are counted to determine the number of pods
+                                  in their corresponding topology domain.
+                                type: object
+                                properties:
+                                  matchExpressions:
+                                    description: matchExpressions is a list of label selector requirements. The requirements are ANDed.
+                                    type: array
+                                    items:
+                                      description: |-
+                                        A label selector requirement is a selector that contains values, a key, and an operator that
+                                        relates the key and values.
+                                      type: object
+                                      required:
+                                        - key
+                                        - operator
+                                      properties:
+                                        key:
+                                          description: key is the label key that the selector applies to.
+                                          type: string
+                                        operator:
+                                          description: |-
+                                            operator represents a key's relationship to a set of values.
+                                            Valid operators are In, NotIn, Exists and DoesNotExist.
+                                          type: string
+                                        values:
+                                          description: |-
+                                            values is an array of string values. If the operator is In or NotIn,
+                                            the values array must be non-empty. If the operator is Exists or DoesNotExist,
+                                            the values array must be empty. This array is replaced during a strategic
+                                            merge patch.
+                                          type: array
+                                          items:
+                                            type: string
+                                          x-kubernetes-list-type: atomic
+                                    x-kubernetes-list-type: atomic
+                                  matchLabels:
+                                    description: |-
+                                      matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
+                                      map is equivalent to an element of matchExpressions, whose key field is "key", the
+                                      operator is "In", and the values array contains only "value". The requirements are ANDed.
+                                    type: object
+                                    additionalProperties:
+                                      type: string
+                                x-kubernetes-map-type: atomic
+                              matchLabelKeys:
+                                description: |-
+                                  MatchLabelKeys is a set of pod label keys to select the pods over which
+                                  spreading will be calculated. The keys are used to lookup values from the
+                                  incoming pod labels, those key-value labels are ANDed with labelSelector
+                                  to select the group of existing pods over which spreading will be calculated
+                                  for the incoming pod. The same key is forbidden to exist in both MatchLabelKeys and LabelSelector.
+                                  MatchLabelKeys cannot be set when LabelSelector isn't set.
+                                  Keys that don't exist in the incoming pod labels will
+                                  be ignored. A null or empty list means only match against labelSelector.
+
+                                  This is a beta field and requires the MatchLabelKeysInPodTopologySpread feature gate to be enabled (enabled by default).
+                                type: array
+                                items:
+                                  type: string
+                                x-kubernetes-list-type: atomic
+                              maxSkew:
+                                description: |-
+                                  MaxSkew describes the degree to which pods may be unevenly distributed.
+                                  When `whenUnsatisfiable=DoNotSchedule`, it is the maximum permitted difference
+                                  between the number of matching pods in the target topology and the global minimum.
+                                  The global minimum is the minimum number of matching pods in an eligible domain
+                                  or zero if the number of eligible domains is less than MinDomains.
+                                  For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same
+                                  labelSelector spread as 2/2/1:
+                                  In this case, the global minimum is 1.
+                                  | zone1 | zone2 | zone3 |
+                                  |  P P  |  P P  |   P   |
+                                  - if MaxSkew is 1, incoming pod can only be scheduled to zone3 to become 2/2/2;
+                                  scheduling it onto zone1(zone2) would make the ActualSkew(3-1) on zone1(zone2)
+                                  violate MaxSkew(1).
+                                  - if MaxSkew is 2, incoming pod can be scheduled onto any zone.
+                                  When `whenUnsatisfiable=ScheduleAnyway`, it is used to give higher precedence
+                                  to topologies that satisfy it.
+                                  It's a required field. Default value is 1 and 0 is not allowed.
+                                type: integer
+                                format: int32
+                              minDomains:
+                                description: |-
+                                  MinDomains indicates a minimum number of eligible domains.
+                                  When the number of eligible domains with matching topology keys is less than minDomains,
+                                  Pod Topology Spread treats "global minimum" as 0, and then the calculation of Skew is performed.
+                                  And when the number of eligible domains with matching topology keys equals or greater than minDomains,
+                                  this value has no effect on scheduling.
+                                  As a result, when the number of eligible domains is less than minDomains,
+                                  scheduler won't schedule more than maxSkew Pods to those domains.
+                                  If value is nil, the constraint behaves as if MinDomains is equal to 1.
+                                  Valid values are integers greater than 0.
+                                  When value is not nil, WhenUnsatisfiable must be DoNotSchedule.
+
+                                  For example, in a 3-zone cluster, MaxSkew is set to 2, MinDomains is set to 5 and pods with the same
+                                  labelSelector spread as 2/2/2:
+                                  | zone1 | zone2 | zone3 |
+                                  |  P P  |  P P  |  P P  |
+                                  The number of domains is less than 5(MinDomains), so "global minimum" is treated as 0.
+                                  In this situation, new pod with the same labelSelector cannot be scheduled,
+                                  because computed skew will be 3(3 - 0) if new Pod is scheduled to any of the three zones,
+                                  it will violate MaxSkew.
+                                type: integer
+                                format: int32
+                              nodeAffinityPolicy:
+                                description: |-
+                                  NodeAffinityPolicy indicates how we will treat Pod's nodeAffinity/nodeSelector
+                                  when calculating pod topology spread skew. Options are:
+                                  - Honor: only nodes matching nodeAffinity/nodeSelector are included in the calculations.
+                                  - Ignore: nodeAffinity/nodeSelector are ignored. All nodes are included in the calculations.
+
+                                  If this value is nil, the behavior is equivalent to the Honor policy.
+                                  This is a beta-level feature default enabled by the NodeInclusionPolicyInPodTopologySpread feature flag.
+                                type: string
+                              nodeTaintsPolicy:
+                                description: |-
+                                  NodeTaintsPolicy indicates how we will treat node taints when calculating
+                                  pod topology spread skew. Options are:
+                                  - Honor: nodes without taints, along with tainted nodes for which the incoming pod
+                                  has a toleration, are included.
+                                  - Ignore: node taints are ignored. All nodes are included.
+
+                                  If this value is nil, the behavior is equivalent to the Ignore policy.
+                                  This is a beta-level feature default enabled by the NodeInclusionPolicyInPodTopologySpread feature flag.
+                                type: string
+                              topologyKey:
+                                description: |-
+                                  TopologyKey is the key of node labels. Nodes that have a label with this key
+                                  and identical values are considered to be in the same topology.
+                                  We consider each <key, value> as a "bucket", and try to put balanced number
+                                  of pods into each bucket.
+                                  We define a domain as a particular instance of a topology.
+                                  Also, we define an eligible domain as a domain whose nodes meet the requirements of
+                                  nodeAffinityPolicy and nodeTaintsPolicy.
+                                  e.g. If TopologyKey is "kubernetes.io/hostname", each Node is a domain of that topology.
+                                  And, if TopologyKey is "topology.kubernetes.io/zone", each zone is a domain of that topology.
+                                  It's a required field.
+                                type: string
+                              whenUnsatisfiable:
+                                description: |-
+                                  WhenUnsatisfiable indicates how to deal with a pod if it doesn't satisfy
+                                  the spread constraint.
+                                  - DoNotSchedule (default) tells the scheduler not to schedule it.
+                                  - ScheduleAnyway tells the scheduler to schedule the pod in any location,
+                                    but giving higher precedence to topologies that would help reduce the
+                                    skew.
+                                  A constraint is considered "Unsatisfiable" for an incoming pod
+                                  if and only if every possible node assignment for that pod would violate
+                                  "MaxSkew" on some topology.
+                                  For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same
+                                  labelSelector spread as 3/1/1:
+                                  | zone1 | zone2 | zone3 |
+                                  | P P P |   P   |   P   |
+                                  If WhenUnsatisfiable is set to DoNotSchedule, incoming pod can only be scheduled
+                                  to zone2(zone3) to become 3/2/1(3/1/2) as ActualSkew(2-1) on zone2(zone3) satisfies
+                                  MaxSkew(1). In other words, the cluster can still be imbalanced, but scheduler
+                                  won't make it *more* imbalanced.
+                                  It's a required field.
+                                type: string
                 tailscale:
                   description: |-
                     TailscaleConfig contains options to configure the tailscale-specific
```
