Create Position around Hub Cluster (prev. management Cluster) #8210
# Management Cluster - SIG Multicluster Position Statement

Author: Corentin Debains (**[@corentone](https://github.com/corentone)**), Google
Last Edit: 2024/12/09
Status: DRAFT

## Goal
To establish a standard definition for a central cluster that is leveraged by multicluster
controllers to manage multicluster applications or features across an inventory of clusters.

## Context
Multicluster controllers have always needed a place to run. This may happen in external
proprietary control-planes, but for more generic platforms it has been natural for the
Kubernetes community to leverage a Kubernetes cluster and the existing api-machinery
available. There have been a variety of examples, among which we can cite ArgoCD, MultiKueue,
or any of the federation efforts (Karmada, KubeAdmiral); all of them either leave the "location"
where they run unnamed or do not align on a name (Admin Cluster, Hub Cluster, Manager Cluster...).

The [ClusterInventory](https://github.com/kubernetes/enhancements/blob/master/keps/sig-multicluster/4322-cluster-inventory/README.md)
(ClusterProfile CRDs) is also the starting point for a lot of multicluster controllers and,
being a CRD, it requires an api-machinery to host it.
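
For concreteness, a ClusterProfile object looks roughly like the following sketch, abbreviated from KEP-4322 (the cluster and manager names here are illustrative, not taken from the KEP):

```yaml
# Illustrative ClusterProfile; fields abbreviated, names are examples.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ClusterProfile
metadata:
  name: cluster-us-east-1
  namespace: fleet-system
  labels:
    x-k8s.io/cluster-manager: example-fleet-manager
spec:
  displayName: cluster-us-east-1
  clusterManager:
    name: example-fleet-manager       # who registered this cluster
status:
  conditions:
    - type: ControlPlaneHealthy       # health as observed by the manager
      status: "True"
```
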
## Definition

A (multicluster) management cluster is a Kubernetes cluster that acts as a
control-plane for other Kubernetes clusters (named Workload Clusters to differentiate
them). It MUST have visibility over the available clusters and MAY have administrative

Suggested change:
- them). It MUST have visibility over the available clusters and MAY have administrative
+ them). It MUST have the ClusterProfiles written on it, MAY have visibility over the available clusters and have administrative
(I could also drop the clusterprofile part as obvious)
---
Thank you for revisiting. I'm aligned with MAY.
Specific features could require this visibility, but not the general use case of a hub cluster.
---
In my systems, the workload clusters pull from the management cluster. Controllers run on the workload clusters so MAY have administrative privileges works well in my world.
Visible is interesting for my world; my management clusters are authoritative for the data and the workload clusters reconcile against it. But, there MAY be two management clusters from which a workload cluster pulls (i.e. a dev and prod source-of-truth); each management cluster authoritative for particular data.
Lastly, could visibility be decomposed into data storage about a workload cluster and mechanism for transport/reconciliation between the hub/workload cluster?
---
I've removed visibility to be "access to api, metrics or workloads". So it could be that it accesses none of those, or only a subset of them.
@cbaenziger I'm not sure what you mean about data storage for visibility, apiserver/etcd?

> Visible is interesting for my world; my management clusters are authoritative for the data and the workload clusters reconcile against it. But, there MAY be two management clusters from which a workload cluster pulls (i.e. a dev and prod source-of-truth); each management cluster authoritative for particular data.

We should talk more about this example. There could be multiple management clusters for a given cluster, and the definition allows for it.
Now putting on my infra hat (not PR writer): given the current isolation of Kubernetes between workloads, I would prefer to mix Business Units (with a similar risk profile) rather than different environments (prod vs dev).
---
I think I agree with the definition as written now. To clarify my operation:

> Lastly, could visibility be decomposed into data storage about a workload cluster and mechanism for transport/reconciliation between the hub/workload cluster?

Data: My hub clusters store data in their etcd (kine based) about the workload clusters. However, they are not responsible for propagating that configuration data to the workload clusters.
Transport: The workload clusters run a controller which polls (an interposing service backed by the) hub cluster and synchronizes specific resources to the workload cluster. The hub cluster is not responsible for the propagation of the state it stores.
As to multiple hub clusters: since workload clusters are responsible for synchronizing their resources from the hub clusters, one hub cluster could be responsible for storing a resource that drives namespace definitions. Another could be responsible for storing RBAC policy definitions. Or, as mentioned, one could hold dev resources and another prod resources -- I fully agree with your operator perspective that no workload cluster ought to sync from both a dev and prod system -- merely that each is authoritative for its tranche of data; see the namespace vs RBAC split for a less broken operational example.
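
A minimal sketch of that pull topology as I read it (not the commenter's actual setup; the agent image, names, and paths below are all made up): the agent runs on each workload cluster with read-only hub credentials mounted from a Secret, so the hub holds no credentials into the workload cluster:

```yaml
# Hypothetical pull agent deployed on a workload cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub-sync-agent
  namespace: fleet-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hub-sync-agent
  template:
    metadata:
      labels:
        app: hub-sync-agent
    spec:
      serviceAccountName: hub-sync-agent           # local (write) permissions
      containers:
        - name: agent
          image: example.com/hub-sync-agent:v0.1.0 # illustrative image
          args:
            - --hub-kubeconfig=/etc/hub/kubeconfig # read-only hub identity
            - --poll-interval=30s                  # pull, not push
          volumeMounts:
            - name: hub-kubeconfig
              mountPath: /etc/hub
              readOnly: true
      volumes:
        - name: hub-kubeconfig
          secret:
            secretName: hub-kubeconfig
```
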
---
@cbaenziger thank you for the details. Indeed your model is similar to the WorkAPI (where there is an agent running in the workload cluster pulling data from the hub cluster).
I agree with the statement on multiple clusters.
One thing I'd note is that even if you "pull" the data from the hub cluster, the hub cluster still holds permissions (although indirectly) on the remote clusters, since it could technically go wild and "tell" the workload-cluster-pullers that they need to run compromised code. It wouldn't be able to DDoS the apiserver, but if the puller pulled RBAC and was not restricted to a few namespaces, it could escalate privileges easily.
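
To make that mitigation concrete, a hedged sketch (all names are illustrative): if the puller's local ServiceAccount is only bound to namespaced Roles like the one below, resources pulled from a compromised hub cannot grant privileges outside those namespaces:

```yaml
# Illustrative namespace-scoped permissions for the pull agent; a rogue hub
# could then only affect objects inside team-a, not cluster-scoped RBAC.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: hub-sync-agent
  namespace: team-a
rules:
  - apiGroups: ["", "apps"]
    resources: ["configmaps", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: hub-sync-agent
  namespace: team-a
subjects:
  - kind: ServiceAccount
    name: hub-sync-agent
    namespace: fleet-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: hub-sync-agent
```
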
---
I am not convinced of this "SHOULD". I am not convinced that this sort of statement belongs in a definition.
---
I wonder whether you are not convinced that the "hub" cluster and a "spoke" cluster should not be the same, or just that a "SHOULD" does not belong in a definition? I am curious about the reason if it's the former.
---
@MikeSpreitzer could you clarify your statement please? If you mean that the management cluster's relationship to workload clusters should be different than what the definition says, could you clarify your position (they MUST be isolated, the definition shouldn't talk about their relationship, or the SHOULD is too strong and it should be "Management cluster MAY also be a workload cluster")?
---
This statement supposes multiplicity only in the form of potentially competing equals; it omits the possibility of clusters fulfilling distinct roles.
---
I tried to clarify in the next section that it could be multiple roles:

> to allow for separation of functionality (security-enforcer management cluster vs
> config-delivery management cluster)

Is the wording not strong enough? I don't mean to close that door; I thought the current wording was sufficient and did not impose a direction, just requiring that the admin oversee potential overlap between different management clusters. If there is no overlap, they are fine to co-exist as separate clusters.
---
@MikeSpreitzer did that answer your comment (i.e. can I resolve it)? I'm not sure I fully understood it and tried my best to answer. Please let me know if I missed it.
---
I think it would be helpful to say what is meant here by "control-plane" and "data-plane".
In the call on Tuesday I heard something like: the data plane is about the workload, and the control plane is about controlling the workload's propagation from hub to execution clusters. (I specifically say "execution" clusters to contrast with where the authoritative description of the workload lives in the hub, which might be a cluster or something like it. In KubeStellar we keep the hub-side authoritative description of the workload un-wrapped in what we call a "space", which is something that has (at least) a cluster's API machinery.)
---
I tried to clarify by differentiating "business applications" from "infrastructure".
I steered away from "execution" because people think about infra pods being run and get confused about where those would go. I think KubeStellar's "execution cluster" is equal to what I'd call a "workload cluster".
I think your "space" would live on a hub cluster; are you suggesting we should separate the definitions of hub and space cluster? (separating infra and app-infra)
Side notes that I may add to the definition/doc if that helps and the current wording doesn't cover it:
- There are two types of workloads: Business/Product/Application vs Infrastructure/Platform.
- There are two personas and the configs they manage: the Platform/Infra team and the Product/Application/Business teams.
- I consider workload clusters as running business applications (e.g. pods serving traffic or achieving functionality). Workload clusters may also run infrastructure applications that assist them in serving.
- Hub clusters run infrastructure applications and may hold infrastructure/definitions for business applications.
---
This makes sense in general, but I am not convinced that there are no use cases for combining roles in one cluster.
---
The first paragraph actually encourages different clusters for different roles.
This second paragraph is just about being part of the workload clusters or not.
Let me try to think about introducing the notion of a "role" or something like that, as a subdivision of the broad Management Cluster.
---
I wonder what's the definition of "workload"? I usually associate workloads with applications but not controllers, so it's okay to me to run controllers that require leader-election in the "central" cluster.
---
It is definitely okay to run a controller in the management cluster. A controller doesn't have to be a "Workload"; it can be considered part of the "control-plane".
I think it's the persona that matters. If it is a platform-admin-owned controller performing management tasks, I wouldn't consider it a workload. A workload to me is an application serving an actual business-logic purpose.
All in all, running those management controllers is the reason why I want to define management clusters and not just a management API. (We had discussed internally giving simply an API with machinery... but then very quickly you want to bring a controller to act on this API and look for where to run it.)
---
@ryanzhang-oss should I define "workload" more? I think a sentence in this doc may be enough.
"Workload" works great when opposed to "management"; for "hub" maybe we need to say "spoke"? Or is "workload" still good?
---
AFAIK, "workload" in k8s basically means things like a Deployment/DaemonSet/StatefulSet, so it can be anything. I am not sure there is a way to say whether a Deployment contains a controller vs a real application.
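
To illustrate that point with a made-up pair of manifests: by API shape alone, a management controller and a business application are both just Deployments; only conventions such as labels, or the owning persona, tell them apart:

```yaml
# Nothing in the API distinguishes "controller" from "application";
# the component labels below are an illustrative convention only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placement-controller
  labels:
    app.kubernetes.io/component: controller   # platform-team owned
spec:
  replicas: 1                                 # typically leader-elected
  selector:
    matchLabels:
      app: placement-controller
  template:
    metadata:
      labels:
        app: placement-controller
    spec:
      containers:
        - name: controller
          image: example.com/placement-controller:v0.1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  labels:
    app.kubernetes.io/component: application  # product-team owned
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: example.com/checkout:v2.3.1
```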