You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/azure-arc/kubernetes/conceptual-workload-management.md
+14-14Lines changed: 14 additions & 14 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,25 +1,25 @@
1
1
---
2
2
title: "Workload management in a multi-cluster environment with GitOps"
3
3
description: "This article provides a conceptual overview of the workload management in a multi-cluster environment with GitOps."
4
-
ms.date: 03/13/2023
4
+
ms.date: 03/29/2023
5
5
ms.topic: conceptual
6
6
author: eedorenko
7
7
ms.author: iefedore
8
8
---
9
9
10
10
# Workload management in a multi-cluster environment with GitOps
11
11
12
-
Developing modern cloud-native applications often includes building, deploying, configuring, and promoting workloads across a fleet of Kubernetes clusters. With the increasing diversity of Kubernetes clusters in the fleet, and the variety of applications and services, the process can become complex and unscalable. Enterprise organizations can be more successful in these efforts by having a well defined structure that organizes people and their activities, and by using automated tools.
12
+
Developing modern cloud-native applications often includes building, deploying, configuring, and promoting workloads across a group of Kubernetes clusters. With the increasing diversity of Kubernetes cluster types, and the variety of applications and services, the process can become complex and unscalable. Enterprise organizations can be more successful in these efforts by having a well defined structure that organizes people and their activities, and by using automated tools.
13
13
14
14
This article walks you through a typical business scenario, outlining the involved personas and major challenges that organizations often face while managing cloud-native workloads in a multi-cluster environment. It also suggests an architectural pattern that can make this complex process simpler, observable, and more scalable.
15
15
16
16
## Scenario overview
17
17
18
-
This article describes an organization that develops cloud-native applications. Any application needs a compute resource to work on. In the cloud-native world, this compute resource is a Kubernetes cluster. An organization may have a single cluster or, more commonly, multiple clusters. So the organization must decide which applications should work on which clusters. In other words, they must schedule the applications across clusters. The result of this decision, or scheduling, is a model of the desired state of their cluster fleet. Having that in place, they need somehow to deliver applications to the assigned clusters so that they can turn the desired state into the reality, or, in other words, reconcile it.
18
+
This article describes an organization that develops cloud-native applications. Any application needs a compute resource to work on. In the cloud-native world, this compute resource is a Kubernetes cluster. An organization may have a single cluster or, more commonly, multiple clusters. So the organization must decide which applications should work on which clusters. In other words, they must schedule the applications across clusters. The result of this decision, or scheduling, is a model of the desired state of the clusters in their environment. Having that in place, they need somehow to deliver applications to the assigned clusters so that they can turn the desired state into the reality, or, in other words, reconcile it.
19
19
20
20
Every application goes through a software development lifecycle that promotes it to the production environment. For example, an application is built, deployed to Dev environment, tested and promoted to Stage environment, tested, and finally delivered to production. For a cloud-native application, the application requires and targets different Kubernetes cluster resources throughout its lifecycle. In addition, applications normally require clusters to provide some platform services, such as Prometheus and Fluentbit, and infrastructure configurations, such as networking policy.
21
21
22
-
Depending on the application, there may be a great diversity of cluster types to which the application is deployed. The same application with different configurations could be hosted on a managed cluster in the cloud, on a connected cluster in an on-premises environment, on a fleet of clusters on semi-connected edge devices on factory lines or military drones, and on an air-gapped cluster on a starship. Another complexity is that clusters in early lifecycle stages such as Dev and QA are normally managed by the developer, while reconciliation to actual production clusters may be managed by the organization's customers. In the latter case, the developer may be responsible only for promoting and scheduling the application across different rings.
22
+
Depending on the application, there may be a great diversity of cluster types to which the application is deployed. The same application with different configurations could be hosted on a managed cluster in the cloud, on a connected cluster in an on-premises environment, on a group of clusters on semi-connected edge devices on factory lines or military drones, and on an air-gapped cluster on a starship. Another complexity is that clusters in early lifecycle stages such as Dev and QA are normally managed by the developer, while reconciliation to actual production clusters may be managed by the organization's customers. In the latter case, the developer may be responsible only for promoting and scheduling the application across different rings.
23
23
24
24
## Challenges at scale
25
25
@@ -28,7 +28,7 @@ In a small organization with a single application and only a few operations, mos
28
28
The following capabilities are required to perform this type of workload management at scale in a multi-cluster environment:
29
29
30
30
- Separation of concerns on scheduling and reconciling
31
-
- Promotion of the fleet state through a chain of environments
31
+
- Promotion of the multi-cluster state through a chain of environments
32
32
- Sophisticated, extensible and replaceable scheduler
33
33
- Flexibility to use different reconcilers for different cluster types depending on their nature and connectivity
34
34
@@ -38,22 +38,22 @@ Before we describe the scenario, let's clarify which personas are involved, what
38
38
39
39
### Platform team
40
40
41
-
The platform team is responsible for managing the fleet of clusters that hosts applications produced by application teams.
41
+
The platform team is responsible for managing the clusters that host applications produced by application teams.
* Define cluster types in the fleet and their distribution across environments.
46
+
* Define cluster types and their distribution across environments.
47
47
* Provision new clusters.
48
-
* Manage infrastructure configurations across the fleet.
48
+
* Manage infrastructure configurations across the clusters.
49
49
* Maintain platform services used by applications.
50
50
* Schedule applications and platform services on the clusters.
51
51
52
52
### Application team
53
53
54
54
The application team is responsible for the software development lifecycle (SDLC) of their applications. They provide Kubernetes manifests that describe how to deploy the application to different targets. They're responsible for owning CI/CD pipelines that create container images and Kubernetes manifests and promote deployment artifacts across environment stages.
55
55
56
-
Typically, the application team has no knowledge of the clusters that they are deploying to. They aren't aware of the structure of the fleet, global configurations, or tasks performed by other teams. The application team primarily understands the success of their application rollout as defined by the success of the pipeline stages.
56
+
Typically, the application team has no knowledge of the clusters that they are deploying to. They aren't aware of the structure of the multi-cluster environment, global configurations, or tasks performed by other teams. The application team primarily understands the success of their application rollout as defined by the success of the pipeline stages.
57
57
58
58
Key responsibilities of the application team are:
59
59
@@ -88,7 +88,7 @@ Let's have a look at the high level solution architecture and understand its pri
88
88
89
89
### Control plane
90
90
91
-
The platform team models the fleet in the control plane. It's designed to be human-oriented and easy to understand, update, and review. The control plane operates with abstractions such as Cluster Types, Environments, Workloads, Scheduling Policies, Configs and Templates. These abstractions are handled by an automated process that assigns deployment targets and configuration values to the cluster types, then saves the result to the platform GitOps repository. Although the entire fleet may consist of thousands of physical clusters, the platform repository operates at a higher level, grouping the clusters into cluster types.
91
+
The platform team models the multi-cluster environment in the control plane. It's designed to be human-oriented and easy to understand, update, and review. The control plane operates with abstractions such as Cluster Types, Environments, Workloads, Scheduling Policies, Configs and Templates. These abstractions are handled by an automated process that assigns deployment targets and configuration values to the cluster types, then saves the result to the platform GitOps repository. Although there may be thousands of physical clusters, the platform repository operates at a higher level, grouping the clusters into cluster types.
92
92
93
93
The main requirement for the control plane storage is to provide a reliable and secure transaction processing functionality, rather than being hit with complex queries against a large amount of data. Various technologies may be used to store the control plane data.
94
94
@@ -129,13 +129,13 @@ Every cluster type can use a different reconciler (such as Flux, ArgoCD, Zarf, R
129
129
130
130
### Platform services
131
131
132
-
Platform services are workloads (such as Prometheus, NGINX, Fluentbit, and so on) maintained by the platform team. Just like any workloads, they have their source repositories and manifests storage. The source repositories may contain pointers to external Helm charts. CI/CD pipelines pull the charts with containers and perform necessary security scans before submitting them to the manifests storage, from where they're reconciled to the clusters in the fleet.
132
+
Platform services are workloads (such as Prometheus, NGINX, Fluentbit, and so on) maintained by the platform team. Just like any workloads, they have their source repositories and manifests storage. The source repositories may contain pointers to external Helm charts. CI/CD pipelines pull the charts with containers and perform necessary security scans before submitting them to the manifests storage, from where they're reconciled to the clusters.
133
133
134
134
### Deployment Observability Hub
135
135
136
-
Deployment Observability Hub is a central storage that is easy to query with complex queries against a large amount of data. It contains deployment data with historical information on workload versions and their deployment state across clusters in the fleet. Clusters register themselves in the storage and update their compliance status with the GitOps repositories. Clusters operate at the level of Git commits only. High-level information, such as application versions, environments, and cluster type data, is transferred to the central storage from the GitOps repositories. This high-level information gets correlated in the central storage with the commit compliance data sent from the clusters.
136
+
Deployment Observability Hub is a central storage that is easy to query with complex queries against a large amount of data. It contains deployment data with historical information on workload versions and their deployment state across clusters. Clusters register themselves in the storage and update their compliance status with the GitOps repositories. Clusters operate at the level of Git commits only. High-level information, such as application versions, environments, and cluster type data, is transferred to the central storage from the GitOps repositories. This high-level information gets correlated in the central storage with the commit compliance data sent from the clusters.
137
137
138
138
## Next steps
139
139
140
-
*Explore a [sample implementation of workload management in a multi-cluster environment with GitOps](https://github.com/microsoft/kalypso).
141
-
*Try our [Tutorial: Workload Management in Multi-cluster environment with GitOps](tutorial-workload-management.md) to walk through the implementation.
140
+
*Walk through a sample implementation to explore [workload management in a multi-cluster environment with GitOps](workload-management.md).
141
+
*Explore a [multi-cluster workload management sample repository](https://github.com/microsoft/kalypso).
0 commit comments