Commit 0d07761

docs: Finish and update the Argo CD Integration patterns docs (#521)

Signed-off-by: jannfis <[email protected]>

1 parent bb90158 commit 0d07761

1 file changed: +78 -25 lines

docs/concepts/argocd-integration.md

## Overview

The *argocd-agent* is designed to complement, not replace, Argo CD by extending its capabilities to multi-cluster environments. It provides two distinct integration patterns that allow you to balance resource efficiency, operational autonomy, and resilience based on your specific requirements.

This integration enables centralized GitOps management while distributing the actual workload execution across multiple Kubernetes clusters. The agent acts as a bridge between your central control plane (where configuration and observability are managed) and your workload clusters (where applications are deployed and managed).

### Diagram Legend

In the diagrams throughout this document:

* **Light green boxes** represent *argocd-agent* components
* **Light blue boxes** represent Argo CD components
* **Light red boxes** represent external systems and components

Components with dotted outlines indicate their deployment location depends on the selected [operational mode](./agent-modes/index.md) of the agent.

!!! warning "Integration Pattern Selection"

    The choice between integration patterns is a fundamental architectural decision that affects your entire GitOps infrastructure. While agents can operate in different modes within the same environment, all workload clusters must use the same integration pattern. Switching between patterns requires service interruption and careful migration planning.

## Integration patterns

### Pattern 1: Centralized Resource Sharing (Low Footprint)

This integration pattern centralizes critical Argo CD components on the control plane cluster while minimizing the footprint on workload clusters. The *repository-server* and *redis-server* components are shared across all workload clusters, with only the *application-controller* deployed locally on each workload cluster.

![Integration pattern 1: Low footprint spokes](../assets/02-integration-shared.png)

In this architecture, each workload cluster runs only the *application-controller* (and optionally the *applicationset-controller* when operating in autonomous mode). These controllers communicate with the centralized *repository-server* and *redis-server* on the control plane cluster for manifest rendering and state caching, as shown in the sketch below.
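
The fragment below is a minimal sketch of that wiring, expressed as a patch to the *application-controller* StatefulSet. The `--repo-server` and `--redis` flags are standard controller options; the endpoint addresses are illustrative assumptions about how the control plane exposes these services, not values shipped by Argo CD or *argocd-agent*.

```yaml
# Sketch only: point a workload cluster's application-controller at the
# shared services on the control plane. The hostnames are placeholders
# for however you expose the repository-server and redis-server.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          args:
            - /usr/local/bin/argocd-application-controller
            # Endpoints on the control plane cluster (assumed examples)
            - --repo-server
            - repo-server.control-plane.example.com:8081
            - --redis
            - redis.control-plane.example.com:6379
```
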
**Advantages of this pattern**

* **Reduced resource consumption**: Workload clusters require minimal compute resources, since heavy operations like Git repository processing and manifest rendering occur on the control plane
* **Simplified network requirements**: Workload clusters only need connectivity to the control plane cluster, eliminating the need for direct Git repository access
* **Centralized state management**: Application state and rendered manifests are stored centrally, enabling efficient API operations and reducing data duplication
* **Cost efficiency**: Fewer components per workload cluster translate to lower operational costs, especially beneficial for large numbers of clusters

**Disadvantages of this pattern**

* **Single point of failure**: The control plane cluster becomes critical infrastructure. If it is unavailable, workload clusters cannot render new manifests or update cached state, halting reconciliation processes
* **Network dependency**: Increased network traffic between workload and control plane clusters may create bottlenecks or increase costs in cloud environments with inter-zone/region charges
* **Centralized scaling challenges**: The *repository-server* and *redis-server* must be scaled to handle the aggregate load from all workload clusters, requiring careful capacity planning
* **Complex ingress management**: Additional network ingress points and credential management are required on the control plane for each workload cluster connection
* **Limited fault isolation**: Issues with shared components affect all connected workload clusters simultaneously

### Pattern 2: Fully Autonomous Workload Clusters

This integration pattern deploys a complete Argo CD stack (*application-controller*, *repository-server*, and *redis-server*) on each workload cluster, creating fully autonomous GitOps environments. Each workload cluster operates as an independent Argo CD instance while maintaining centralized configuration management and observability through the control plane.

![Integration pattern 2: Autonomous spokes](../assets/02-integration-autonomous.png)

This architecture enables workload clusters to perform all GitOps operations locally, including Git repository access, manifest rendering, and state management. The control plane cluster serves primarily as a configuration hub and observability aggregation point. A sketch of such a per-cluster installation follows.
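
As an illustration, a per-cluster installation could start from Argo CD's upstream kustomize bases. The sketch below assumes the `core-install` manifests, which include the *application-controller*, *repository-server*, and *redis-server* but omit the API server and web UI; pin `ref` to a concrete release tag for production use.

```yaml
# kustomization.yaml on each workload cluster (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  # Upstream "core" install: controller, repo-server and redis,
  # without the API server and web UI.
  - https://github.com/argoproj/argo-cd/manifests/core-install?ref=stable
```
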
**Advantages of this pattern**

* **True operational autonomy**: Workload clusters continue functioning independently during control plane outages, with only configuration updates and observability affected. When combined with [autonomous mode](./agent-modes/autonomous.md), clusters maintain full GitOps capabilities even during extended control plane unavailability
* **Reduced network traffic**: Minimal communication required between workload and control plane clusters, eliminating bandwidth bottlenecks and reducing inter-cluster network costs
* **Distributed scaling**: Each workload cluster scales its Argo CD components independently based on local requirements, enabling optimal resource utilization
* **Simplified networking**: Single ingress point required on the control plane cluster for agent communication, reducing network complexity and security surface area
* **Enhanced fault isolation**: Issues with one workload cluster's components don't affect other clusters in the environment
* **Improved performance**: Local processing of Git operations and manifest rendering eliminates network latency from the GitOps workflow

**Disadvantages of this pattern**

* **Increased resource requirements**: Each workload cluster must allocate compute resources for the full Argo CD stack, increasing the minimum viable cluster size
* **Git repository access**: Workload clusters require direct network connectivity to Git repositories, potentially complicating network security policies and firewall configurations
* **Distributed state management**: Application state is distributed across workload clusters, making centralized monitoring and troubleshooting more complex
* **Higher operational complexity**: Managing and maintaining Argo CD components across multiple clusters increases operational overhead
* **Resource duplication**: Git repository caching and manifest rendering occur independently on each cluster, potentially leading to redundant resource usage
* **Security considerations**: Each workload cluster needs credentials for Git repository access, expanding the credential management scope (see the sketch below)
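
For example, each workload cluster would carry its own copy of every Git credential, declared in Argo CD's declarative repository format. Everything in the sketch below except the `argocd.argoproj.io/secret-type` label is a placeholder.

```yaml
# Sketch: a declarative Git repository credential, replicated to every
# workload cluster. URL, username and password are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: private-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://git.example.com/org/deployments.git
  username: ci-bot
  password: <personal-access-token>
```
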
## Recommendation

**We strongly recommend using Pattern 2 (Fully Autonomous Workload Clusters) for most production environments**, except when operating under severe compute resource constraints.

The autonomous pattern provides significant operational benefits that outweigh its resource overhead in most scenarios:

### Why Choose Autonomous Clusters

1. **Operational Resilience**: Your GitOps workflows continue functioning during control plane maintenance, upgrades, or unexpected outages. This is crucial for production environments where application deployments cannot be delayed.

2. **Performance and Reliability**: Eliminating network dependencies from the GitOps workflow reduces latency, potential network failures, and bandwidth costs. Local processing ensures consistent performance regardless of control plane load.

3. **Scalability**: Each cluster scales independently, avoiding the complex capacity planning required for centralized components that must handle aggregate loads from all workload clusters.

4. **Operational Independence**: Teams can manage their workload clusters with greater autonomy, reducing dependencies on central infrastructure teams.

### When to Consider Centralized Resource Sharing

Pattern 1 (Centralized Resource Sharing) may be appropriate in the following scenarios:

* **Resource-constrained environments**: Edge computing deployments, IoT clusters, or environments where compute resources are severely limited
* **Development and testing environments**: Where operational resilience is less critical than resource efficiency
* **Highly regulated environments**: Where centralized control and a reduced attack surface are prioritized over operational autonomy
* **Small-scale deployments**: With fewer than 5-10 workload clusters, where the operational overhead of distributed management outweighs the benefits

### Implementation Considerations

When implementing the autonomous pattern:

* Ensure workload clusters have adequate resources for the full Argo CD stack (typically an additional 1-2 CPU cores and 2-4 GB of RAM; see the sketch after this list)
* Plan for Git repository access from all workload clusters, including necessary network policies and credentials
* Implement monitoring and alerting for the distributed Argo CD components
* Consider using [autonomous mode](./agent-modes/autonomous.md) to maximize resilience during control plane outages
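
Purely as a sketch derived from the rough sizing above, those reservations could be expressed as kustomize patches per component. The figures are assumptions to validate against your own workloads, not measured requirements.

```yaml
# Sketch: strategic-merge patch reserving resources for the
# repository-server. Apply similar patches to the application-controller
# and redis; the numbers below are assumptions, not measurements.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-repo-server
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              memory: 2Gi
```
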
The additional resource investment in autonomous clusters typically pays dividends through improved reliability, performance, and operational flexibility in production environments.