# Argo CD/GitOps Service support for multiple service accounts on a single cluster

### Written by

- Jonathan West (@jgwest)

- Originally written January 4th, 2023

## Introduction
In Stonesoup, we plan to give users their own namespaces on a cluster. The API CRs of the various Stonesoup services, for that particular user, will then be created inside the user's namespace.

For example, as a user of Stonesoup, I might have a namespace 'jonwest' (based on my RH SSO username), and in that namespace will be Application CRs, Component CRs, Environment CRs, and so on. Those CRs would be used for the application development and deployment I wish to do from that namespace.

However, we (Stonesoup) would like users to be able to deploy (using GitOps Service/Argo CD) to multiple Namespaces of a cluster, with that cluster either being the same cluster as the API namespace, or a different cluster.

(The cluster containing the API CRs, and the deployment cluster, are presumed to be the same here, for illustrative purposes.)

For example, on a single cluster, a user 'jonwest' would have access to:

- A 'jonwest-dev' namespace for the development environment

- A 'jonwest-staging' namespace for the staging environment, for pre-production testing

- A 'jonwest-production' namespace for production

This requires a one-to-many relationship between users and namespaces: a user has access to multiple namespaces on a cluster, and should likewise be able to use the GitOps Service (and, indirectly, Argo CD) to deploy to those namespaces.

![](SA-Diagram-1.jpg)

The current proposed solution for controlling user access is known as SpaceRequests, and is based on _ServiceAccounts_. This works with the [Environment Provisioning functionality](https://github.com/redhat-appstudio/book/blob/main/ADR/0008-environment-provisioning.md).

# SpaceRequests and Environments

In this model (I'm paraphrasing), a ServiceAccount will exist that is configured with the specific permissions that a user has on a given cluster.

So if a user has access to the namespaces jonwest-dev, jonwest-staging, and jonwest-production, then a ServiceAccount+Role+RoleBindings will exist that allow the user to create resources in these namespaces using that ServiceAccount.
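As a rough sketch, the per-namespace access described above might be wired up like this (all resource names here are hypothetical, not actual Stonesoup/SpaceRequest resources):

```yaml
# Hypothetical sketch: a per-user ServiceAccount granted access to one of the
# user's namespaces. Similar RoleBindings would exist in jonwest-staging and
# jonwest-production.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jonwest-deployer        # hypothetical name
  namespace: jonwest-dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jonwest-deployer-admin  # hypothetical name
  namespace: jonwest-dev
subjects:
  - kind: ServiceAccount
    name: jonwest-deployer
    namespace: jonwest-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin                   # or a narrower Role, depending on the SpaceRequest
```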
We would like to (re)use those same ServiceAccounts with the GitOps Service, and Argo CD, to deploy to clusters.

![](SA-Diagram-2.jpg)

However, the problem is that Argo CD cannot deploy to the same cluster (URL) via multiple cluster secrets. If multiple Argo CD cluster secrets exist for a single cluster, Argo CD will not handle them properly (in fact, it will arbitrarily switch between them: if you watch in the UI, you will see the value swap back and forth wildly).

- <https://github.com/argoproj/argo-cd/pull/10897>

- <https://github.com/argoproj/argo-cd/issues/2288>

- <https://github.com/argoproj/argo-cd/issues/4928>

- <https://github.com/argoproj/argo-cd/issues/5275>

- <https://github.com/argoproj/argo-cd/issues/9515>
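To illustrate the conflict, consider two hypothetical cluster Secrets (the names and tokens below are invented) that both point at the same server URL. Argo CD identifies a cluster by its `server` URL, so it cannot keep these two credentials apart:

```yaml
# Hypothetical example of the conflict: both Secrets declare the same server URL.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-jonwest
  namespace: gitops-service-argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: cluster-jonwest
  server: https://actual-cluster-url:6443   # same URL...
  config: '{"bearerToken": "(jonwest SA token)"}'
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-wtam
  namespace: gitops-service-argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: cluster-wtam
  server: https://actual-cluster-url:6443   # ...as this one; Argo CD flaps between the two
  config: '{"bearerToken": "(wtam SA token)"}'
```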
So the question is: is it possible for us to still use these per-user ServiceAccounts to deploy with Argo CD?

Note: In the diagram I have all the namespaces on the same cluster, but one or more of the namespaces may also be on other clusters.
# Possible Solutions

## Options that preserve the use of per-user ServiceAccounts

_These options preserve the use of per-user ServiceAccounts._
### Option A: GITOPSRVCE-240 - Decouple control plane and application sync privileges

Potentially, an impersonation-style feature like [GITOPSRVCE-240](https://issues.redhat.com/browse/GITOPSRVCE-240) could be used to allow Argo CD to support multiple users on a single cluster. This appears to be an oft-requested feature, but the devil is in the details.

**Advantages:**

- If designed to fit our needs, it would let us configure Argo CD to use the ServiceAccounts from the SpaceRequest API.

**Disadvantages:**

- Not currently implemented; might not fit within the timeframe we need.

- Requires upstream agreement on a specific solution.
### Option B: An Argo CD instance per user

Giving each user their own full Argo CD instance, and configuring it based on the permissions they have, should allow us to avoid this issue.

**Advantages:**

- Since a user is only ever going to have a single set of permissions on a particular cluster, giving each user their own Argo CD instance resolves this problem.

- Does not require an upstream feature to be implemented and shipped in the GitOps Service.

- The math on the number of pods supported on K8s appears to work out; see Jann's doc for details.

**Disadvantages:**

- We currently don't have support for creating/maintaining an Argo CD instance per user.

- This is also a potentially large number of Argo CD instances that we would need to manage. For example, the DevSandbox product currently has roughly 3000 users. If we had a similar number of users for this service, that would be 3000 Argo CD instances to manage.

- We may also need to make changes to the OpenShift GitOps Operator to allow it to scale to handling a large number of ArgoCD operands.

- As a corollary, we lose the advantages of sharing Argo CD instances between users (memory efficiency).
### Option C1: Some kind of HTTP proxy that "fakes" a cluster URL for a particular user/service account

For example, Argo CD may treat these two clusters as different, even if they both redirect/reverse proxy to the same actual cluster:

- https\://actual-cluster-url?user=jonwest

- https\://actual-cluster-url?user=wtam

For example, https\://actual-cluster-url?param=(user) would reverse proxy to https\://actual-cluster-url (an actual k8s cluster), but from the perspective of Argo CD each URL might appear to be a different cluster due to the ?param= part of the URL.

I don't actually know if this would even work; I'm just brainstorming here.

Another example of this would be a DNS rule for https\://\*.cluster-api-url.com which would resolve to https\://cluster-api-url.com for all '\*' values. You could then put users in the subdomain field, e.g. https\://jonwest.cluster-api-url.com, and Argo CD could potentially treat them as separate clusters.

**Advantages:**

- Might not require an upstream feature.

  - Go to a community meeting and "take the temperature" of the idea, to see how folks feel about this behaviour.

- Jann mentions that other folks in the wild are already using this as a pattern.

**Disadvantages:**

- Hacky, or at least approaching hacky, depending on the solution.

  - In order to reduce the "hackiness", we would need to add upstream unit/E2E tests to Argo CD, to prevent others from regressing this desired behaviour.

- Not necessarily portable between different K8s cluster network/router configurations.
### Option C2: Add Query Parameter URL to Argo CD URLs

This is very similar to Option C1, except that no MITM proxy or DNS is used. Instead, we merely add a query parameter to cluster URLs in Argo CD Secrets.

For example, Argo CD will treat these two clusters as different, even if they both share the same base URL:

- Scoped by username

  - _https\://actual-cluster-url?user=jonwest_

  - _https\://actual-cluster-url?user=wtam_

- Scoped by managed environment uid

  - _https\://actual-cluster-url?managedEnvironment=(uuid of managed env in jonwest namespace)_

  - _https\://actual-cluster-url?managedEnvironment=(uuid of managed env in wtam namespace)_

To be clear: we are adding an arbitrary new query parameter which is not supported by K8s.

This is my proposed solution. See 'Jonathan's Proposal' below for details.
**Advantages:**

- Jann spoke with the Argo CD community to take the temperature of adding support for this feature.

  - He notes: "I took the 'query parameters to cluster URLs to make them unique' use case to today's contributors meeting, and it's general consensus that we want to support this."

- Likewise, Jann and [Jonathan](https://github.com/redhat-appstudio/managed-gitops/compare/main...jgwest:managed-gitops:argocd-behaviour-test-feb-2023?expand=1) tested it, and it works.

- Jann also mentioned previously that other folks in the wild are already using this as a pattern.

- Likely does not require an upstream feature proposal.

  - We can instead implement upstream unit/E2E tests to ensure Argo CD does not regress this behaviour.

**Disadvantages:**

- We are relying on Kubernetes API cluster URL behaviour whose long-term stability is unknown.

  - For example, what happens if the K8s API server decides that arbitrary query parameters on cluster URLs should be rejected?

  - Or, what if we encounter a proxy between us and the cluster which doesn't like the query parameters?

- Argo CD may change this behaviour on us, but contributing unit/E2E tests (as described above) should successfully mitigate this.
## Options that abandon the use of per-user ServiceAccounts

_These options abandon the use of per-user ServiceAccounts, which may substantially affect the feasibility of the SpaceRequest/Environment proposals._
### Option D: Give Argo CD full access to every cluster, and use AppProjects to maintain user boundaries in Argo CD

For this option, we abandon the use of ServiceAccounts for each user.

We would instead need to find some way to translate SpaceRequests -> AppProjects, and add AppProject-style support to the GitOps Service. The feasibility of this is unknown.
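As a rough sketch of what such a translation might produce (hypothetical names; in this option Argo CD itself holds the broad cluster credentials, and the AppProject enforces the user boundary):

```yaml
# Hypothetical AppProject constraining user 'jonwest' to their own namespaces.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: jonwest
  namespace: gitops-service-argocd
spec:
  sourceRepos:
    - '*'
  destinations:
    # Only the user's namespaces are permitted deployment targets.
    - server: https://kubernetes.default.svc
      namespace: jonwest-dev
    - server: https://kubernetes.default.svc
      namespace: jonwest-staging
    - server: https://kubernetes.default.svc
      namespace: jonwest-production
  # No cluster-scoped resources may be deployed by this user's Applications.
  clusterResourceWhitelist: []
```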
# Jonathan's Proposal

## Short term solution - Implement option C2: Add Query Parameter URL to Argo CD URLs in the GitOps Service

We need a solution for this that will allow us to meet our service preview goals. STONE-114 is a blocker for Service Preview, and is thus essential.

I thus propose we implement option C2, as above.

The only challenge here is that it relies on Kubernetes and Argo CD API URL behaviour in order to ensure correctness.

- We can mitigate that by contributing unit/E2E tests to Argo CD and the GitOps Service, to ensure this behaviour is not regressed.

- We should also pursue a longer term solution; see below.

In the GitOps Service itself, the fix is straightforward: when creating a new Argo CD cluster secret, we would append the UID of the managed environment, like so:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: managed-env-(uid of managed environment row)
  namespace: gitops-service-argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    databaseID: (uid of managed environment row)
data:
  name: managed-env-(uid of managed environment row)
  server: "http://(cluster url provided by user)?managedEnv=(uid of managed environment row)" # <=== this field has the new behaviour: the query parameter is new
  config: "( json config data)"
```
In the above example, the new behaviour is the '?managedEnv=' query parameter on the .data.server field.

This should ensure that Argo CD treats each user's 'slice' of the cluster as a separate cluster.

Presuming we are fine with this solution, the following stories will be added to the epic, for implementation:

- A) Contribute unit/E2E tests upstream to the Argo CD/gitops-engine repos, to verify/prevent regressions of the desired cluster API behaviour

- B) Contribute unit/E2E tests to the GitOps Service, to verify/prevent regressions of the desired cluster API behaviour

- C) Implement logic in the cluster-agent component of the GitOps Service to add the above managedEnv field to Argo CD cluster secrets
## Medium/long term solution: Open an epic to decide on either a more Argo CD-native solution (e.g. impersonation), or a GitOps Service-native solution (per-user Argo CD instances)

PMs, architects, stakeholders et al. to determine a good permanent solution: one that does not depend on Kubernetes API URL parsing behaviour, but rather on either a proper Argo CD feature (e.g. impersonation), a GitOps Service feature (a dedicated Argo CD instance per API namespace), or some combination.

Examples of a potential solution that would be implemented by this epic include **Option A** and **Option B**, above.

- Speaking personally, from a technical perspective, I'm happy with either.

- They both have advantages and tradeoffs that should be balanced based on business needs, scalability, and upstream interests.

This epic would be opened to track determining and implementing that more permanent solution. PMs to prioritize the implementation.
