# Argo CD Multitenancy for fully untrusted users of GitOps Service<a id="argo-cd-multitenancy-for-fully-untrusted-users-of-gitops-service"></a>

### Written by
- Jonathan West (@jgwest)
- Originally written May 9th, 2022


## Table of Contents<a id="table-of-contents"></a>

- [Introduction](#introduction)
- [Why Argo CD multitenancy?](#why-argo-cd-multitenancy)
- [Multitenancy: possible solutions](#multitenancy-possible-solutions)
- [Current challenge with shared multitenancy: Repository credentials](#current-challenge-with-shared-multitenancy-repository-credentials)
- [Current challenge with shared multitenancy: malicious Git repositories](#current-challenge-with-shared-multitenancy-malicious-git-repositories)

## Introduction<a id="introduction"></a>

As [described here](initial-api-design.md), Argo CD was not necessarily built for multitenancy with fully untrusted users; rather, it has historically been focused on 'partially trusted users':

- **Partially trusted users**:

  - Employees within an enterprise, whose identities are fully known to the enterprise

  - Serious consequences for abusing system resources (employment termination, or criminal charges in the worst cases)

- **Fully untrusted users**:

  - In contrast, public cloud products (such as ours) operate in a very different context.

  - We don't know our users: we only ask them for an email address (plus unvalidated personal information, which we do not check); we don't (for example) ask them for a credit card or a validated phone number.

  - The only consequence for abusing system resources is our terminating their account.

The contextual difference between these two groups of users means that attacks are far more likely from one group than the other.

## Why Argo CD multitenancy?<a id="why-argo-cd-multitenancy"></a>

Since, I suspect, most users of the GitOps Service via AppStudio (at least initially) will have only a small number of deployments, it does not make sense (from a resource perspective) to dedicate a full Argo CD instance to each of them.

Likewise, since Argo CD mostly sits idle (low CPU utilization), receiving K8s watch events and occasionally checking Git, we can drive better resource utilization by sharing resources between users.

Thus, if there were a way to securely share Argo CD instances between multiple users, that would be the ideal.

For reference, an OpenShift GitOps operator installed onto an empty clusterbot cluster consumes the following resources:

- Argo CD (server + appset + repo + app controller + redis): 119 MB

- OpenShift GitOps misc (kam/cluster/dex): 99 MB

On the other hand, AWS EC2 provides roughly 120 GB of memory per dollar per hour.
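Putting those numbers together, a rough back-of-the-envelope calculation (using the ~120 MB per-user overhead of a dedicated instance, and the ~120 GB of EC2 memory per dollar-hour quoted above; both figures are estimates from this document, not exact pricing):

```go
package main

import "fmt"

func main() {
	// Figures from the measurements above; both are rough estimates.
	const perUserMB = 120.0       // ~base memory overhead of one dedicated Argo CD instance
	const gbPerDollarHour = 120.0 // ~EC2 memory per dollar per hour

	// How many dedicated Argo CD instances fit into one dollar-hour of EC2 memory.
	instances := (gbPerDollarHour * 1024.0) / perUserMB
	fmt.Printf("~%.0f dedicated Argo CD instances per dollar-hour\n", instances) // prints: ~1024 dedicated Argo CD instances per dollar-hour
}
```

So on memory alone, even the dedicated-instance option costs very roughly a tenth of a cent per user-hour; sharing instances improves on that baseline.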

## Multitenancy: possible solutions<a id="multitenancy-possible-solutions"></a>

### Option A: Argo CD instances shared between multiple users<a id="option-a-argo-cd-instances-shared-between-multiple-users"></a>

The best resource utilization (e.g. maximizing the number of users per dollar of RH infrastructure costs) would be achieved by multiple users (securely) sharing Argo CD instances.


### Option B: Dedicated Argo CD instances per user<a id="option-b-dedicated-argo-cd-instances-per-user"></a>

One option for providing Argo CD to fully untrusted users is simply to give each user of the GitOps Service their own Argo CD instance. That way, in the worst-case scenario, they can only corrupt or DoS their own Argo CD instance.

Based on the above numbers, there would be a base overhead of about ~120 MB per user.


### Option C: Hybrid approach - shared Argo CD instances with dedicated components (for example, a repo server per user)<a id="option-c-hybrid-approach---shared-argo-cd-instances-with-dedicated-components-for-example-a-repo-server-per-user"></a>

Another approach would be to share the Argo CD control plane and application controller components between users, but not the repository server.

This would be one way to mitigate malicious Git repositories, for example, but it would not mitigate the per-user cost of the dedicated component (such as a repo server per user).

## Current challenge with shared multitenancy: Repository credentials<a id="current-challenge-with-shared-multitenancy-repository-credentials"></a>

### Sketch of Problem<a id="sketch-of-problem"></a>


**1) User 1 adds repo credentials for a private repo:** http://github.com/private-org/private-project.git

**2) User 1 creates an Argo CD Application to deploy from that repo**

- bound to an AppProject that allows them to access that repo

**3) User 2 adds their own repo credentials for a private repo:** http://github.com/private-org/private-project.git (same repo URL as in step 1, but different credentials)

**4) User 2 creates an Argo CD Application to deploy from that repo**

- bound to an AppProject that allows them to access that repo

_The problem is: what if user 2's Git credentials expire? Expected behaviour: Argo CD should no longer deploy the contents of that Git repo for them._

**5) However, this is not the actual behaviour:** Argo CD will continue to deploy the latest Git repo changes via user 2's Application (e.g. to their namespace), even though their credentials are now invalid.

You might say: well, of course, because the AppProject still allows them to do so, which is true! But the problem is that there is no existing mechanism to invalidate a user's access to an AppProject if the repo credentials they provided are no longer valid.


**Ideal behaviour:**

1. User provides some repo credentials to create an AppProject/Application

2. Those credentials are associated with a specific AppProject or Application (e.g. in the same way that an Application is associated with a particular cluster secret, via the destination field of the Argo CD Application)

3. As soon as those credentials become invalid, the Argo CD Application no longer deploys.


Instead, the best alternative I can think of using existing Argo CD mechanisms is:

1. Poll (every X minutes) the set of repo credentials each user has provided, and if they fail, remove the repository from the AppProject of their Application.

- The problem with this is that there is now a significant lag between when the credentials expire and when access is revoked.

- It may also not scale to a large number of users (too much polling).
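As a rough illustration of this polling approach (hypothetical code, not the actual GitOps Service implementation), the sketch below separates the expensive per-credential probe (e.g. running `git ls-remote` with the user's credentials) from the pure decision of which repositories to remove from which users' AppProjects:

```go
package main

import "fmt"

// RepoCredential is one user-supplied credential for one Git repository.
// Valid would be filled in by the periodic probe (e.g. `git ls-remote`
// run against the repository with that user's credentials).
type RepoCredential struct {
	User    string
	RepoURL string
	Valid   bool
}

// reposToRevoke returns, per user, the repository URLs that should be
// removed from that user's AppProject because the credential no longer works.
func reposToRevoke(creds []RepoCredential) map[string][]string {
	revoked := map[string][]string{}
	for _, c := range creds {
		if !c.Valid {
			revoked[c.User] = append(revoked[c.User], c.RepoURL)
		}
	}
	return revoked
}

func main() {
	// Both users reference the same private repo, but only user B's
	// credential has expired.
	creds := []RepoCredential{
		{User: "user-a", RepoURL: "https://github.com/private-org/private-repo", Valid: true},
		{User: "user-b", RepoURL: "https://github.com/private-org/private-repo", Valid: false},
	}
	fmt.Println(reposToRevoke(creds))
}
```

Even with this in place, access is only revoked on the next poll cycle, which is exactly the lag described in the bullet above.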


### Actual GitOps Service Example<a id="actual-gitops-service-example"></a>

Here's how this same example looks using the API of the GitOps Service.

**User A: User A wants to deploy a private repo to their organization.**

GitOpsDeployment (equivalent to Argo CD Application):


```yaml
apiVersion: v1alpha1
kind: GitOpsDeployment
metadata:
  name: my-deployment
spec:

  # GitHub repository containing K8s resources to deploy
  source:
    repository: https://github.com/private-org/private-repo
    path: .
    revision: master

  # Target K8s cluster
  destination: { } # (...)

  type: automated # Manual or automated, a placeholder equivalent to the Argo CD syncPolicy.automated field
```


GitOpsDeploymentRepositoryCredentials:

```yaml
apiVersion: v1alpha1
kind: GitOpsDeploymentRepositoryCredentials
metadata:
  name: private-repo-creds
spec:
  url: https://github.com/private-org/private-repo
  secret: private-repo-creds-secret
```

Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: private-repo-creds-secret
stringData:
  url: https://github.com/private-org/private-repo
  # and:
  username: userA
  password: (my password)
```


**User B:**

User B (in their own namespace) creates the same GitOps Service API resources as user A, but with user B specifying their own GitHub credentials in the secret.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: private-repo-creds-secret
stringData:
  url: https://github.com/private-org/private-repo
  # and:
  username: userB
  password: (my password)
```

**The GitOps Service turns those GitOpsDeployment/GitOpsDeploymentRepositoryCredentials/Secret resources into the corresponding Argo CD Application/repository Secret/AppProject resources:**

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: user-a-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/private-org/private-repo

  username: userA
  password: (password)
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: user-b-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/private-org/private-repo

  username: userB
  password: (password)
```

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-a-application
  namespace: argocd
spec:
  project: user-a-project-that-allows-access-to-private-repo
  source:
    repoURL: https://github.com/private-org/private-repo
    targetRevision: HEAD
    path: .
  destination: {} # ( .. user A's cluster .. )
```

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-b-application
  namespace: argocd
spec:
  project: user-b-project-that-allows-access-to-private-repo
  source:
    repoURL: https://github.com/private-org/private-repo
    targetRevision: HEAD
    path: .
  destination: {} # ( .. user B's cluster .. )
```

**However, we can see where this now breaks down:**

If user B's GitHub credentials to the private organization ('private-org') are revoked, Argo CD will continue to deploy successfully, because credentials for that repository also exist for another user within the workspace.


### Sketch of similar problem<a id="sketch-of-similar-problem"></a>

1) **User 1** adds repo credentials for a private repo: http://github.com/private-org/private-project.git

2) **User 1** creates an Argo CD Application to deploy from that repo

- bound to an AppProject A that allows them to access that repo

3) **User 2** creates an Argo CD Application to deploy from that repo

- bound to an AppProject B that allows them to access that repo

Argo CD allows user 2 to access the repo, because the credentials for it are defined by user 1.

Why? As long as a user is allowed to access a repository via an AppProject, Argo CD will use whichever credentials it can find to connect to that repository.

Could we prevent this by having the GitOps Service require credentials for private repos? Sure, but that's just solution 4 below. It's not Argo CD-native, and it will only guarantee eventual consistency of repository access (e.g. if you make a public repo private, we will eventually restrict other users from accessing it: once our poll detects the change and updates the AppProjects).


### Possible solutions to the problem of shared credentials<a id="possible-solutions-to-the-problem-of-shared-credentials"></a>

1. Dedicated Argo CD instance per user

2. Dedicated repo server per user

3. Upstream support for restricting a particular repository secret credential to a particular AppProject

4. Downstream solution: the Core GitOps Service could periodically poll every user's repository credential, and if we detected that a user's credential no longer worked for the repository, we would remove that user's repository from that user's AppProject
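For solution 3, a hedged sketch of what the configuration could look like: newer Argo CD releases (after this document was written) added "project-scoped" repositories, where the repository Secret carries a `project` field and the credential is then only usable by Applications in that AppProject. Verify support in the Argo CD version in use before relying on this; the names below are illustrative.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: user-b-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://github.com/private-org/private-repo
  # Scopes this credential to a single AppProject, so user B's
  # credential cannot be silently reused by other users' Applications.
  project: user-b-project
  username: userB
  password: (password)
```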

## Current challenge with shared multitenancy: malicious Git repositories<a id="current-challenge-with-shared-multitenancy-malicious-git-repositories"></a>

Malicious Git repositories can be constructed with two aims: denial of service or credential theft.

Denial of service:

- Create a GitOps repository which is several GB in size, requiring Argo CD to spend downstream bandwidth on transferring it

- Create a GitOps repository that contains a large number of files, causing an OOM on the repo-server Pod when it attempts to process them all (e.g. via Kustomize)

Credential theft/privileged information disclosure:

- Example: [CVE-2022-24731](https://github.com/argoproj/argo-cd/security/advisories/GHSA-h6h5-6fmq-rh28)

- Theoretical ability to trigger a malicious workload via kustomize/helm


Potential mitigations for these issues:

- My preference - run kustomize/helm/etc. in a disposable, fully untrusted K8s Job: perform all interactions with the Git repository, and all invocations of kustomize/helm/etc., within a disposable, untrusted K8s Job resource.

  - Malicious workloads will be confined to the Job (to the extent possible)

  - The Job is disposed of on completion.

  - No possibility of leakage of data between users (since all data is self-contained within the container and volume)

  - If there is a concern about the need to do a git clone every time: you can also use a persistent volume per user (shared into the Job as a volume)

- Alternative solution: a repo server per user
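
A minimal sketch of the disposable-Job mitigation described above (all names, the image, and the limits here are hypothetical illustrations, not the actual GitOps Service implementation). The resource limits and deadline bound the denial-of-service cases, and `ttlSecondsAfterFinished` disposes of the Job after it completes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: manifest-generation-user-b   # hypothetical: one Job per generation request
spec:
  ttlSecondsAfterFinished: 60        # dispose of the Job shortly after completion
  activeDeadlineSeconds: 300         # bound runtime, limiting DoS via slow/huge clones
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: generate
          # Hypothetical image containing git/kustomize/helm
          image: example.com/untrusted-manifest-tools:latest
          command: ["/bin/sh", "-c"]
          args:
            - git clone --depth 1 "$REPO_URL" /workspace && kustomize build /workspace
          env:
            - name: REPO_URL
              value: https://github.com/private-org/private-repo
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
          resources:
            limits:
              memory: 256Mi   # bounds the OOM blast radius to this Pod
              cpu: 500m
```

A per-user PersistentVolumeClaim could be mounted at /workspace to avoid a full clone on every run, per the last sub-bullet above.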