| 1 | +# Private repository support for a fixed set of private GitHub organizations |
| 2 | + |
| 3 | +### Written by |
| 4 | +- Jonathan West (@jgwest) |
| 5 | +- Originally written on September 26, 2023 |
| 6 | + |
| 7 | +At present, within AppStudio, all users' GitOps repositories are created within the [https://github.com/redhat-appstudio-appdata](https://github.com/redhat-appstudio-appdata) organization. The Application Service (HAS) component of AppStudio has the credentials for this organization, and uses the GitHub REST API to create/delete these repositories and Git to push to them. |
| 8 | + |
| 9 | +As of this writing, those 'redhat-appstudio-appdata' repositories are all public, but this is primarily because GitOps Service did not support private Git repositories in the early stages of the AppStudio project (and we have not been asked to add AppStudio private repository support since). |
| 10 | + |
| 11 | +As part of [RHTAP-1023](https://issues.redhat.com/browse/RHTAP-1023), however, Git repositories may now be private, in order to support embargoed content. We in GitOps Service thus need to ensure that GitOps Service configures Argo CD to pull from AppStudio-managed private Git repositories. |
| 12 | + |
| 13 | +Unlike [GITOPSRVCE-28](https://issues.redhat.com/browse/GITOPSRVCE-28), which allows users to provide credentials for their own private repos, this epic is limited to providing a single, global pool of tokens to be used for organization-managed GitOps repositories, such as [https://github.com/redhat-appstudio-appdata](https://github.com/redhat-appstudio-appdata). |
| 14 | + |
| 15 | +* AFAIK this org is the ONLY org we need to provide private repository support for, at this time. |
| 16 | + |
| 17 | +This makes this epic much narrower in scope than the more open, user-focused GITOPSRVCE-28. |
| 18 | + |
| 19 | +# Out of Scope |
| 20 | + |
| 21 | +As above, with this epic, there are no changes to the following behaviours of AppStudio: |
| 22 | + |
| 23 | +* Users will not be able to provide their own private repository credentials. |
| 24 | +* Users will not be able to provide their own GitOps repository URL. |
| 25 | +* Users cannot customize their GitOps repository (beyond the ability to provide a custom devfile). |
| 26 | + |
| 27 | +# Proposed Workflow |
| 28 | + |
| 29 | +**1\) AppStudio maintains a list of GitHub API tokens (personal access tokens, PATs), either shared, or per team** |
| 30 | + |
| 31 | +* Ideally we would be able to share HAS’ token pool, which would obviate the need for GitOps Service to maintain its own token pool. |
| 32 | + * BUT, this requires shared consensus between the teams. |
| 33 | +* The actual shared list of tokens is stored in app-sre’s HashiCorp Vault instance. |
| 34 | +* An [External Secrets resource](https://github.com/redhat-appstudio/infra-deployments/blob/main/components/has/base/external-secrets/has-github-token.yaml) reads the secret from HashiCorp Vault, and writes it to a Secret in the 'gitops' Namespace. |
| 35 | + * HashiCorp Vault \-\> External Secrets is the standard AppStudio mechanism for this. |
| 36 | +* See below for the format of the Secret (based on HAS' team format) |
| 37 | + |
| 38 | +**2\) The 'cluster-agent' component of GitOps Service should be the only component that needs access to this token pool** |
| 39 | + |
| 40 | +* Add env var(s) to cluster-agent, referencing the token list Secret in the Namespace. |
| 41 | +* That would look like this: |
| 42 | + |
| 43 | +```yaml |
| 44 | +# cluster-agent's Deployment |
| 45 | +apiVersion: apps/v1 |
| 46 | +kind: Deployment |
| 47 | +metadata: |
| 48 | +  name: controller-manager |
| 49 | +spec: |
| 50 | +  template: |
| 51 | +    spec: |
| 52 | +      containers: |
| 53 | +      - command: |
| 54 | +        - gitops-service-cluster-agent |
| 55 | +        # (...) |
| 56 | +        name: manager |
| 57 | +        env: |
| 58 | +        # One group of env vars for each org containing private repos. I currently expect we'll need only one, for 'https://github.com/redhat-appstudio-appdata' |
| 59 | +        # TOKEN_POOL_1_* |
| 60 | +        - name: TOKEN_POOL_1_ORG_URL |
| 61 | +          value: "https://github.com/redhat-appstudio-appdata" |
| 62 | +        - name: TOKEN_POOL_1_SECRET |
| 63 | +          value: "token-pool-tokens" # reference to Secret in Namespace |
| 64 | + |
| 65 | +        # (...) |
| 66 | +        # TOKEN_POOL_N_* |
| 67 | +        - name: TOKEN_POOL_N_ORG_URL |
| 68 | +          value: "(...)" |
| 69 | +        - name: TOKEN_POOL_N_SECRET |
| 70 | +          value: "(...)" |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +# Token Pool Secret (actual contents coming from HashiCorp Vault via External Secrets) |
| 75 | +apiVersion: v1 |
| 76 | +kind: Secret |
| 77 | +metadata: |
| 78 | +  name: token-pool-tokens |
| 79 | +stringData: |
| 80 | +  # Secret format from HAS |
| 81 | +  tokens: "token1:(...),token2:(...),tokenN:(...),token7:(...)" |
| 82 | +``` |
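As a rough illustration of how cluster-agent might discover these variables at startup, here is a minimal Go sketch. It assumes pools are numbered consecutively from 1 and that scanning stops at the first missing `TOKEN_POOL_<i>_ORG_URL`; the `TokenPool` type and `loadTokenPools` function are hypothetical names, not part of the existing codebase.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// TokenPool pairs an organization URL with the name of the Secret holding its tokens.
// (Hypothetical type, for illustration only.)
type TokenPool struct {
	OrgURL     string
	SecretName string
}

// loadTokenPools reads TOKEN_POOL_<i>_ORG_URL / TOKEN_POOL_<i>_SECRET pairs,
// starting at i=1 and stopping at the first index with no ORG_URL set.
// The lookup function is injected (os.LookupEnv in production) to ease testing.
func loadTokenPools(lookup func(string) (string, bool)) ([]TokenPool, error) {
	var pools []TokenPool
	for i := 1; ; i++ {
		orgURL, ok := lookup(fmt.Sprintf("TOKEN_POOL_%d_ORG_URL", i))
		if !ok || strings.TrimSpace(orgURL) == "" {
			return pools, nil
		}
		secret, ok := lookup(fmt.Sprintf("TOKEN_POOL_%d_SECRET", i))
		if !ok || strings.TrimSpace(secret) == "" {
			return nil, fmt.Errorf("TOKEN_POOL_%d_ORG_URL is set but TOKEN_POOL_%d_SECRET is not", i, i)
		}
		// Normalize away a trailing slash so later prefix checks behave consistently.
		pools = append(pools, TokenPool{OrgURL: strings.TrimSuffix(orgURL, "/"), SecretName: secret})
	}
}

func main() {
	pools, err := loadTokenPools(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("found %d token pool(s)\n", len(pools))
}
```

Failing fast on a half-configured pool (URL without Secret) is a design choice in this sketch; silently skipping it would be an equally plausible alternative.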
| 83 | + |
| 84 | +**3\) In cluster-agent, whenever cluster-agent creates/modifies an Argo CD Application CR (via an Application Operation), AND the '.spec.source.repoURL' field of the Argo CD Application CR begins with one of the TOKEN\_POOL\_X\_ORG\_URL values defined in the environment variables, we should do the following:** |
| 85 | + |
| 86 | +Create (or ensure there already exists) a [Repository Credential Secret](https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#repository-credentials) for that repository: |
| 87 | + |
| 88 | +```yaml |
| 89 | +apiVersion: v1 |
| 90 | +kind: Secret |
| 91 | +metadata: |
| 92 | +  name: "repo-cred-(SHA-256 hash of repo URL)" |
| 93 | +  namespace: gitops-service-argocd |
| 94 | +  labels: |
| 95 | +    argocd.argoproj.io/secret-type: repo-creds |
| 96 | + |
| 97 | +stringData: |
| 98 | +  type: git |
| 99 | +  url: "https://github.com/redhat-appstudio-appdata/(repo URL value from .spec.source.repoURL field of Argo CD Application)" |
| 100 | +  password: "(token from token pool, chosen using below algorithm)" |
| 101 | +  username: username |
| 102 | +``` |
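The matching and naming rules in this step can be sketched in Go as follows. This is only an illustration under assumptions: the URL comparison is an exact, case-sensitive prefix check, the hash is hex-encoded, and the helper names `belongsToOrg` and `repoCredSecretName` as well as the repository name `example-repo` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// belongsToOrg reports whether repoURL is a repository under orgURL.
// Appending "/" prevents "https://host/org-other/repo" from matching "https://host/org".
func belongsToOrg(repoURL, orgURL string) bool {
	return strings.HasPrefix(repoURL, strings.TrimSuffix(orgURL, "/")+"/")
}

// repoCredSecretName returns "repo-cred-" followed by the hex SHA-256 of the URL.
// The result is 10 + 64 = 74 characters, all lowercase alphanumerics or '-'.
func repoCredSecretName(repoURL string) string {
	sum := sha256.Sum256([]byte(repoURL))
	return "repo-cred-" + hex.EncodeToString(sum[:])
}

func main() {
	org := "https://github.com/redhat-appstudio-appdata"
	repo := org + "/example-repo" // hypothetical repository name
	if belongsToOrg(repo, org) {
		fmt.Println(repoCredSecretName(repo))
	}
}
```

Because the name is derived only from the URL, repeated reconciliations of the same Application resolve to the same Secret, which is what makes the "ensure there exists" semantics straightforward.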
| 103 | + |
| 104 | +**4\) Which token should be used for "(token from token pool)" in the previous step? We can use the following algorithm to determine it:** |
| 105 | + |
| 106 | +Pseudocode: |
| 107 | + |
| 108 | +```go |
| 109 | +import "crypto/sha256" |
| 110 | + |
| 111 | +// githubTokenListFromSecret is read from the has-github-token Secret, and must be non-empty |
| 112 | +func selectToken(gitRepositoryURL string, githubTokenListFromSecret []string) string { |
| 113 | +	// hash the URL (Sum256 takes a byte slice and returns a [32]byte) |
| 114 | +	hashedValue := sha256.Sum256([]byte(gitRepositoryURL)) |
| 115 | +	// use the first byte of the hashed value to index into token list |
| 116 | +	secretIndex := int(hashedValue[0]) % len(githubTokenListFromSecret) |
| 117 | +	return githubTokenListFromSecret[secretIndex] |
| 118 | +} |
| 119 | +``` |
| 120 | + |
| 121 | +**TL;DR**: hash the Git URL and use that to index into the token list, to ensure a roughly even distribution across tokens. |
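To make the end-to-end flow concrete, the following Go sketch parses a token string and tallies how a set of repository URLs would spread across the pool. It assumes the Secret value is a comma-separated list of `name:value` entries, as the example Secret suggests; the function names, token values, and repository URLs are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"strings"
)

// parseTokens extracts the token values from "name1:value1,name2:value2",
// preserving order so the index computed below is stable.
func parseTokens(raw string) ([]string, error) {
	var tokens []string
	for i, entry := range strings.Split(raw, ",") {
		_, value, found := strings.Cut(strings.TrimSpace(entry), ":")
		if !found || value == "" {
			// Report the position only, to avoid echoing secret material.
			return nil, fmt.Errorf("malformed token entry at position %d", i)
		}
		tokens = append(tokens, value)
	}
	return tokens, nil
}

// selectToken hashes the URL and uses the first hash byte to pick a token.
func selectToken(repoURL string, tokens []string) (string, error) {
	if len(tokens) == 0 {
		return "", errors.New("token pool is empty")
	}
	sum := sha256.Sum256([]byte(repoURL))
	return tokens[int(sum[0])%len(tokens)], nil
}

func main() {
	tokens, err := parseTokens("token1:aaa,token2:bbb,token3:ccc") // placeholder values
	if err != nil {
		panic(err)
	}
	// Tally how 341 hypothetical repository URLs spread across the pool.
	counts := map[string]int{}
	for i := 0; i < 341; i++ {
		url := fmt.Sprintf("https://github.com/redhat-appstudio-appdata/repo-%d", i)
		token, err := selectToken(url, tokens)
		if err != nil {
			panic(err)
		}
		counts[token]++
	}
	fmt.Println(counts)
}
```

Since only the first hash byte is used, the index can never exceed 255, so a pool with more than 256 tokens would leave the later entries unused; for small pools this is not a practical concern.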
| 122 | + |
| 123 | +# Alternatives Considered |
| 124 | + |
| 125 | +**Why not just define a GitOpsDeploymentRepositoryCredential CR in each Namespace, containing the credentials for the repo URL?** |
| 126 | + |
| 127 | +* GitOpsDeploymentRepositoryCredential works great for cases where users have their own private Git Repository, and their own private credentials |
| 128 | +* However, in this case, the user does not have the credentials for the private GitOps repository (these are only known by Red Hat) |
| 129 | +* With GitOpsDeploymentRepositoryCredential, the token is stored in a Secret in the user’s ‘(username)-tenant’ namespace |
| 130 | +* In AppStudio, users can view Secrets in their own Namespace |
| 131 | +* Thus, the GitHub PAT that we use to communicate with the repo would necessarily be viewable to the user, with this approach |
| 132 | +* Thus, the only way this would work would be if we generated a PAT PER USER, which would be excessive |
| 133 | + |
| 134 | +**Rather than defining an Argo CD Repository Secret for each repository, why not define a single Argo CD Repository Secret to be shared by all the repos?** |
| 135 | + |
| 136 | +* Since we have a large number of users on AppStudio, we want to ensure that we do not overuse a single PAT for all our Git requests, but rather that we distribute that work over multiple accounts (tokens). |
| 137 | + * On multi-tenant prod, there are 341 Argo CD Applications (and roughly the same number of Git repos) |
| 138 | +* Defining a Secret per repository allows us to evenly distribute (via SHA-256 hashes indexing into token lists) the work across all available tokens |
| 139 | + |