|
| 1 | +# Ensuring GitOps Repository URL Uniqueness |
| 2 | + |
| 3 | +### Written by |
| 4 | +- Jonathan West (@jgwest) |
| 5 | +- Originally written April 25th, 2023 |
| 6 | + |
| 7 | +**TL;DR**: Before creating a new Application or RepositoryCredential row that references a Git repository, we ensure that no other user is already referencing that Git repository. |
| 8 | + |
| 9 | +- How? |
| 10 | + |
| 11 | +- We create a new database table, **RepositoryLockOwner**, that contains the specific Git repositories that are owned by each user |
| 12 | + |
| 13 | + - We rely on a uniqueness constraint on the repository url field to enforce that |
| 14 | + |
| 15 | + - This ensures global, inter-user uniqueness and atomicity (preventing race conditions) |
| 16 | + |
| 17 | +- We return an error if a user attempts to create a RepositoryCredential or Application that targets a repo URL for which another user owns the corresponding RepositoryLockOwner row |
| 18 | + |
| 19 | +- We also create new database tables, **RepositoryLockApplication** and **RepositoryLockRepositoryCredential**, to maintain references to that RepositoryLockOwner, so that we can GC the lock once it's no longer referenced. |
| 20 | + |
| 21 | + - This is basically GC via reference counting |
| 22 | + |
| 23 | + |
| 24 | + |
| 25 | + |
| 26 | +# Implementation Details |
| 27 | + |
| 28 | +## New Database Tables |
| 29 | + |
| 30 | +**RepositoryLockOwner** (or better name) |
| 31 | + |
| 32 | +- _Description_: Maintains a unique list of which user owns which Git repository URL. The uniqueness is guaranteed by a uniqueness constraint on the gitRepositoryURL field. |
| 33 | + |
| 34 | +- Fields: |
| 35 | + |
| 36 | + - **id** string (primary key, unique, generated on creation of RepositoryLockOwner) |
| 37 | + |
| 38 | + - **gitRepositoryURL** string (unique, non-null) |
| 39 | + |
| 40 | + - normalized: the stored value should be the same whether the repo URL was given in SSH form, Git form, etc. |
| 41 | + |
| 42 | + - See Argo CD for code that normalizes a Git repository URL (but double-check that its definition of normalization matches what we need) |
| 43 | + |
| 44 | + - **clusteruser** string (foreign key to clusteruser table, non-null) |
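
To make the normalization requirement on gitRepositoryURL concrete, here is a minimal sketch of what such a helper might look like. This is not Argo CD's implementation: `normalizeGitURL` is a hypothetical name, and the rules it applies (drop the scheme, user info and port, lower-case the host, strip a trailing `.git` and slashes) are assumptions that would need to be checked against what we actually need.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// scpLike matches scp-style Git addresses such as "git@github.com:org/repo.git".
var scpLike = regexp.MustCompile(`^(?:[^@/]+@)?([^:/]+):([^/].*)$`)

// normalizeGitURL is a hypothetical helper that maps different spellings of
// the same repository to a single "host/path" string.
// Assumed rules: scheme, user info and port are dropped, the host is
// lower-cased, and a trailing ".git" or "/" is removed.
func normalizeGitURL(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	var host, path string
	switch {
	case strings.Contains(s, "://"):
		// URL form: https://, ssh://, git://, ...
		u, err := url.Parse(s)
		if err != nil {
			return "", fmt.Errorf("parse %q: %w", raw, err)
		}
		host, path = u.Hostname(), u.Path
	default:
		// scp-like form: [user@]host:path
		m := scpLike.FindStringSubmatch(s)
		if m == nil {
			return "", fmt.Errorf("unrecognized repository URL %q", raw)
		}
		host, path = m[1], m[2]
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	if host == "" || path == "" {
		return "", fmt.Errorf("no host or path in repository URL %q", raw)
	}
	return strings.ToLower(host) + "/" + path, nil
}
```

Under these assumed rules, `git@github.com:org/repo.git` and `https://GitHub.com/org/repo` both map to the same string, so the uniqueness constraint would treat them as one repository.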
| 45 | + |
| 46 | +(**NOTE**: this table and its behaviour are very similar to another table, AppProjectRepository. For our purposes here, I'll keep them separate.) |
| 47 | + |
| 48 | +**RepositoryLockApplication** |
| 49 | + |
| 50 | +- _Description_: Maintains a list of which Applications reference which Git repository locks. This allows us to clean up a RepositoryLockOwner once there are no longer any applications/repositorycredentials that reference that lock. |
| 51 | + |
| 52 | +- Fields: |
| 53 | + |
| 54 | + - **lockID** string (foreign key to id field of RepositoryLockOwner, non-null) |
| 55 | + |
| 56 | + - **applicationID** string (foreign key to application\_id field of Application, non-null) |
| 57 | + |
| 58 | +**RepositoryLockRepositoryCredential** |
| 59 | + |
| 60 | +- _Description_: Maintains a list of which Repository Credentials reference which Git repository locks. This allows us to clean up a RepositoryLockOwner once there are no longer any applications/repositorycredentials that reference that lock. |
| 61 | + |
| 62 | +- Fields: |
| 63 | + |
| 64 | + - **lockID** string (foreign key to id field of RepositoryLockOwner, non-null) |
| 65 | + |
| 66 | + - **repositorycredentialID** string (foreign key to repositorycredentials\_id field of Repository Credential, non-null) |
| 67 | + |
| 68 | + |
| 69 | +## New functions and behaviour |
| 70 | + |
| 71 | +Create a new function **AcquireRepositoryCredentialURL(acquiringUser string, normalizedRepoURL string) (string, bool, string, error)**, with pseudocode: |
| 72 | + |
| 73 | +- _Description: Acquire a repository lock on a particular Git repository, for a particular user_ |
| 74 | + |
| 75 | +- **return values**: |
| 76 | + |
| 77 | + - The primary key of the repository lock owner |
| 78 | + |
| 79 | + - whether the user acquired the URL |
| 80 | + |
| 81 | + - if the bool is false, the name of the other user that owns the repo URL (empty otherwise) |
| 82 | + |
| 83 | + - generic error return |
| 84 | + |
| 85 | +- **Steps:** |
| 86 | + |
| 87 | + - Sanity check that the user param is non-empty |
| 88 | + |
| 89 | + - Sanity check that the normalizedRepoURL param is actually normalized (if possible) |
| 90 | + |
| 91 | +* SELECT id, clusterUser from the RepositoryLockOwner database table, WHERE gitRepositoryURL = normalizedRepoURL |
| 92 | + |
| 93 | + - We should ensure we index on this field |
| 94 | + |
| 95 | +* If a match in the table already exists: |
| 96 | + |
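| 97 | + - If clusterUser is the acquiring user, they already own the lock: return id, true, "", nil. Otherwise the repository URL is already claimed by another user: return "", false, clusterUser, nil |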
| 98 | + |
| 99 | +* Else: |
| 100 | + |
| 101 | + - INSERT the acquiring user and normalizedRepo URL into the database |
| 102 | + |
| 103 | + - On success, `return id, true, "", nil` |
| 104 | + |
| 105 | + - On failure due to the uniqueness constraint: `return "", false, clusterUser, nil`, where clusterUser is read by repeating the SELECT |
| 106 | + |
| 107 | + - This failure occurs only in the very rare case that another user inserted a row between our SELECT and our INSERT calls. |
| 108 | + |
| 109 | + - On any other error, `return "", false, "", err` |
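
The steps above can be sketched in Go as follows. This is only an illustration under stated assumptions: `lockStore`, `selectByURL`, `insert` and `errUniqueViolation` are hypothetical names, and an in-memory map guarded by a mutex stands in for the RepositoryLockOwner table and its uniqueness constraint. The sketch also assumes that a row already owned by the acquiring user counts as a successful acquire.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUniqueViolation stands in for the database's unique-constraint error.
var errUniqueViolation = errors.New("unique constraint violated on gitRepositoryURL")

// lockOwner mirrors the id and clusterUser columns of a RepositoryLockOwner row.
type lockOwner struct {
	id          string
	clusterUser string
}

// lockStore is an in-memory stand-in for the RepositoryLockOwner table.
type lockStore struct {
	mu     sync.Mutex
	byURL  map[string]lockOwner // keyed by normalized gitRepositoryURL
	nextID int
}

func newLockStore() *lockStore { return &lockStore{byURL: map[string]lockOwner{}} }

// selectByURL mimics: SELECT id, clusterUser ... WHERE gitRepositoryURL = url.
func (s *lockStore) selectByURL(url string) (lockOwner, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.byURL[url]
	return row, ok
}

// insert mimics an INSERT that the uniqueness constraint may reject.
func (s *lockStore) insert(user, url string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.byURL[url]; exists {
		return "", errUniqueViolation
	}
	s.nextID++
	id := fmt.Sprintf("lock-%d", s.nextID)
	s.byURL[url] = lockOwner{id: id, clusterUser: user}
	return id, nil
}

// AcquireRepositoryCredentialURL returns (lock id, acquired, other owner, error).
func (s *lockStore) AcquireRepositoryCredentialURL(acquiringUser, normalizedRepoURL string) (string, bool, string, error) {
	if acquiringUser == "" || normalizedRepoURL == "" {
		return "", false, "", errors.New("acquiringUser and normalizedRepoURL must be non-empty")
	}
	if row, found := s.selectByURL(normalizedRepoURL); found {
		if row.clusterUser == acquiringUser {
			return row.id, true, "", nil // assumption: the user already owns the lock
		}
		return "", false, row.clusterUser, nil
	}
	id, err := s.insert(acquiringUser, normalizedRepoURL)
	if errors.Is(err, errUniqueViolation) {
		// Another caller inserted between our SELECT and INSERT: re-read the owner.
		if row, found := s.selectByURL(normalizedRepoURL); found {
			if row.clusterUser == acquiringUser {
				return row.id, true, "", nil
			}
			return "", false, row.clusterUser, nil
		}
		return "", false, "", err
	}
	if err != nil {
		return "", false, "", err
	}
	return id, true, "", nil
}
```

In a real implementation the uniqueness check would be done by the database itself rather than by application code; the SELECT only exists to produce a friendlier answer in the common case.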
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | +When we are about to create or modify an Application, or a repository credential, call the above function to ensure we own the lock on it. If we don't own the lock on it, return an error, and don't allow the creation/modification of that application/repositorycredential. |
| 114 | + |
| 115 | +**Next, whenever an Application or RepositoryCredential row is about to be created/modified, we should first do this:** |
| 116 | + |
| 117 | +- Call AcquireRepositoryCredentialURL on the URL |
| 118 | + |
| 119 | + - If it fails, report that back as an error, and exit. |
| 120 | + |
| 121 | + - Don’t allow the Application/RepositoryCredential to be created/modified. |
| 122 | + |
| 123 | +- For application, ensure there exists (create if not existing) a **RepositoryLockApplication** for that Application, pointing back to the **RepositoryLockOwner** |
| 124 | + |
| 125 | +- For a repository credential, ensure there exists (create if not existing) a **RepositoryLockRepositoryCredential** for that RepositoryCredential, pointing back to the **RepositoryLockOwner**. |
| 126 | + |
| 127 | +**Garbage collection: whenever an Application or RepositoryCredential is deleted, we should do this:** |
| 128 | + |
| 129 | +- Delete the corresponding RepositoryLockApplication and/or RepositoryLockRepositoryCredential for the repositoryLock |
| 130 | + |
| 131 | +- countRemainingApplicationsUsingLock := Select count(\*) on RepositoryLockApplication where lockID = repositoryLockID |
| 132 | + |
| 133 | +- countRemainingRepositoryCredentialsUsingLock := Select count(\*) on RepositoryLockRepositoryCredential where lockID = repositoryLockID |
| 134 | + |
| 135 | +- If countRemainingApplicationsUsingLock == 0 and countRemainingRepositoryCredentialsUsingLock == 0, then delete the **RepositoryLockOwner** |
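
The reference-counting cleanup described above might look roughly like this. It is a sketch only: `refStore` and its methods are hypothetical names, and in-memory maps stand in for the three tables, with the mutex playing the role of a database transaction.

```go
package main

import "sync"

// refStore is an in-memory stand-in for the three lock tables.
type refStore struct {
	mu       sync.Mutex
	owners   map[string]string // lock id -> clusterUser (RepositoryLockOwner)
	appRefs  map[string]string // applicationID -> lockID (RepositoryLockApplication)
	credRefs map[string]string // repositorycredentialID -> lockID (RepositoryLockRepositoryCredential)
}

// countRefs mimics: SELECT count(*) ... WHERE lockID = lockID.
func countRefs(refs map[string]string, lockID string) int {
	n := 0
	for _, id := range refs {
		if id == lockID {
			n++
		}
	}
	return n
}

// gcLock deletes the RepositoryLockOwner entry if nothing references it.
// The caller must hold s.mu. It reports whether the lock was deleted.
func (s *refStore) gcLock(lockID string) bool {
	if countRefs(s.appRefs, lockID) == 0 && countRefs(s.credRefs, lockID) == 0 {
		delete(s.owners, lockID)
		return true
	}
	return false
}

// OnApplicationDeleted removes the Application's reference, then collects
// the lock if that was the last reference.
func (s *refStore) OnApplicationDeleted(applicationID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	lockID, ok := s.appRefs[applicationID]
	if !ok {
		return false
	}
	delete(s.appRefs, applicationID)
	return s.gcLock(lockID)
}

// OnRepositoryCredentialDeleted does the same for a RepositoryCredential.
func (s *refStore) OnRepositoryCredentialDeleted(credentialID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	lockID, ok := s.credRefs[credentialID]
	if !ok {
		return false
	}
	delete(s.credRefs, credentialID)
	return s.gcLock(lockID)
}
```

With one Application and one RepositoryCredential pointing at the same lock, deleting the Application leaves the lock in place, and deleting the RepositoryCredential afterwards removes it.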
| 136 | + |
| 137 | + |
| 138 | +**Global garbage collection: finally, we should do periodic cleanup to make sure there aren’t any leftover repository locks**: |
| 139 | + |
| 140 | +- Every X minutes/hours, we should: |
| 141 | + |
| 142 | + - for each entry in repositorylockowner |
| 143 | + |
| 144 | + - select count(\*) from repositorylockapplication where lockID=(id of repository lock owner) |
| 145 | + |
| 146 | + - select count(\*) from repositorylockrepositorycredential where lockID=(id of repository lock owner) |
| 147 | + |
| 148 | + - if no matches for either, delete repositorylockowner |
| 149 | + |
| 150 | + - (if any matching reference is created after the scan, the delete will fail, because the lockID foreign key of that reference still points at the row. This is how we avoid the race condition of a new entry that is added after the scan completes) |
| 151 | + |
| 152 | + - Why? This allows us to catch any dangling repositorylockowners that we missed via the normal GC process, defined above. |
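
A periodic sweep along these lines could be sketched as below. All names (`sweepStore`, `sweepOnce`, `runSweeper`, `errStillReferenced`) are hypothetical, the tables are again in-memory maps, and the re-check inside `deleteOwner` is an assumed stand-in for the foreign key constraint that would make the real DELETE fail.

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// errStillReferenced stands in for a foreign-key violation on DELETE.
var errStillReferenced = errors.New("lock is still referenced")

// sweepStore is an in-memory stand-in for the three lock tables.
type sweepStore struct {
	mu       sync.Mutex
	owners   map[string]string // lock id -> clusterUser
	appRefs  map[string]string // applicationID -> lockID
	credRefs map[string]string // repositorycredentialID -> lockID
}

// unreferencedLockIDs is the scan: locks with zero matches in both tables.
func (s *sweepStore) unreferencedLockIDs() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	referenced := map[string]bool{}
	for _, lockID := range s.appRefs {
		referenced[lockID] = true
	}
	for _, lockID := range s.credRefs {
		referenced[lockID] = true
	}
	var out []string
	for id := range s.owners {
		if !referenced[id] {
			out = append(out, id)
		}
	}
	return out
}

// deleteOwner refuses to delete a lock that gained a reference after the scan.
func (s *sweepStore) deleteOwner(lockID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range s.appRefs {
		if id == lockID {
			return errStillReferenced
		}
	}
	for _, id := range s.credRefs {
		if id == lockID {
			return errStillReferenced
		}
	}
	delete(s.owners, lockID)
	return nil
}

// sweepOnce runs one pass and returns how many dangling locks were deleted.
func sweepOnce(s *sweepStore) int {
	deleted := 0
	for _, id := range s.unreferencedLockIDs() {
		if err := s.deleteOwner(id); err == nil {
			deleted++
		}
	}
	return deleted
}

// runSweeper calls sweepOnce every interval until ctx is cancelled.
func runSweeper(ctx context.Context, s *sweepStore, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sweepOnce(s)
		}
	}
}
```

The scan and the delete are deliberately separate steps here, so the sketch shows the same window the text describes: a reference created between them makes the delete fail rather than leaving a dangling reference.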