# Ensuring GitOps Repository URL Uniqueness

### Written by

- Jonathan West (@jgwest)
- Originally written April 25th, 2023

**TL;DR**: Before creating a new Application or RepositoryCredential row that references a Git repository, we ensure that no other user is already referencing that Git repository.

- How?
    - We create a new database table, **RepositoryLockOwner**, that records which Git repositories are owned by which user.
        - We rely on a uniqueness constraint on the repository URL field to enforce this.
        - This guarantees global, inter-user uniqueness, and because the database checks the constraint atomically, it also prevents race conditions.
    - We return an error if a user attempts to create a RepositoryCredential or Application that targets a repository URL whose RepositoryLockOwner row is owned by a different user.
    - We also create two new database tables, **RepositoryLockApplication** and **RepositoryLockRepositoryCredential**, that maintain references to a RepositoryLockOwner, so that we can garbage collect (GC) the lock once it is no longer referenced.
        - This is essentially GC via reference counting.

![](GitOps-Uniqueness-Diagram1.jpg)

# Implementation Details

## New Database Tables

**RepositoryLockOwner** (or a better name)

- _Description_: Maintains a unique list of which user owns which Git repository URL. Uniqueness is guaranteed by a uniqueness constraint on the gitRepositoryURL field.
- Fields:
    - **id** string (primary key, unique, generated on creation of the RepositoryLockOwner)
    - **gitRepositoryURL** string (unique, non-null)
        - Normalized: the value should be the same regardless of whether the repository URL uses SSH, HTTPS, or the git protocol, etc.
        - See Argo CD for code that normalizes a Git repository URL (but double-check that their definition of normalization matches what we need).
    - **clusteruser** string (foreign key to the clusteruser table, non-null)

(**NOTE**: this table and its behaviour are very similar to another table, AppProjectRepository. For our purposes here, I'll keep them separate.)

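To make the normalization requirement concrete, here is a minimal Go sketch of a normalizer. It assumes that the scheme, user info, port, trailing `/` and `.git` suffix are not significant, that the SCP-like SSH form (`git@host:org/repo`) should map to the same key as the HTTPS form, and that paths compare case-insensitively (true for GitHub, but not for every Git host). `normalizeRepoURL` is a hypothetical name; this is not Argo CD's normalization code, and its rules would still need to be checked against Argo CD's as noted above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRepoURL (hypothetical) maps equivalent Git repository URLs to one
// canonical "host/path" key. Assumptions: scheme, user info, port, trailing
// "/" and ".git" are not significant, and paths compare case-insensitively.
func normalizeRepoURL(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return "", fmt.Errorf("empty repository URL")
	}

	// Rewrite the SCP-like SSH form "user@host:org/repo" (no "://", and a ':'
	// after any '@') to "ssh://user@host/org/repo" so net/url can parse it.
	if !strings.Contains(s, "://") {
		at := strings.Index(s, "@")
		colon := strings.Index(s, ":")
		if colon > 0 && colon > at {
			s = "ssh://" + s[:colon] + "/" + s[colon+1:]
		}
	}

	u, err := url.Parse(s)
	if err != nil {
		return "", fmt.Errorf("unable to parse repository URL %q: %w", raw, err)
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("repository URL %q has no host", raw)
	}

	// Drop leading/trailing slashes and an optional ".git" suffix.
	path := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	if path == "" {
		return "", fmt.Errorf("repository URL %q has no repository path", raw)
	}

	// Hostname() omits any port; lowercase the whole key.
	return strings.ToLower(u.Hostname() + "/" + path), nil
}
```

With these rules, `https://github.com/Org/Repo.git`, `git@github.com:Org/Repo.git` and `ssh://git@github.com:22/org/repo/` all normalize to `github.com/org/repo`, so they all contend for the same RepositoryLockOwner row.
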
**RepositoryLockApplication**

- _Description_: Maintains a list of which Applications reference which Git repository locks. This allows us to clean up a RepositoryLockOwner once no Applications or RepositoryCredentials reference that lock.
- Fields:
    - **lockID** string (foreign key to the id field of RepositoryLockOwner, non-null)
    - **applicationID** string (foreign key to the application\_id field of Application, non-null)

**RepositoryLockRepositoryCredential**

- _Description_: Maintains a list of which Repository Credentials reference which Git repository locks. This allows us to clean up a RepositoryLockOwner once no Applications or RepositoryCredentials reference that lock.
- Fields:
    - **lockID** string (foreign key to the id field of RepositoryLockOwner, non-null)
    - **repositorycredentialID** string (foreign key to the repositorycredentials\_id field of RepositoryCredential, non-null)

## New functions and behaviour

Create a new function **AcquireRepositoryCredentialURL(acquiringUser string, normalizedRepoURL string) (string, bool, string, error)**, with the following pseudocode:

- _Description_: Acquire a repository lock on a particular Git repository, for a particular user.
- **Return values:**
    - The primary key of the RepositoryLockOwner row
    - Whether the user acquired the URL
    - If the bool is false, a message containing the name of the other user that owns the repository URL
    - A generic error return
- **Steps:**
    - Sanity check that the user parameter is non-empty.
    - Sanity check that the normalized repository URL parameter is actually normalized (if possible).
    - `SELECT id, clusterUser FROM RepositoryLockOwner WHERE gitRepositoryURL = normalizedRepoURL`
        - We should ensure there is an index on this field.
    - If a matching row already exists:
        - If clusterUser is the acquiring user, the user already owns the lock: `return id, true, "", nil`
        - Otherwise: `return "", false, "a repository URL is already claimed by another user: "+clusterUser, nil`
    - Else:
        - INSERT the acquiring user and normalizedRepoURL into the RepositoryLockOwner table.
        - On success: `return id, true, "", nil`
        - On failure due to the uniqueness constraint: re-run the SELECT above to find the owning clusterUser, and return as in the "matching row already exists" case.
            - This failure only occurs in the rare case that another request inserted the same URL between our SELECT and our INSERT calls.
        - On any other error: `return "", false, "", err`

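The steps above can be sketched in Go as follows. The database is replaced by a hypothetical in-memory `lockTable` whose `selectByURL` and `insert` methods are each atomic (like single SQL statements) but hold no lock between the two calls, so the SELECT/INSERT race and the uniqueness-constraint fallback are both modelled. Unlike the proposed signature, the sketch takes the table as an extra first parameter to stay self-contained, and it returns success when the acquiring user already owns the row.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"sync"
)

// errUniqueViolation stands in for the database's unique-constraint error.
var errUniqueViolation = errors.New("unique constraint violated on gitRepositoryURL")

type lockRow struct{ id, clusterUser string }

// lockTable is a hypothetical in-memory stand-in for RepositoryLockOwner.
// Each method is atomic on its own, but nothing is held between calls.
type lockTable struct {
	mu    sync.Mutex
	byURL map[string]lockRow // keyed on the unique gitRepositoryURL column
}

func newLockTable() *lockTable { return &lockTable{byURL: map[string]lockRow{}} }

// selectByURL mirrors: SELECT id, clusterUser ... WHERE gitRepositoryURL = ?
func (t *lockTable) selectByURL(url string) (lockRow, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	row, ok := t.byURL[url]
	return row, ok
}

// insert mirrors an INSERT that fails on the uniqueness constraint.
func (t *lockTable) insert(url string, row lockRow) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, exists := t.byURL[url]; exists {
		return errUniqueViolation
	}
	t.byURL[url] = row
	return nil
}

// newID generates a random primary key for a new RepositoryLockOwner row.
func newID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func AcquireRepositoryCredentialURL(t *lockTable, acquiringUser, normalizedRepoURL string) (string, bool, string, error) {
	if acquiringUser == "" || normalizedRepoURL == "" {
		return "", false, "", errors.New("acquiringUser and normalizedRepoURL must be non-empty")
	}
	// fromExisting applies the "matching row already exists" rules.
	fromExisting := func(row lockRow) (string, bool, string, error) {
		if row.clusterUser == acquiringUser {
			return row.id, true, "", nil // the user already owns the lock
		}
		return "", false, "a repository URL is already claimed by another user: " + row.clusterUser, nil
	}

	if row, found := t.selectByURL(normalizedRepoURL); found {
		return fromExisting(row)
	}
	id, err := newID()
	if err != nil {
		return "", false, "", err
	}
	err = t.insert(normalizedRepoURL, lockRow{id: id, clusterUser: acquiringUser})
	switch {
	case err == nil:
		return id, true, "", nil
	case errors.Is(err, errUniqueViolation):
		// Someone inserted between our select and insert: re-read the winner.
		if row, found := t.selectByURL(normalizedRepoURL); found {
			return fromExisting(row)
		}
		return "", false, "", errors.New("lock row disappeared after unique violation")
	default:
		return "", false, "", err
	}
}
```

Because uniqueness is enforced by the table rather than by the SELECT, two concurrent callers for the same URL cannot both end up owning it: at most one INSERT succeeds, and the other caller re-reads the winning row.
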
When we are about to create or modify an Application or a RepositoryCredential, we call the above function to ensure that the user owns the lock on its repository URL. If the user does not own the lock, we return an error and do not allow the creation/modification of that Application/RepositoryCredential.

**In detail, whenever an Application or RepositoryCredential row is about to be created/modified, we should first do this:**

- Call AcquireRepositoryCredentialURL on the normalized repository URL.
    - If it fails (it returns an error, or the bool return value is false), report that back as an error, and exit.
    - Don't allow the Application/RepositoryCredential to be created/modified.
- For an Application, ensure there exists (create if not existing) a **RepositoryLockApplication** row for that Application, pointing back to the **RepositoryLockOwner**.
- For a RepositoryCredential, ensure there exists (create if not existing) a **RepositoryLockRepositoryCredential** row for that RepositoryCredential, pointing back to the **RepositoryLockOwner**.

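A minimal sketch of this pre-create/modify hook, assuming a single-threaded in-memory `lockStore` in place of the three tables and a condensed `acquire` method in place of AcquireRepositoryCredentialURL (both hypothetical names). Reference rows are keyed by Application or RepositoryCredential ID, so "ensure there exists" becomes an idempotent upsert.

```go
package main

import "fmt"

// owner is a row of the (hypothetical, in-memory) RepositoryLockOwner table.
type owner struct{ id, clusterUser string }

// lockStore is an in-memory stand-in for the three lock tables.
type lockStore struct {
	ownersByURL map[string]owner  // RepositoryLockOwner, unique on URL
	appRefs     map[string]string // RepositoryLockApplication: applicationID -> lockID
	credRefs    map[string]string // RepositoryLockRepositoryCredential: credentialID -> lockID
	nextID      int
}

func newLockStore() *lockStore {
	return &lockStore{
		ownersByURL: map[string]owner{},
		appRefs:     map[string]string{},
		credRefs:    map[string]string{},
	}
}

// acquire is a condensed, single-threaded AcquireRepositoryCredentialURL.
func (s *lockStore) acquire(user, url string) (string, bool, string) {
	if o, ok := s.ownersByURL[url]; ok {
		if o.clusterUser != user {
			return "", false, "a repository URL is already claimed by another user: " + o.clusterUser
		}
		return o.id, true, ""
	}
	s.nextID++
	id := fmt.Sprintf("lock-%d", s.nextID)
	s.ownersByURL[url] = owner{id: id, clusterUser: user}
	return id, true, ""
}

// beforeUpsert runs before an Application (isApp=true) or RepositoryCredential
// row is created/modified. A non-nil error means the write must be rejected.
func (s *lockStore) beforeUpsert(user, rowID, normalizedRepoURL string, isApp bool) error {
	lockID, acquired, msg := s.acquire(user, normalizedRepoURL)
	if !acquired {
		return fmt.Errorf("unable to create/modify %s: %s", rowID, msg)
	}
	// Ensure the reference row exists (create if not existing).
	if isApp {
		s.appRefs[rowID] = lockID
	} else {
		s.credRefs[rowID] = lockID
	}
	return nil
}
```

Keying the reference by row ID means that modifying an Application to point at a different repository replaces its old reference; the lock it used to point at is then left for the periodic global garbage collection to clean up.
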
**Garbage collection: whenever an Application or RepositoryCredential is deleted, we should do this:**

- Delete the corresponding RepositoryLockApplication and/or RepositoryLockRepositoryCredential row for the repository lock.
- `countRemainingApplicationsUsingLock := SELECT count(*) FROM RepositoryLockApplication WHERE lockID = repositoryLockID`
- `countRemainingRepositoryCredentialsUsingLock := SELECT count(*) FROM RepositoryLockRepositoryCredential WHERE lockID = repositoryLockID`
- If countRemainingApplicationsUsingLock == 0 and countRemainingRepositoryCredentialsUsingLock == 0, then delete the **RepositoryLockOwner** row.

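These steps amount to reference counting, sketched below. `lockTables` is a hypothetical in-memory stand-in for the three tables, and `countRefs` plays the role of the two `SELECT count(*)` queries.

```go
package main

// lockTables is a hypothetical in-memory stand-in for the lock tables.
type lockTables struct {
	owners   map[string]string // RepositoryLockOwner: lockID -> normalized URL
	appRefs  map[string]string // RepositoryLockApplication: applicationID -> lockID
	credRefs map[string]string // RepositoryLockRepositoryCredential: credentialID -> lockID
}

// countRefs mirrors: SELECT count(*) FROM <refTable> WHERE lockID = ?
func countRefs(refs map[string]string, lockID string) int {
	n := 0
	for _, id := range refs {
		if id == lockID {
			n++
		}
	}
	return n
}

// deleteLockIfUnreferenced deletes the lock when both counts are zero and
// reports whether it did so.
func (t *lockTables) deleteLockIfUnreferenced(lockID string) bool {
	countRemainingApplicationsUsingLock := countRefs(t.appRefs, lockID)
	countRemainingRepositoryCredentialsUsingLock := countRefs(t.credRefs, lockID)
	if countRemainingApplicationsUsingLock == 0 && countRemainingRepositoryCredentialsUsingLock == 0 {
		delete(t.owners, lockID)
		return true
	}
	return false
}

// onApplicationDeleted removes the Application's reference row, then GCs the lock.
func (t *lockTables) onApplicationDeleted(applicationID string) bool {
	lockID, ok := t.appRefs[applicationID]
	if !ok {
		return false // the Application did not reference a lock
	}
	delete(t.appRefs, applicationID)
	return t.deleteLockIfUnreferenced(lockID)
}

// onRepositoryCredentialDeleted is the RepositoryCredential counterpart.
func (t *lockTables) onRepositoryCredentialDeleted(credentialID string) bool {
	lockID, ok := t.credRefs[credentialID]
	if !ok {
		return false
	}
	delete(t.credRefs, credentialID)
	return t.deleteLockIfUnreferenced(lockID)
}
```

In the database version, the reference-row delete, the two counts and the lock delete would likely run in one transaction. With non-cascading foreign keys from the reference tables, a lock that gains a new reference between the count and the delete makes the DELETE fail rather than leaving that new reference dangling.
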
**Global garbage collection: finally, we should do periodic cleanup to make sure there aren't any leftover repository locks:**

- Every X minutes/hours, we should:
    - For each entry in RepositoryLockOwner:
        - `SELECT count(*) FROM RepositoryLockApplication WHERE lockID = (id of the RepositoryLockOwner)`
        - `SELECT count(*) FROM RepositoryLockRepositoryCredential WHERE lockID = (id of the RepositoryLockOwner)`
        - If there are no matches in either table, delete the RepositoryLockOwner.
            - (If any matching rows are created after the scan, the delete will fail because of the foreign key constraints. This is how we avoid the race condition of a new entry being added after the scan completes.)
- Why? This allows us to catch any dangling RepositoryLockOwners that we missed via the normal GC process defined above.
