| 1 | +# Introduction to GitOps Service Code/Architecture |
| 2 | + |
| 3 | +## Written by |
| 4 | +- Jonathan West (@jgwest) |
| 5 | +- Originally written November 4th, 2022 |
| 6 | + |
| 7 | + |
| 8 | +## The high level picture of the GitOps Service |
| 9 | + |
| 10 | +The GitOps Service API is similar to the Argo CD Application API: |
| 11 | +[https://github.com/redhat-appstudio/managed-gitops/blob/main/docs/api.md\#gitopsdeployment](https://github.com/redhat-appstudio/managed-gitops/blob/main/docs/api.md#gitopsdeployment) |
| 12 | + |
| 13 | +Ultimately the GitOps Service will: |
| 14 | + |
| 15 | +* Watch for GitOps Service API resources in the user’s KCP workspace. |
| 16 | +* Process the event, and configure Argo CD to |
| 17 | + |
| 18 | +For example: |
| 19 | + |
| 20 | +* User creates a **GitOpsDeployment** |
| 21 | + * GitOps Service: Create an Argo CD Application based on that, on the target Argo CD cluster |
| 22 | +* User creates a **ManagedEnvironment** |
| 23 | + * GitOps Service: Create an Argo CD cluster secret based on that, on the target Argo CD cluster |
| 24 | +* User creates a **RepositoryCredential** |
| 25 | + * GitOps Service: Create an Argo CD repository credential based on that, on the target Argo CD cluster |
| 26 | + |
| 27 | +You can see how the GitOps Service API directly translates into the corresponding Argo CD API. |
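
As a rough illustration of that translation, here is a minimal Go sketch. The struct types are simplified, hypothetical stand-ins (they are not the real GitOps Service or Argo CD API types), and reusing the resource name for the Argo CD Application is an assumption made only for this example.

```go
package main

import "fmt"

// gitOpsDeployment is a simplified, hypothetical stand-in for the user-facing
// GitOpsDeployment resource (not the real API type).
type gitOpsDeployment struct {
	Name           string
	RepoURL        string
	Path           string
	TargetRevision string
	Namespace      string
}

// argoApplication is a simplified, hypothetical stand-in for an Argo CD Application.
type argoApplication struct {
	Name                 string
	RepoURL              string
	Path                 string
	TargetRevision       string
	DestinationNamespace string
}

// toArgoApplication copies the fields of a GitOpsDeployment into the
// corresponding Application fields. Reusing the name is an assumption of
// this sketch; the real service may name Applications differently.
func toArgoApplication(d gitOpsDeployment) argoApplication {
	revision := d.TargetRevision
	if revision == "" {
		// An empty revision is treated as HEAD in this sketch.
		revision = "HEAD"
	}
	return argoApplication{
		Name:                 d.Name,
		RepoURL:              d.RepoURL,
		Path:                 d.Path,
		TargetRevision:       revision,
		DestinationNamespace: d.Namespace,
	}
}

func main() {
	app := toArgoApplication(gitOpsDeployment{
		Name:      "my-app",
		RepoURL:   "https://github.com/example/repo",
		Path:      "deploy",
		Namespace: "prod",
	})
	fmt.Printf("%+v\n", app)
}
```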
| 28 | + |
| 29 | +The advantage to the GitOps Service API is that: |
| 30 | + |
| 31 | +* It is designed for (and better suited to) deploying to KCP |
| 32 | +* We can modify it without getting the consent of upstream |
| 33 | +* We can use ‘better’ API design principles (CRs, rather than configmaps) |
| 34 | +* My thoughts are [here](../argo-cd-api-suitability-for-kcp-based-service.md) |
| 35 | + |
| 36 | +## Which APIs are the Core GitOps API? |
| 37 | + |
| 38 | +The Core GitOps APIs all begin with ‘GitOpsDeployment\*’. They are all under the ‘managed-gitops.redhat.com’ API group, and are defined in the backend-shared component. |
| 39 | + |
| 40 | +Core APIs that have a corresponding Argo CD equivalent: |
| 41 | + |
| 42 | +* **GitOpsDeployment** |
| 43 | + * Equivalent to Argo CD **Application**, and with a very similar API. |
| 44 | +* **GitOpsDeploymentRepositoryCredential** |
| 45 | + * Equivalent to an Argo CD [repository secret](https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#repository-credentials) |
| 46 | +* **GitOpsDeploymentManagedEnvironment** |
| 47 | + * Equivalent to an Argo CD [cluster secret](https://argo-cd.readthedocs.io/en/stable/operator-manual/declarative-setup/#clusters) |
| 48 | +* **GitOpsDeploymentSyncRun** |
| 49 | + * Equivalent to running ‘argocd app sync (application name)’ on an Application |
| 50 | + |
| 51 | +Internal Core APIs: |
| 52 | + |
| 53 | +* These APIs are not exposed to the user. They are only used internally within the GitOps Service, to communicate between components. |
| 54 | +* **Operation** |
| 55 | + * Used by the ‘backend’ component to notify the ‘cluster-agent’ of changes. |
| 56 | + |
| 57 | +Examples for all [APIs are here](https://github.com/redhat-appstudio/managed-gitops/blob/main/docs/api.md). |
| 58 | + |
| 59 | +## Which are the AppStudio GitOps APIs? |
| 60 | + |
| 61 | +The AppStudio GitOps APIs are built on top of the Core APIs, to implement the AppStudio/HACBS application development model. |
| 62 | + |
| 63 | +* The AppStudio GitOps controller creates and uses the Core GitOps APIs to implement the AppStudio Application/Component/Environment model. |
| 64 | +* For example: when an Environment is created, the Environment controller in appstudio-controller creates and configures a corresponding GitOpsDeploymentManaged |
| 65 | + |
| 66 | +The AppStudio GitOps APIs are: |
| 67 | + |
| 68 | +* **Environment** |
| 69 | + * Defines a target deployment environment, such as ‘dev’ or ‘staging’ |
| 70 | + * Environments correspond to KCP sub-workspaces, or remote clusters |
| 71 | +* **SnapshotEnvironmentBinding** |
| 72 | + * Defines which Applications should be deployed to which environments, and what version of the application to deploy. |
| 73 | +* **Snapshot** |
| 74 | + * Defines a set of container images that make up a particular version of an application |
| 75 | +* **PromotionRun** |
| 76 | + * Used to promote versions of applications (snapshots) between environments. |
| 77 | + |
| 78 | +They are defined in the [application-api repository](https://github.com/redhat-appstudio/application-api/), and have the API group of ‘appstudio.redhat.com’. |
| 79 | + |
| 80 | +Examples for all [APIs are here](https://github.com/redhat-appstudio/managed-gitops/blob/main/docs/api.md). |
| 81 | + |
| 82 | +## Architecture Diagram |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +## What are the components of the GitOps Service? |
| 87 | + |
| 88 | +We use a [Git mono-repo](https://github.com/redhat-appstudio/managed-gitops) to hold all of the components of the GitOps Service. Why? This allows us to make changes to all components across a single PR, and test those changes within that single PR. |
| 89 | + |
| 90 | +**Core GitOps Service**: |
| 91 | + |
| 92 | +* **backend** |
| 93 | + * Watches KCP workspaces (via virtual workspaces) for API requests on the Core GitOps APIs, and updates the RDBMS database to reflect the user’s desired state |
| 94 | +* **cluster-agent** |
| 95 | + * Singularly responsible for configuring/interfacing with Argo CD: ensures that Argo CD is up-to-date with desired state described in the database. |
| 96 | +* **backend-shared** |
| 97 | + * Code that is shared between the two components, ‘backend’ and ‘cluster-agent’ |
| 98 | + |
| 99 | +**AppStudio GitOps Service**: |
| 100 | + |
| 101 | +* **appstudio-controller** |
| 102 | + * Watches KCP workspaces (via virtual workspaces) for API requests on the AppStudio GitOps APIs, and updates the Core GitOps APIs |
| 103 | +* **appstudio-shared** |
| 104 | + * Where the AppStudio Environment APIs are defined |
| 105 | + * BUT, this has now moved to [redhat-appstudio/application-api](https://github.com/redhat-appstudio/application-api) |
| 106 | + |
| 107 | +**Tests and utilities:** |
| 108 | + |
| 109 | +* **tests-e2e**: Where the E2E tests live for both the Core and AppStudio GitOps APIs |
| 110 | +* **utilities**: code for our database migration CI checks |
| 111 | + |
| 112 | +## Differences between the GitOps Service controllers, and OpenShift GitOps Controller |
| 113 | + |
| 114 | +**Must scale on public cloud, to a large number of (KCP) users** |
| 115 | + |
| 116 | +The OpenShift GitOps controllers are scaled to the size of a single cluster: at most several hundred people might have access to it. |
| 117 | + |
| 118 | +In contrast, the GitOps Service must scale to at least several thousand people, and potentially many more. |
| 119 | + |
| 120 | +* For example, the Red Hat Dev Sandbox has 3,055 active users, as of the last time I checked. |
| 121 | + |
| 122 | +**Multithreaded** |
| 123 | + |
| 124 | +The GitOps Service needs to support many simultaneous users across many different virtual clusters (KCP workspaces). This necessarily requires us to support many simultaneous user API requests across many different threads (goroutines). |
| 125 | + |
| 126 | +**Guarding against malicious users / free-tier users with no identity verification:** |
| 127 | + |
| 128 | +The GitOps Service needs to guard against malicious users. Theoretically, so does the OpenShift GitOps Controller, BUT, the GitOps Service must support fully unverified, pseudonymous users, such as those using the AppStudio/HACBS free compute tier. |
| 129 | + |
| 130 | +* This is similar to how the Red Hat OpenShift Dev Sandbox supports pseudonymous users. |
| 131 | +* The Dev Sandbox team is constantly fighting against cryptominers 🙃 |
| 132 | + |
| 133 | +In contrast, most users of OpenShift GitOps would be employees of an organization, and thus are not anonymous and have a financial incentive not to become malicious (e.g. in order to keep their jobs). |
| 134 | + |
| 135 | +**Sharding of K8s requests per workspace/namespace:** |
| 136 | + |
| 137 | +Supporting multiple simultaneous users, and guarding against malicious users, requires us to have a system of sharding for incoming requests. |
| 138 | + |
| 139 | +At present, requests are sharded per namespace of a KCP workspace. |
| 140 | + |
| 141 | +* Each namespace, within each user’s KCP workspace, is handled by a different goroutine. |
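
The per-namespace sharding described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual code: `shardKey`, `event`, `shardResult`, and `dispatch` are hypothetical names, and the events are plain strings rather than real Kubernetes events.

```go
package main

import "fmt"

// shardKey identifies one namespace inside one user's workspace.
type shardKey struct {
	Workspace string
	Namespace string
}

// event is a hypothetical, simplified stand-in for an incoming API event.
type event struct {
	Key      shardKey
	Resource string
}

// shardResult reports what a single shard goroutine processed.
type shardResult struct {
	Key       shardKey
	Resources []string
}

// dispatch routes every event to a goroutine dedicated to its
// workspace/namespace pair, starting that goroutine on first use.
// It returns the resources seen by each shard, in arrival order.
func dispatch(events []event) map[shardKey][]string {
	shards := map[shardKey]chan event{}
	done := make(chan shardResult)

	for _, ev := range events {
		ch, ok := shards[ev.Key]
		if !ok {
			ch = make(chan event)
			shards[ev.Key] = ch
			// One goroutine per shard: a slow or abusive namespace
			// only occupies its own goroutine.
			go func(key shardKey, in <-chan event) {
				var resources []string
				for e := range in {
					resources = append(resources, e.Resource)
				}
				done <- shardResult{key, resources}
			}(ev.Key, ch)
		}
		ch <- ev
	}

	// Signal each shard that no more events are coming, then collect results.
	for _, ch := range shards {
		close(ch)
	}
	out := map[shardKey][]string{}
	for range shards {
		r := <-done
		out[r.Key] = r.Resources
	}
	return out
}

func main() {
	results := dispatch([]event{
		{shardKey{"alice", "dev"}, "app-1"},
		{shardKey{"bob", "dev"}, "app-2"},
		{shardKey{"alice", "dev"}, "app-3"},
	})
	for key, resources := range results {
		fmt.Println(key.Workspace, key.Namespace, resources)
	}
}
```

Because each shard's events are consumed by exactly one goroutine, events for the same namespace are processed in order, while different namespaces proceed independently.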
| 142 | + |
| 143 | +**Relational Database (postgresql):** |
| 144 | + |
| 145 | +In order to ensure scalability, data integrity, and reliability of the GitOps Service, an RDBMS (PostgreSQL) is used. |
| 146 | + |
| 147 | +Reasons to store in an RDBMS, rather than as Kubernetes objects, include: |
| 148 | + |
| 149 | +1. Once you have a lot of users, it's easier to scale up a managed RDBMS (such as Amazon RDS), than to scale up the etcd of an in-use cluster where your control plane lives |
| 150 | +2. Transactions, foreign keys, and strong typing: ensuring that the data for your service is always valid, consistent, and always moves between known good states. |
| 151 | +3. Arbitrary queries: ability to efficiently query the database (after adding the appropriate indices) across any set of fields. |
| 152 | +4. Avoid K8s resource limits (e.g. the maximum resource size limit of 1.5MB, IIRC) |
| 153 | +5. More generally: you can easily store 100k+ records in a database table, can you easily store 100k+ resources in a K8s namespace? |
| 154 | +6. At an architectural level, it means we don't need to share K8s cluster credentials of the parent GitOps Service instance with child cluster-agent clusters, and vice versa (we only need to share database credentials). |
| 155 | + |
| 156 | +(You can also plug other stuff from the Postgresql/RDBMS ecosystem in, like streaming events via Debezium). |
| 157 | + |
| 158 | +Will the persistence storage layer be the performance bottleneck of the GitOps Service? That's hard to predict, but IMHO an RDBMS is a more solid foundation for scaling a large amount of data than a K8s control plane. Many many many companies around the world are scaling using RDBMSes; not many are storing non-k8s application data in a K8s control plane. |
| 159 | + |
| 160 | +**No Web-based User Interface (UI)** |
| 161 | + |
| 162 | +The GitOps Service does not have its own UI. We instead rely on consuming services, such as AppStudio, to build a UI on top. |
| 163 | + |
| 164 | +## Relational Database |
| 165 | + |
| 166 | +Schema can be found in ‘db-schema.sql’ at the root of the project. |
| 167 | + |
| 168 | +We use ‘go-migration’ to handle migration of one database version to another. The migrations are defined in ‘utilities/db-migration/migrations’. |
| 169 | + |
| 170 | +We use GitHub Actions to ensure that the migrations are in sync with the db-schema.sql contents. |
| 171 | + |
| 172 | +Tables in database: |
| 173 | + |
| 174 | + |
| 175 | +## Resource synchronization with operations |
| 176 | + |
| 177 | +Ultimately, the goal of the GitOps Service is to create/modify/delete Argo CD Applications, and cluster/repository secrets. |
| 178 | + |
| 179 | +In the GitOps Service, this functionality is split between two components: ‘backend’ and ‘cluster-agent’ |
| 180 | + |
| 181 | +The high-level interaction looks like this: |
| 182 | + |
| 183 | +1) User creates/modifies/deletes a GitOps Service API resource (for example, GitOpsDeployment) |
| 184 | +2) Event received by backend |
| 185 | +3) Backend updates database, and notifies cluster-agent of the DB update |
| 186 | +4) Cluster-agent sees the DB update, and updates Argo CD |
| 187 | + * Creates an Argo CD Application, cluster secret, or repository secret. |
| 188 | + |
| 189 | +KCP workspace \-\> backend \-\> db \-\> cluster agent \-\> argo cd |
| 190 | + |
| 191 | +The same set of steps, with a bit more information. |
| 192 | + |
| 193 | +1) A GitOpsDeployment is created in a user’s KCP workspace. |
| 194 | +2) The GitOps Service ‘backend’ controller receives the creation event from K8s. |
| 195 | +3) The GitOps Service updates the RDBMS based on the contents of the GitOpsDeployment |
| 196 | + 1) A row is added to the ‘Application’ table, with the contents of the GitOpsDeployment from step 1\. |
| 197 | +4) An Operation K8s resource is created, to inform the ‘cluster-agent’ component of the database update. |
| 198 | +5) The cluster-agent component sees the Operation, and reads the corresponding database entry |
| 199 | + 1) The cluster agent |
| 200 | +6) The cluster-agent creates an Argo CD Application based on contents of the Application row of the database from step 3\. |
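
The steps above can be sketched in Go as a small in-memory model. Everything here is a simplifying assumption for illustration: the database and Argo CD are plain maps, the Operation (a K8s resource in the real service) is modeled as a value sent on a channel, and `backendReconcile` and `clusterAgentReconcile` are hypothetical names. The point it shows is that the Operation carries only a reference to the database row, and the cluster-agent reads the row itself.

```go
package main

import (
	"errors"
	"fmt"
)

// applicationRow is a simplified stand-in for a row of the 'Application' table.
type applicationRow struct {
	ID      string
	RepoURL string
	Path    string
}

// operation stands in for the Operation resource: it only points at the
// database row that changed, it does not carry the data itself.
type operation struct {
	ResourceID string
}

// database and argoCD are in-memory stand-ins for PostgreSQL and Argo CD.
type database map[string]applicationRow
type argoCD map[string]applicationRow

// backendReconcile models steps 2-4: write the desired state to the
// database, then notify the cluster-agent via an operation.
func backendReconcile(db database, ops chan<- operation, name, repoURL, path string) {
	db[name] = applicationRow{ID: name, RepoURL: repoURL, Path: path}
	ops <- operation{ResourceID: name}
}

// clusterAgentReconcile models steps 5-6: look up the row referenced by
// the operation, and create the matching Argo CD Application.
func clusterAgentReconcile(db database, argo argoCD, op operation) error {
	row, found := db[op.ResourceID]
	if !found {
		return errors.New("operation references a missing database row: " + op.ResourceID)
	}
	argo[row.ID] = row
	return nil
}

func main() {
	db, argo := database{}, argoCD{}
	ops := make(chan operation, 1)

	backendReconcile(db, ops, "my-app", "https://github.com/example/repo", "deploy")
	if err := clusterAgentReconcile(db, argo, <-ops); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Argo CD now has:", argo["my-app"].RepoURL)
}
```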
| 201 | + |
| 202 | +See also this [detailed diagram of step by step how this works](../presentations/gitops-service-GitOpsDeployment-creatio-steps.odp). |
| 203 | + |
| 204 | +More details on operations are in the Internal Architecture document, linked at the end of this document. |
| 205 | + |
| 206 | +## Multithreading |
| 207 | + |
| 208 | +Much of the complexity of the ‘backend’ component of the codebase deals with the hard challenge of how to support many different users simultaneously, across many different goroutines, without hitting deadlocks or race conditions. |
| 209 | + |
| 210 | +Most controller-runtime based controllers are simple: they avoid this issue by only supporting a single active Reconcile() at a time (for example, the OpenShift GitOps controller, or the Argo CD ApplicationSet controller). |
| 211 | + |
| 212 | +* They are able to get away with only a single active Reconcile, because the custom resources (CRs) that they reconcile change infrequently. |
| 213 | + |
| 214 | +OTOH, Argo CD is an example of a controller that supports multiple simultaneous user requests, and it does this by having a large number (\~25) of mutexes sprinkled throughout the code. I speak from experience when I say this makes reasoning about controller concurrency challenging, at times. |
| 215 | + |
| 216 | +But rather than using mutexes, we use channels: we follow the Go best practice of [sharing data via message passing (channels), not via locks on shared memory (mutexes)](https://go.dev/blog/codelab-share) |
| 217 | + |
| 218 | +## Multithreading: ‘Event Loop’ Pattern |
| 219 | + |
| 220 | +For this we use ‘event loops’ (also known as [actors](https://en.wikipedia.org/wiki/Actor_model), or perhaps ‘in-process’ microservices), which listen for messages on go channels from other parts of the program. |
| 221 | + |
| 222 | +* This is similar to how Argo CD is partitioned into multiple separate microservices (repo-server, application-controller, etc), but, the GitOps Services are much smaller and |
| 223 | + |
| 224 | +This makes writing multithreaded code less painful, because: |
| 225 | + |
| 226 | +* Messages passed between event loops are immutable. |
| 227 | +* Data within an event loop is not shared outside of that event loop. |
| 228 | +* A single goroutine is responsible for processing messages on each event loop. |
| 229 | + |
| 230 | +These factors mean that mutexes (or other locks) are not needed. |
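
A minimal Go sketch of the event loop pattern follows. It is an illustration under assumptions rather than the service's code: `message`, `startEventLoop`, and `send` are hypothetical names, and the loop merely counts events per resource. The state (`seen`) is owned by a single goroutine, and callers interact with it only by passing messages, so no mutex is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// message is passed by value to the event loop; the sender keeps no
// reference to state inside the loop.
type message struct {
	Resource string
	Reply    chan<- int
}

// startEventLoop starts a single goroutine that owns the 'seen' map.
// No other goroutine ever touches that map, so no lock is required.
func startEventLoop() chan<- message {
	inbox := make(chan message)
	go func() {
		seen := map[string]int{}
		for msg := range inbox {
			seen[msg.Resource]++
			msg.Reply <- seen[msg.Resource]
		}
	}()
	return inbox
}

// send delivers one event and waits for the loop's reply, which is the
// number of events seen so far for that resource.
func send(inbox chan<- message, resource string) int {
	reply := make(chan int, 1) // buffered so the loop never blocks on reply
	inbox <- message{Resource: resource, Reply: reply}
	return <-reply
}

func main() {
	inbox := startEventLoop()

	// Many concurrent senders, one owner of the state.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			send(inbox, "my-app")
		}()
	}
	wg.Wait()

	fmt.Println(send(inbox, "my-app")) // prints 51
	close(inbox)
}
```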
| 231 | + |
| 232 | +Event Loops: |
| 233 | + |
| 234 | +* **Controller** |
| 235 | + * Reconcile function of controller is called on any create/modification/delete/watch events for an API resource. |
| 236 | +* **Preprocess Event Loop** |
| 237 | + * Receive events from controller, pass them to controller event loop. |
| 238 | + * Responsible for processing the events before passing them to the next event loop. |
| 239 | +* **Controller Event Loop** |
| 240 | + * Receive events from preprocess event loop |
| 241 | + * Starts a new instance of a workspace event loop for every namespace of a workspace. |
| 242 | + * Pass events to that workspace event loop. |
| 243 | +* **Workspace Event Loop:** |
| 244 | + * Receives event from controller event loop |
| 245 | + * Starts a new instance of application event loop for every GitOpsDeployment name/namespace combination |
| 246 | + * Passes events to that application event loop |
| 247 | +* **Application Event Loop**: Receive events for GitOpsDeployments/GitOpsDeploymentSyncRuns |
| 248 | + * Receives events from Workspace Event Loop |
| 249 | + * Calls application\_event\_runner\_(deployment/syncrun) with the event. |
| 250 | + |
| 251 | +* **Workspace Resource Event Loop** |
| 252 | + * workspaceResourceEventLoop is responsible for handling events for API-namespace-scoped resources, such as events for RepositoryCredential resources. |
| 253 | + * |
| 254 | +* **Shared Resource Event Loop** |
| 255 | + * The goal of the shared resource event loop is to ensure that API-namespace-scoped resources are only created from a single thread, preventing concurrent goroutines from stepping on each other's toes. |
| 256 | + |
| 257 | + |
| 258 | + |
| 259 | +So you may be asking again, why all these event loops? Well this allows us to ensure that we can handle requests from a large number of users in a scalable manner, without data races or race conditions, and with protection from malicious users. |
| 260 | + |
| 261 | +## But: you mostly don’t have to deal with the logic of most of these event loops |
| 262 | + |
| 263 | +The vast majority of work is done in the application event runner, and in the shared resource loop. |
| 264 | + |
| 265 | +So for the most part it’s only necessary to know where particular resources are handled. It’s often enough just to know this: |
| 266 | + |
| 267 | +* **GitOpsDeployment**: handled in ‘application\_event\_runner\_deployments.go’ |
| 268 | +* **GitOpsDeploymentSyncRun**: handled in ‘application\_event\_runner\_syncruns.go’ |
| 269 | +* **GitOpsDeploymentRepositoryCredential**: handled in the shared resource loop |
| 270 | +* **GitOpsDeploymentManagedEnvironment**: handled in ‘sharedresourceloop\_managedenv.go’ |
| 271 | + |
| 272 | +## More details and advanced topics |
| 273 | + |
| 274 | +For more details on what was discussed today, and some more advanced topics, check out the [AppStudio GitOps Internal Architecture document](../gitops-service-internal-architecture-appstudio/internal-architecture.md), in our GitOps Service Google Drive folder. |