Merged
108 commits
b6f5ae5
wip
maltesander Apr 11, 2025
3e983a0
argocd / airflow stack working
maltesander Apr 11, 2025
d64a417
added spark op
maltesander Apr 11, 2025
64daa28
wip
maltesander Apr 11, 2025
15612f7
fix sealed secret location
maltesander Apr 11, 2025
e81624d
fix demo branches
maltesander Apr 11, 2025
822dee5
fixes
maltesander Apr 11, 2025
4bf2626
add role and binding for airflow / spark
maltesander Apr 11, 2025
b1e33bc
remove ns
maltesander Apr 11, 2025
f8ab043
test minio
maltesander Apr 11, 2025
3d8e664
fix sync policy
maltesander Apr 11, 2025
150fd76
testing
maltesander Apr 11, 2025
f05edcd
add airflow logs minio
maltesander Apr 12, 2025
3cedcfa
fixes
maltesander Apr 12, 2025
871d459
extend cert expiry to 10 years
maltesander Apr 13, 2025
e486d9c
split stack & demo
maltesander Apr 13, 2025
46f48b4
install all operators
maltesander Apr 13, 2025
5dca1f8
fixes
maltesander Apr 13, 2025
95f9e5a
use sealed secrets for minio / postgres
maltesander Apr 13, 2025
74b893c
add zookeeper
maltesander Apr 13, 2025
57f70cc
fix path
maltesander Apr 13, 2025
cbdb400
fix path 2
maltesander Apr 13, 2025
0b43683
fix secret name
maltesander Apr 13, 2025
3fa0691
fix credentials
maltesander Apr 13, 2025
d2b82f7
attempt to fix secret
maltesander Apr 13, 2025
a7a95fc
seal minio connection
maltesander Apr 13, 2025
450775e
fix secret
maltesander Apr 13, 2025
9319fcf
try fix postgres secret
maltesander Apr 13, 2025
84ee9a5
fix env override
maltesander Apr 13, 2025
4846f00
fix overrides
maltesander Apr 13, 2025
8f7d766
fix container name
maltesander Apr 13, 2025
1c8b5a5
fix overrides
maltesander Apr 13, 2025
4fcc2a1
enable gitsync
maltesander Apr 13, 2025
f52cb08
fix git sync
maltesander Apr 13, 2025
32011e8
move yaml out of dags git sync
maltesander Apr 13, 2025
2efbab2
set resources
maltesander Apr 13, 2025
945cbf2
linter
maltesander Apr 13, 2025
5e3498b
install all operators via argo
maltesander Apr 14, 2025
0c30a99
improve comments and labels
maltesander Apr 14, 2025
33a375e
remove airflowdb from clusterrole
maltesander Apr 24, 2025
a6596b3
use 25.3 release
maltesander May 19, 2025
5b428e1
Merge branch 'main' into spike/argocd-demo
maltesander Jul 21, 2025
bd878aa
Merge branch 'main' into spike/argocd-demo
maltesander Jul 21, 2025
41c9cb5
bump airflow version, adapt listenerclass, parameterize git sync repo
maltesander Jul 21, 2025
bb9e227
revert templating - manged by argo
maltesander Jul 21, 2025
83e4ca0
adapt sealed secrets version
maltesander Jul 21, 2025
304ec52
customize repo, add opensearch as comment
maltesander Jul 21, 2025
0057f8e
template repo urls
maltesander Jul 21, 2025
8a52bd9
add parameters, improve descrition
maltesander Jul 21, 2025
e0e9b0a
bump argocd helm chart v8.1.4
maltesander Jul 21, 2025
be5d6e6
switch to 0.0.0-dev operators
maltesander Jul 21, 2025
3509993
attempt to fix dags for airflow 3
maltesander Jul 21, 2025
f364191
fix scope
maltesander Jul 21, 2025
d9eb2c7
attempt to fix dag
maltesander Jul 21, 2025
24ff3e8
change path in correct airflow file...
maltesander Jul 21, 2025
0b9761f
use airflow 2.10.5
maltesander Jul 22, 2025
59c25dc
deploy sealed secrets before operators
maltesander Jul 22, 2025
75e21f0
Merge branch 'main' into spike/argocd-demo
maltesander Jul 22, 2025
6b538b1
change demo name to argo-cd-git-ops
maltesander Jul 23, 2025
68778e1
parameterize sealed secrets repo / target revision
maltesander Jul 23, 2025
f63e6a1
wip - docs
maltesander Jul 23, 2025
a3ad536
Merge branch 'main' into spike/argocd-demo
maltesander Jul 23, 2025
7d747c2
small fixes
maltesander Jul 23, 2025
957d7e4
revert sealed secret paramterization - demo parameters not picked up …
maltesander Jul 23, 2025
c4ae615
docs fixes
maltesander Jul 23, 2025
01ba8c1
doc fixes 2
maltesander Jul 23, 2025
4428e48
doc fixes 3
maltesander Jul 23, 2025
fcd3bf7
fix overview parts
maltesander Jul 24, 2025
6225fef
Merge remote-tracking branch 'origin/main' into spike/argocd-demo
maltesander Jul 25, 2025
eb78876
change namespace to sealed-secrets
maltesander Jul 28, 2025
a9789be
increase webserver memory 1gb to 1.5gb
maltesander Jul 28, 2025
d76d526
test 3.0.1 and remote logging
maltesander Jul 28, 2025
8fb3bff
improve git interaction docs
maltesander Jul 29, 2025
3530c10
downgrade airflow to 2.10.5 - remote logging not working for 3
maltesander Jul 29, 2025
47a885a
add architecture overview svg to docs
maltesander Jul 29, 2025
0bcade9
wip - docs
maltesander Jul 29, 2025
1ef3516
add readme for sealing secrets
maltesander Jul 30, 2025
e3e6727
improve seal secret docs
maltesander Jul 30, 2025
7c6da94
add kubeseal offline guide
maltesander Jul 30, 2025
cb58dfa
increase webserver memory limit to 2gb
maltesander Jul 30, 2025
05f4345
remove pyspark dag & spark references
maltesander Jul 30, 2025
4b2a5ac
improve arch overview
maltesander Jul 30, 2025
ace9461
fix diagram
maltesander Jul 30, 2025
a432518
copy&paste fixes overview
maltesander Jul 31, 2025
20ba3a2
fix arrows
maltesander Jul 31, 2025
91ff3dd
remove autoformatting
maltesander Jul 31, 2025
ccb712c
Apply suggestions from code review
maltesander Jul 31, 2025
fb74c68
make tecnical parts collapsible
maltesander Jul 31, 2025
8e781fd
add images, improve docs
maltesander Jul 31, 2025
03292c2
remove run as group for openshift compatibility
maltesander Jul 31, 2025
5d82cba
fix arrow
maltesander Aug 1, 2025
71ac5c7
improve intro message, point to git interaction section
maltesander Aug 1, 2025
8cf5198
trim whitespaces
maltesander Aug 1, 2025
62a7459
Merge branch 'main' into spike/argocd-demo
maltesander Aug 1, 2025
c651590
remove runAsUser for openshift
maltesander Aug 4, 2025
7cba7fa
Apply suggestions from code review
maltesander Aug 5, 2025
ff550b2
extend paragraphs, motivation, conclustion, git interaction
maltesander Aug 6, 2025
e001af8
Merge branch 'main' into spike/argocd-demo
maltesander Aug 6, 2025
e0289c2
Apply suggestions from code review
maltesander Aug 7, 2025
c7b9c35
set dag dir refresh interval to 20 seconds
maltesander Aug 7, 2025
b4a4a91
fix interval override to string
maltesander Aug 7, 2025
6bad754
add timeframe for dag refresh
maltesander Aug 7, 2025
bfb771a
adapt to main branch
maltesander Aug 8, 2025
ab58fd5
fix whitespaces
maltesander Aug 8, 2025
32e26c4
remove whitespace
maltesander Aug 8, 2025
4be215e
exclude svgs from precommit
maltesander Aug 8, 2025
db85464
exclude sealed secrets from pre-commit
maltesander Aug 8, 2025
2bd5540
adapt to bitnami legacy
maltesander Aug 8, 2025
150 changes: 106 additions & 44 deletions docs/modules/demos/pages/argo-cd-git-ops.adoc
@@ -12,25 +12,46 @@
:airflow-git-sync: https://docs.stackable.tech/home/stable/airflow/usage-guide/mounting-dags/#_via_git_sync
:github-fork: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo

This demo shows how to utilize GitOps and Infrastructure as Code (IaC) with Stackable and {argo-cd}[ArgoCD].
Basic knowledge about ArgoCD and its {argo-cd-core-concepts}[core concepts] are assumed and not explained further.
== Motivation

Modern Kubernetes environments thrive on automation, reproducibility, and strong version control.
GitOps provides a reliable way to manage infrastructure and applications declaratively: your desired cluster state is stored in Git,
and changes are automatically synchronized. This tutorial demonstrates how to deploy Stackable operators and products using {argo-cd}[ArgoCD],
following Infrastructure-as-Code (IaC) and GitOps principles.

By the end, you will learn how to:
* Deploy Stackable operators and products via Git-managed manifests.
* Use Sealed Secrets to securely manage sensitive credentials.
* Update Airflow DAGs or modify deployments simply by committing to Git.

This hands-on approach illustrates how GitOps can simplify application lifecycle management and enforce a clear,
auditable workflow across environments (development, staging, production).

All products and manifests are synced and deployed via ArgoCD (except ArgoCD itself, which is bootstrapped via `stackablectl`).

The key points to show are:
[#system-requirements]
== System requirements

To run this demo, ensure you have the following prerequisites:
* Kubernetes cluster (e.g., Kind, Minikube, or a managed service like EKS, GKE, or AKS).
* `kubectl`, installed and configured to access your cluster.
* Optional: a GitHub account, needed only for forking and interacting with the demo repository.

Resource requirements for this demo:
* 15 {k8s-cpu}[cpu units] (core/hyperthread)
* 15 GiB memory
* 5 GiB disk storage
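
As a quick sanity check before installing, the allocatable CPU and memory of each node can be listed with `kubectl`. This is a minimal sketch: the helper name `show_node_capacity` is made up for this example, and the allocatable values do not subtract resources already requested by running pods.

[source,console]
----
# Hypothetical helper: print allocatable CPU and memory per node.
show_node_capacity() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
----

Run `show_node_capacity` and compare the totals with the resource requirements listed above.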

Basic knowledge of ArgoCD and its {argo-cd-core-concepts}[core concepts] is assumed and not explained further.

* Infrastructure - GitOps
** How to synchronize changes from a Git repository?
** How to apply the Git version and branch control to different Kubernetes clusters (development, staging, production)?
** How to safely store and deploy sensitive data from a Git repository?
* Interaction with Stackable products and Git
** Use Airflow and {airflow-git-sync}[git-sync] to fetch DAGs from a Git repository
** Add or update existing DAGs by commiting to the Git repository
## Architecture

ArgoCD and other deployed products and dependencies are illustrated in the following diagram:

image::argo-cd-git-ops/architecture-overview.drawio.svg[]

## Installation

Install this demo on an existing Kubernetes cluster:

NOTE: In order to interact with the Git repository, this repository must be forked and additional parameters must be provided to `stackablectl`.
@@ -46,30 +67,6 @@ WARNING: This demo should not be run alongside other demos.
NOTE: ArgoCD will be deployed in the `argo-cd` namespace by `stackablectl`.
ArgoCD itself will create other namespaces for the deployed products.

[#system-requirements]
== System requirements

To run this demo, your system needs at least:

* 15 {k8s-cpu}[cpu units] (core/hyperthread)
* 15 GiB memory
* 5 GiB disk storage

== Overview

This demo consists of multiple parts:

* Bootstrapping {argo-cd}[Argo CD] via `stackablectl`:
* Deploy components via ArgoCD:
** Install a {sealed-secrets}[Sealed Secrets] controller to decrypt secrets stored in a Git repository
** Install all Stackable operators using an ArgoCD `ApplicationSet`
** Install the Airflow dependencies Minio and Postgres as ArgoCD `Application`
** Deploy Stackable Airflow manifests into their respective ArgoCD `Projects`
* Inspect Airflow web UI
** DAGs are synced from a Git repository
** Start a DAG and check the results
** Check Kubernetes Executor logs persisted in Minio

== Sealed Secrets

When managing all resources and configuration via Git, deploying sensitive data like certificates or credentials via Git becomes a problem.
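
Sealing a secret typically looks like the following sketch. It assumes the `kubeseal` CLI is installed and that the Sealed Secrets controller runs in the `sealed-secrets` namespace. The helper name, the secret name, and the literal value are placeholders, and `--controller-name` may also be needed depending on how the controller was installed.

[source,console]
----
# Hypothetical helper: render a Secret locally (never applied to the cluster)
# and encrypt it with the controller's public key.
# Usage: seal_secret <secret-name> <namespace> <key=value>
seal_secret() {
  name="$1"
  namespace="$2"
  literal="$3"
  kubectl create secret generic "$name" \
    --namespace "$namespace" \
    --from-literal="$literal" \
    --dry-run=client -o yaml \
    | kubeseal --controller-namespace sealed-secrets --format yaml
}

# Example (placeholder values):
# seal_secret my-credentials my-namespace adminPassword=changeme > my-credentials-sealed.yaml
----

Only the resulting `SealedSecret` manifest belongs in Git; the plain `Secret` is never written to disk in this flow.
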
@@ -123,7 +120,7 @@ using different versions and Git sources (repository & branch) as well as the po

NOTE: This demo does not use a multi cluster environment for the sake of simplicity.

The following part is dives deeper into the definition of theStackable operator `ApplicationSet` and can be skipped.
The following part dives deeper into the definition of the Stackable operator `ApplicationSet` and can be skipped.

.Stackable operators `ApplicationSet` details
[%collapsible]
@@ -294,44 +291,109 @@ The log files contained in the single folders are the same as the logs shown abo

== Conclusion

This demo illustrates the combination of ArgoCD and Stackable, using the full potential of GitOps and demonstrating key features for a successful IaC deployment using Stackable.
This demo illustrates a repeatable blueprint for managing complex data platform components with ArgoCD and GitOps.
Once familiar with this pattern, you can extend it to multi-cluster environments, add CI/CD pipelines for automated manifest testing,
or integrate external secret stores like HashiCorp Vault for production use. This setup lays the foundation for a fully automated, scalable, and secure Kubernetes-based data platform.

This tutorial demonstrates how ArgoCD and Stackable integrate to deliver a streamlined GitOps experience:
* All cluster resources and workloads are managed declaratively via Git.
* ArgoCD continuously ensures the cluster state matches Git.
* Sealed Secrets provide secure and auditable secret management.
* Airflow DAG updates occur automatically by committing code to the repository.

// TODO: extend...
This approach scales naturally across environments—development, staging, and production—while reducing manual operations, improving visibility,
and enforcing consistency. By adopting GitOps with ArgoCD and Stackable, teams gain a clear, auditable, and automated path from code to production.

The last step is to demonstrate synchronizing changes made to manifests or secrets from the Git repository (to do this the demo is run on a forked GitHub repository).
Next steps:
* Explore multi-cluster ApplicationSet deployments to target multiple Kubernetes clusters.
* Integrate CI workflows to automatically validate and merge manifest updates.
* Expand beyond Airflow to manage additional Stackable components (e.g., Kafka, Trino, Superset).
* Experiment with DataOps (e.g., Airflow and Trino).

[#interact-with-git-repository]
== How to interact with ArgoCD, Airflow and the Git repository

This demo is hosted in the {stackable-demo-repository}[Stackable Demo repository], where merging changes requires approval. To experiment with your own changes, fork the repository first.

=== Forking the demo repository

This {github-fork}[GitHub tutorial] shows how to fork a repository.

=== Cloning the demo repository

Clone the demo repository:

[source,console]
----
git clone https://github.com/<your-username>/demos.git
cd demos
----

After forking the demo repository, a local copy can be cloned and the `stackablectl` install command must be parameterized with the fork URL and branch.

[source,console]
----
stackablectl demo install argo-cd-git-ops --namespace argo-cd --parameters customGitUrl=<my-demo-fork-url> --parameters customGitBranch=<my-custom-branch-with-changes>
----
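
To check that the installation picked up the fork, the ArgoCD `Application` resources can be listed. This is a sketch that assumes the applications are created in the `argo-cd` namespace used by the install command; the helper name is hypothetical.

[source,console]
----
# Hypothetical helper: list ArgoCD applications with their sync and health status.
list_argo_apps() {
  kubectl get applications.argoproj.io --namespace argo-cd
}
----

On a working installation, `list_argo_apps` should show one entry per application together with its sync and health status.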

In this forked and cloned repository, changes can be made the code and synced into the cluster via ArgoCD.
=== Making changes to the repository

Edit manifests or add new DAG files within your cloned repository:

This way, ArgoCD is instructed to pull the Stackable manifests from the forked repository, where the changes are synced via ArgoCD into the cluster.
* Manifests are in: `demos/argo-cd-git-ops/manifests/`
* Airflow DAGs are in: `demos/argo-cd-git-ops/dags/`

=== Increase Airflow webserver replicas
==== Increase Airflow webserver replicas

Assuming your working directory is the root of the forked demo repository, increase `spec.webservers.roleGroups.<role-group>.replicas` in the file `demos/argo-cd-git-ops/manifests/airflow/airflow.yaml`.
Once the change is pushed or merged, ArgoCD should sync it and additional webserver pods should appear.
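
To find the replica settings before editing, a small helper like the following can be used. It is only a sketch: the function name is made up, and the default path is the manifest location mentioned above.

[source,console]
----
# Hypothetical helper: print every line that sets a replica count, with line numbers.
# Usage: show_replicas [path-to-manifest]
show_replicas() {
  grep -n 'replicas:' "${1:-demos/argo-cd-git-ops/manifests/airflow/airflow.yaml}"
}
----

After the change has been synced, `kubectl get pods -n stackable-airflow --watch` can be used to observe the pods.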

=== Add new Airflow DAGs
==== Add new Airflow DAGs

In the `demos/argo-cd-git-ops/manifests/airflow/airflow.yaml` manifest you have to adapt the git-sync configuration for DAGs to the forked repository:

[source,yaml]
----
dagsGitSync:
- repo: <my-demo-fork-url>
- repo: https://github.com/<your-username>/demos/
branch: <my-custom-branch-with-changes>
[...]
----

Similar to ArgoCD, after adding a new DAG to the folder `demos/argo-cd-git-ops/dags`, Airflow should pick up the new DAG via git-sync and display it in the UI.
After adding a new DAG to the folder `demos/argo-cd-git-ops/dags/`, Airflow should pick up the new DAG via git-sync and display it in the UI.
Syncing may take a while. If no DAGs show up, try refreshing the Airflow UI.
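
For a first experiment, a minimal DAG can be created from the repository root as sketched below. The file name `hello_demo.py`, the `dag_id`, and the task are made up for this example, and the imports assume an Airflow 2.x release.

[source,console]
----
# Create a hypothetical example DAG in the folder watched by git-sync.
mkdir -p demos/argo-cd-git-ops/dags
cat > demos/argo-cd-git-ops/dags/hello_demo.py <<'EOF'
"""Hypothetical example DAG that prints the current date when triggered."""
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_demo",
    start_date=datetime(2025, 1, 1),
    schedule=None,  # no schedule: runs only when triggered manually
    catchup=False,
) as dag:
    BashOperator(task_id="print_date", bash_command="date")
EOF
----

The file still has to be committed and pushed before git-sync can pick it up.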

The synchronization status of the DAGs can be monitored via the Airflow scheduler logs:

[source,console]
----
kubectl logs -n stackable-airflow -c airflow -f svc/airflow-scheduler-default-headless
----

The output should include the DAG processing stats:

[source,console]
----
================================================================================
DAG File Processing Stats

File Path PID Runtime # DAGs # Errors Last Runtime Last Run Last # of DB Queries
-------------------------------------------------------------------- ----- --------- -------- ---------- -------------- ---------- ----------------------
/stackable/app/git-0/current/demos/argo-cd-git-ops/dags/date_demo.py 51 0.03s 0 0 0
================================================================================
[2025-08-06T15:32:23.182+0000] {kubernetes_executor_utils.py:95} INFO - Kubernetes watch timed out waiting for events. Restarting watch.
[2025-08-06T15:32:23.345+0000] {manager.py:997} INFO -
================================================================================
----

==== Commit and push changes

[source,console]
----
git checkout -b <my-custom-branch-with-changes>
git add .
git commit -m "Update Airflow configuration and add new DAG"
git push origin <my-custom-branch-with-changes>
----

Now ArgoCD and Airflow should sync the respective changes into the cluster.