This repository demonstrates a comprehensive GitOps implementation for managing applications on AWS EKS clusters using ArgoCD and Helm Charts. The architecture follows GitOps principles to maintain a declarative infrastructure and application deployment strategy, with built-in observability through Splunk OpenTelemetry Collector integration.
Key features include:
- Multi-environment support (Development, Production)
- Full application lifecycle management via GitOps
- Comprehensive observability with Splunk OpenTelemetry Collector
- AWS ALB-based ingress management with proper health checks
- Pre-deployment validation with sync health checks
- Automatic application instrumentation for observability
```mermaid
flowchart TD
    %% GitOps process flow diagram

    %% Stage 1: Developer pushes changes
    Dev("Developer") --> |"1. Commits<br>& Pushes"| GitRepo[("Git<br>Repository")]

    %% Stage 2: ArgoCD detects changes
    subgraph ArgoCD ["ArgoCD"]
        direction TB
        Monitor["1. Monitor"] --> Diff["2. Compare<br>Differences"]
        Diff --> Validate["3. Validate<br>Health Checks"]
        Validate --> Apply["4. Apply<br>Changes"]
    end
    GitRepo --> |"2. Detect<br>Changes"| ArgoCD

    %% Stage 3: Changes applied to cluster
    ArgoCD --> |"3. Apply<br>Changes"| EKS["EKS Cluster"]

    %% Stage 4: Application deployment in environment
    subgraph EKS
        direction TB
        Infra["Infrastructure<br>(Collectors, Services)"]
        Apps["Applications<br>(Frontends, APIs)"]
    end

    %% Stage 5: Telemetry collection
    subgraph Observability ["Observability"]
        OTEL["OpenTelemetry<br>Collector"] --> |"Send<br>Telemetry"| Splunk["Splunk<br>Platform"]
    end
    EKS --> |"4. Generate<br>Logs/Metrics"| OTEL

    %% Display different environments
    subgraph Environments ["Environments"]
        direction LR
        Dev_Env["Development"]
        Prod_Env["Production"]
    end
    ArgoCD --> |"Sync to"| Environments
    Environments --> |"Deployed on"| EKS

    %% Node styling
    classDef step fill:#f9f9f9,stroke:#333,stroke-width:1px,color:black
    classDef git fill:#f34f29,color:white,stroke:#da5a47,stroke-width:2px
    classDef argocd fill:#329AD6,color:white,stroke:#2f90c5,stroke-width:2px
    classDef k8s fill:#326CE5,color:white,stroke:#2e64d4,stroke-width:2px
    classDef obs fill:#111111,color:white,stroke:#000000,stroke-width:2px
    classDef env fill:#FF9900,color:white,stroke:#ed8f00,stroke-width:2px
    class Dev,GitRepo step
    class GitRepo git
    class ArgoCD,Monitor,Diff,Validate,Apply argocd
    class EKS,Infra,Apps k8s
    class Observability,OTEL,Splunk obs
    class Environments,Dev_Env,Prod_Env env
```
GitOps is a paradigm shift in how we manage and deploy infrastructure and applications. At its core, GitOps uses Git as the single source of truth for declarative infrastructure and applications. With GitOps:
- **Declarative**: Everything is defined as code (Infrastructure as Code)
- **Versioned & Immutable**: Complete history of all changes
- **Pulled Automatically**: Changes are pulled by operators, not pushed
- **Continuously Reconciled**: The system ensures actual state matches desired state
Traditional deployment approaches often suffer from environment drift, manual errors, and lack of audit trails. GitOps solves these problems by:
- **Improving Security**
  - Reduced direct access to production systems
  - Cryptographically verifiable audit trail of all changes
  - Role-based access controls through Git
- **Accelerating Deployments**
  - Automated continuous delivery pipelines
  - Faster recovery from failures
  - Easier rollbacks to previous stable states
- **Enhancing Visibility**
  - Complete audit history of all changes
  - Clear visibility into what's deployed where
  - Improved collaboration between teams
- **Ensuring Consistency**
  - Eliminates drift between environments
  - Consistent state across all clusters
  - Self-healing infrastructure
ArgoCD serves as the GitOps engine in our architecture, continuously synchronizing the desired state in Git with the actual state in Kubernetes. As a Kubernetes-native tool, ArgoCD:
- Continuously monitors Git repositories for changes
- Compares the current state with the desired state
- Automatically applies the changes needed to reach the desired state
- Detects and alerts on drift or synchronization failures
- Provides a UI dashboard for visibility across all applications
Our implementation combines GitOps with Helm charts and multi-environment support to achieve:
- **Environment Parity**: Dev and production environments use the same deployment process with environment-specific configurations.
- **Application Packaging**: Helm charts standardize application deployment patterns.
- **Continuous Synchronization**: ArgoCD ensures the cluster state always matches the Git repository.
- **Observability Integration**: Splunk OpenTelemetry Collector provides comprehensive monitoring.
- **Seamless Rollbacks**: In case of issues, reverting to a previous state is as simple as reverting a Git commit.
Organizations implementing GitOps have reported:
- 80% reduction in time to deploy
- 90% improvement in recovery time
- 70% reduction in configuration errors
- Significant increase in deployment frequency
By making Git the single source of truth, this architecture provides a robust, auditable, and scalable approach to managing Kubernetes infrastructure across multiple environments.
The repository follows a well-organized structure to manage multiple applications across development and production environments:
```
.
├── .gitignore
├── README.md
│
├── HelmCharts/
│   ├── annotations
│   │
│   ├── splunk-otel-collector/
│   │   ├── .helmignore
│   │   ├── Chart.yaml
│   │   ├── values.yaml
│   │   └── templates/
│   │       ├── _helpers.tpl
│   │       ├── configmap.yml
│   │       ├── deployment.yaml
│   │       ├── hpa.yaml
│   │       ├── NOTES.txt
│   │       ├── pre-sync-healthcheck.yaml
│   │       ├── secret.yaml
│   │       ├── service.yaml
│   │       └── serviceaccount.yaml
│   │
│   ├── app-client/
│   │   ├── .helmignore
│   │   ├── Chart.yaml
│   │   ├── values.yaml
│   │   ├── values-dev.yaml
│   │   ├── values-prod.yaml
│   │   └── templates/
│   │       ├── _helpers.tpl
│   │       ├── configmap.yml
│   │       ├── deployment.yaml
│   │       ├── hpa.yaml
│   │       ├── NOTES.txt
│   │       ├── pre-sync-healthcheck.yaml
│   │       ├── service.yaml
│   │       └── serviceaccount.yaml
│   │
│   └── app-api/
│       ├── .helmignore
│       ├── Chart.yaml
│       ├── values.yaml
│       ├── values-dev.yaml
│       ├── values-prod.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── configmap.yml
│           ├── deployment.yaml
│           ├── hpa.yaml
│           ├── ingress.yaml
│           ├── NOTES.txt
│           ├── pre-sync-healthcheck.yaml
│           ├── service.yaml
│           └── serviceaccount.yaml
│
├── eks-dev/
│   ├── applications/
│   │   ├── splunk-otel-collector.yaml
│   │   ├── app-client.yaml
│   │   └── app-api.yaml
│   └── root.yaml
│
└── eks-prod/
    ├── applications/
    │   ├── splunk-otel-collector.yaml
    │   ├── app-client.yaml
    │   └── app-api.yaml
    └── root.yaml
```
The HelmCharts directory contains all Helm charts for both applications and infrastructure. Each chart includes standardized templates, environment-specific values files, and a pre-sync health check.
These are your business applications deployed to the cluster:
- **Frontend (app-client)**
  - Web application (React, Angular, etc.)
  - Configured with NodeJS instrumentation for observability
  - Exposed via AWS ALB ingress
  - TCP health checks for availability validation
- **Backend API (app-api)**
  - API service (.NET Core, Node.js, Java, etc.)
  - Instrumented with OpenTelemetry for tracing and metrics
  - Configured with HTTP health checks
  - Exposed via AWS ALB ingress with path-based routing
This is responsible for collecting and forwarding telemetry data:
- **Splunk OpenTelemetry Collector**
  - Collects logs from Kubernetes pods
  - Gathers metrics from the Kubelet
  - Receives traces from instrumented applications
  - Forwards all telemetry to the Splunk platform
  - Configured for auto-discovery of applications
The architecture supports multiple environments through separate directories:
**Development (`eks-dev/`)**
- Contains ArgoCD applications targeting the development cluster
- Uses `values-dev.yaml` for environment-specific configurations
- Configured for internal access with appropriate security groups

**Production (`eks-prod/`)**
- Contains ArgoCD applications targeting the production cluster
- Uses `values-prod.yaml` for environment-specific configurations
- Configured with stricter security and higher resource requirements
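As an illustration of how the two values files typically diverge (the keys and numbers below are hypothetical, not taken from the actual charts):

```yaml
# values-dev.yaml (hypothetical)
replicaCount: 1
resources:
  requests: { cpu: 100m, memory: 128Mi }
ingress:
  scheme: internal
---
# values-prod.yaml (hypothetical)
replicaCount: 3
resources:
  requests: { cpu: 500m, memory: 512Mi }
ingress:
  scheme: internet-facing
```

Because both files feed the same chart templates, the deployment process stays identical and only the configuration differs per environment.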
A notable feature is the pre-sync health checks implemented for each application:
- API health checks: HTTP-based validation for backend services
- UI health checks: TCP-based connection tests for frontend applications
- Collector health checks: Port availability validation for the collector
These checks run before ArgoCD applies changes, ensuring that only healthy applications are updated and preventing broken deployments.
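These checks are implemented as ArgoCD resource hooks. As a rough sketch (the Job name, namespace, image, and health endpoint below are illustrative, not taken from the actual charts), a PreSync hook for the API might look like:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: app-api-presync-healthcheck   # hypothetical name
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: healthcheck
          image: curlimages/curl:8.8.0
          # Fail the sync if the currently deployed API's health endpoint is unreachable
          args: ["-fsS", "http://app-api.eks-dev.svc.cluster.local/health"]
```

If the Job fails, ArgoCD aborts the sync and the running version stays untouched.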
The architecture incorporates several security best practices:
- Secret Management: Splunk tokens stored in Kubernetes secrets
- RBAC: Service accounts with least-privilege permissions
- Network Security: AWS security groups control access
- TLS: HTTPS termination at ALB with modern TLS policies
- Health Checks: Prevent deployment of broken applications
- AWS CLI configured with appropriate permissions
- kubectl installed and configured
- Helm 3.x installed
- ArgoCD CLI installed
- Splunk HEC token and endpoint information
1. **Clone the Repository**

   ```shell
   git clone https://github.com/your-username/your-gitops-repo.git
   cd your-gitops-repo
   ```

2. **Configure AWS CLI and Connect to EKS**

   ```shell
   aws configure
   aws eks update-kubeconfig --name your-cluster-name --region your-region
   ```

3. **Install ArgoCD on Your Cluster**

   ```shell
   kubectl create namespace argocd
   kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
   ```

4. **Configure Splunk Token**

   Update `values.yaml` in the `HelmCharts/splunk-otel-collector` directory:

   ```yaml
   # Splunk-specific settings
   splunk:
     endpoint: "https://your-splunk-hec-endpoint.com/services/collector"
     token: "your-splunk-hec-token"
     index: "k8-dev"
   ```

5. **Apply the Root Application**

   This bootstraps the entire GitOps process:

   ```shell
   kubectl apply -f eks-dev/root.yaml
   ```

6. **Access the ArgoCD UI**

   ```shell
   kubectl port-forward svc/argocd-server -n argocd 8080:443
   ```

   Then open https://localhost:8080 in your browser.
1. **Create a New Helm Chart**

   Add your chart in the `HelmCharts` directory:

   ```shell
   mkdir -p HelmCharts/your-new-app/templates
   ```

   Create the necessary files following the structure of the existing charts:
   - Chart.yaml
   - values.yaml
   - values-dev.yaml
   - values-prod.yaml
   - templates directory with resources

2. **Add an ArgoCD Application Definition**

   Create a new file at `eks-dev/applications/your-new-app.yaml`:

   ```yaml
   apiVersion: argoproj.io/v1alpha1
   kind: Application
   metadata:
     name: your-new-app
     namespace: argocd
   spec:
     destination:
       name: in-cluster
       namespace: eks-dev
     source:
       path: "HelmCharts/your-new-app"
       repoURL: "git@github.com:your-username/your-gitops-repo.git"
       targetRevision: HEAD
     # Add other necessary configurations
   ```
3. **Add Observability Annotations**

   For automatic instrumentation, add the appropriate annotations to your application.

   For .NET applications:

   ```yaml
   annotations:
     instrumentation.opentelemetry.io/inject-dotnet: "splunk-otel-collector/splunk-otel-collector"
     instrumentation.opentelemetry.io/otel-dotnet-auto-runtime: "linux-x64"
   ```

   For NodeJS applications:

   ```yaml
   annotations:
     instrumentation.opentelemetry.io/inject-nodejs: "splunk-otel-collector/splunk-otel-collector"
   ```

   For Java applications:

   ```yaml
   annotations:
     instrumentation.opentelemetry.io/inject-java: "splunk-otel-collector/splunk-otel-collector"
   ```

   For Python applications:

   ```yaml
   annotations:
     instrumentation.opentelemetry.io/inject-python: "splunk-otel-collector/splunk-otel-collector"
   ```
The Splunk OpenTelemetry Collector is a key component in this GitOps architecture, providing comprehensive observability for all applications.
- Automatic Log Collection: Captures container logs from all pods
- Kubernetes Metrics: Collects metrics from Kubelet API
- Application Instrumentation: Auto-instruments applications using agents
- Span Collection: Captures distributed traces for request flows
- Direct Splunk Integration: Forwards all telemetry to Splunk Cloud
The collector configuration is divided into several parts:
1. **Receivers**: Define data input sources
   - `filelog`: Container log files
   - `kubeletstats`: Kubernetes metrics
   - `otlp`: OpenTelemetry Protocol for traces and metrics
2. **Processors**: Transform and enrich data
   - `batch`: Group data for efficient transmission
   - `k8sattributes`: Add Kubernetes metadata
   - `resourcedetection`: Detect cloud provider information
3. **Exporters**: Define data output destinations
   - `splunk_hec`: Send metrics to Splunk
   - `splunk_hec/logs`: Send logs to Splunk
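Putting those pieces together, a simplified sketch of the collector configuration (trimmed to just the trace pipeline; the endpoint and token are placeholders) has this shape:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
processors:
  k8sattributes: {}
  batch: {}
exporters:
  splunk_hec:
    endpoint: "https://your-splunk-hec-endpoint.com/services/collector"
    token: "your-splunk-hec-token"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [splunk_hec]
```

Each pipeline under `service.pipelines` wires a set of receivers through processors to exporters; the logs and metrics pipelines follow the same pattern with `filelog` and `kubeletstats` as receivers.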
After deployment, verify the collector is working correctly:
```shell
# Check collector pods
kubectl get pods -n splunk-otel-collector

# Check collector logs
kubectl logs -n splunk-otel-collector -l app.kubernetes.io/name=splunk-otel-collector -f

# Verify applications are instrumented
kubectl get pods -n eks-dev -o jsonpath='{.items[*].metadata.annotations}' | grep instrumentation
```

The GitOps workflow in this architecture follows these steps:
1. **Development**
   - Developer makes code changes to the application
   - CI pipeline builds and pushes the container image
   - Developer updates the image tag in the values file
   - Changes are committed to the Git repository
2. **Detection**
   - ArgoCD detects changes in the Git repository
   - Changes are analyzed against the current state
3. **Validation**
   - Pre-sync health checks run before applying changes
   - Current deployments are validated for health
4. **Deployment**
   - ArgoCD applies changes to the cluster
   - Resources are created or updated following sync waves
5. **Verification**
   - Post-deployment health checks confirm a successful rollout
   - Splunk begins collecting telemetry from the updated applications
Applications are deployed in specific order using sync waves:
- Infrastructure (Wave 0): splunk-otel-collector is deployed first
- Backend Services (Wave 1): API services are deployed next
- Frontend Applications (Wave 2): UI applications are deployed last
This ensures dependencies are available before dependent applications are deployed.
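Sync waves are driven by a single ArgoCD annotation on each resource or Application; lower-numbered waves sync first:

```yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # 0 = collector, 1 = backend APIs, 2 = frontends
```

ArgoCD waits for every resource in a wave to report healthy before starting the next wave, which is what guarantees the ordering described above.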
Common issues and solutions:
- **Problem**: Application shows "OutOfSync" but doesn't sync
  - **Solution**: Check the application's sync status in the ArgoCD UI
  - **Command**: `argocd app get <app-name> --refresh`
- **Problem**: Sync fails with an error
  - **Solution**: Check the error in the ArgoCD UI and logs
  - **Command**: `kubectl logs -n argocd deployment/argocd-application-controller`
- **Problem**: Helm template rendering errors
  - **Solution**: Validate templates locally
  - **Command**: `helm template HelmCharts/your-app --debug`
- **Problem**: Invalid chart structure
  - **Solution**: Verify the chart structure follows Helm best practices
  - **Reference**: Helm Best Practices
- **Problem**: No data in Splunk
  - **Check**: Verify HEC token and endpoint configuration
  - **Command**: `kubectl get secret -n splunk-otel-collector splunk-otel-collector-secrets -o yaml`
- **Problem**: Collector pods crashing
  - **Check**: Inspect collector logs
  - **Command**: `kubectl logs -n splunk-otel-collector -l app.kubernetes.io/name=splunk-otel-collector`
This architecture can be extended to manage multiple clusters:
- Create additional environment directories (e.g., `eks-staging`)
- Configure ArgoCD with multiple clusters:

  ```shell
  argocd cluster add <cluster-name>
  ```

- Adjust application manifests to target specific clusters:

  ```yaml
  spec:
    destination:
      name: cluster-name
      namespace: target-namespace
  ```
Integrate the GitOps flow with CI/CD pipelines:
1. **CI Pipeline**: Build and test application code
   - Triggered by code commits
   - Builds container images
   - Runs tests and security scans
   - Pushes images to the registry
2. **CD Integration**: Update image versions in Git
   - Updates image tags in values files
   - Commits changes to trigger an ArgoCD sync
   - Optionally use tools like Kustomize for dynamic replacements
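As an illustrative, hypothetical CD step (workflow file, job name, and the `yq` invocation are assumptions, not part of this repository), a GitHub Actions job could bump the tag and push the commit that ArgoCD then picks up:

```yaml
# .github/workflows/cd.yaml (illustrative fragment)
update-image-tag:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Bump image tag in dev values
      env:
        IMAGE_TAG: ${{ github.sha }}
      run: |
        yq -i '.image.tag = strenv(IMAGE_TAG)' HelmCharts/app-api/values-dev.yaml
        git config user.name ci-bot
        git config user.email ci-bot@example.com
        git commit -am "chore: deploy app-api ${IMAGE_TAG}"
        git push
```

Note that the pipeline never talks to the cluster; the only deployment interface is the Git commit.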
Implement disaster recovery for the entire architecture:
1. **Backup ArgoCD State**

   ```shell
   argocd admin export > argocd-backup.yaml
   ```

2. **Infrastructure Recovery Procedure**
   - Store the procedure in documentation
   - Include steps to recover the EKS cluster
   - Include steps to reinstall ArgoCD
   - Include steps to apply the root application
3. **Multi-region Redundancy**
   - Configure additional clusters in different regions
   - Use global DNS for failover
   - Replicate data between regions
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Helm best practices when authoring charts
- Use Conventional Commits for commit messages
- Include meaningful comments in complex templates
- Update documentation when adding features
This project is licensed under the MIT License - see the LICENSE file for details.