@allamand
Contributor

  • Add EKS cluster Terraform configuration updates
  • Add Identity Center Terraform module for IAM integration
  • Update deployment scripts with force-unlock helper
  • Add hub-config.yaml changes for EKS capabilities
  • Update ArgoCD and secrets Terraform configurations
  • Add EKS Capabilities ArgoCD setup documentation

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@allamand allamand self-assigned this Jan 23, 2026
@allamand allamand marked this pull request as draft January 23, 2026 17:34
@allamand allamand force-pushed the feat/eks-capabilities-integration branch 2 times, most recently from 62f5f5b to b19a134 Compare January 29, 2026 17:01
allamand and others added 17 commits February 2, 2026 22:51
…yment scripts

- Add EKS cluster Terraform configuration updates
- Add Identity Center Terraform module for IAM integration
- Update deployment scripts with force-unlock helper
- Add hub-config.yaml changes for EKS capabilities
- Update ArgoCD and secrets Terraform configurations
- Add EKS Capabilities ArgoCD setup documentation
…yment scripts

- Add EKS cluster Terraform configuration updates
- Add Identity Center Terraform module for IAM integration
- Update deployment scripts with force-unlock helper and state lock management
- Add hub-config.yaml changes for EKS capabilities
- Update ArgoCD and secrets Terraform configurations
- Add EKS Capabilities ArgoCD setup documentation
- Add multi-account EKS capabilities RBAC and IAM role selectors
- Keep spark_operator enabled
- Disable ArgoCD client creation (using EKS Marina managed ArgoCD)
- Remove ARGOCD_SESSION_TOKEN from keycloak-clients secret
- Add AWS Secrets Manager ClusterSecretStore for platform secrets
- Add Keycloak split-brain detector CronJob for cluster health monitoring
- Change destination server from hardcoded kubernetes.default.svc to {{server}} template
- Enables proper multi-cluster fleet management
- Switch from ARGOCD_SESSION_TOKEN (keycloak-clients) to ARGOCD_AUTH_TOKEN (cluster secrets)
- Aligns with EKS Marina managed ArgoCD authentication
- Remove ARGOCD_AUTH_TOKEN from external secret configuration
- Backstage will use OIDC authentication instead of token-based auth
- Remove stuck operation before clearing operation state
- Simplify revision conflict fix by clearing operation first
- Skip apps that are already Healthy and Synced
- Only recover apps with truly stuck operations (Running >5min)
- Prevent unnecessary recovery attempts on healthy apps
- Revert to checking both stuck operations and Progressing status
- Restore simpler condition for stuck app detection
- Add detection of stale operations (finishedAt exists but phase=Running)
- Clear stale operation state without unnecessary retries
- Fix revision conflict handling to sync to HEAD
- Improve output to show finished timestamp for better debugging
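The stuck-application rules listed in the commits above (skip Healthy and Synced apps, clear stale operations, recover operations Running for more than 5 minutes) can be sketched as a single decision function. This is a minimal sketch, not the PR's actual script: the function name `classify_app`, its argument order, the result labels, and the `STUCK_THRESHOLD_SECONDS` constant are all assumptions made for illustration. Since a later commit reverts to a simpler condition, the real script may well differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names, labels, and threshold are assumptions, not the PR's code.
STUCK_THRESHOLD_SECONDS=300   # "Running >5min" from the commit message

# classify_app HEALTH SYNC PHASE STARTED_EPOCH FINISHED_AT NOW_EPOCH
# Prints one of: skip, clear-stale, recover, wait
classify_app() {
  local health="$1" sync="$2" phase="$3" started="$4" finished="$5" now="$6"

  # Apps that are already Healthy and Synced are never touched.
  if [[ "$health" == "Healthy" && "$sync" == "Synced" ]]; then
    echo "skip"; return
  fi

  # Stale operation: finishedAt is set but the phase still reports Running.
  if [[ "$phase" == "Running" && -n "$finished" ]]; then
    echo "clear-stale"; return
  fi

  # Stuck operation: Running for longer than the threshold.
  if [[ "$phase" == "Running" && -n "$started" ]] \
     && (( now - started > STUCK_THRESHOLD_SECONDS )); then
    echo "recover"; return
  fi

  # Anything else is left alone for now.
  echo "wait"
}
```

For example, `classify_app Progressing OutOfSync Running 1000 "" 1400` prints `recover`, because the operation has been Running for 400 seconds with no finished timestamp. In a real script the arguments would presumably be read from the Application's status fields rather than passed by hand.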
…fests

- Add HuggingFace model download documentation
- Add Kro resource group for HuggingFace model management
- Add platform manifest template for HuggingFace models
- Update addons configuration
- Update GitLab initialization script
- Remove deprecated platform-manifests values.yaml
- Update addons configuration for platform manifests
- Enhance Argo Workflows installation template
- Improve CICD pipeline Kro resource group
- Refine HuggingFace model resource group
- Update cluster secret store configuration
- Configure Spark operator values
…epools

- Comment out compute_config in Terraform to disable Auto Mode default nodepools
- Enable customNodepools in platform-manifests values
- Custom nodepools provide more control over instance types, taints, and disruption policies
@allamand allamand force-pushed the feat/eks-capabilities-integration branch from e0ee763 to 5c4efb0 Compare February 2, 2026 21:52
allamand and others added 8 commits February 2, 2026 23:03
- Add recover_stuck_workflows() function to detect and delete workflows stuck > 15min
- Integrate into wait_for_sync_wave_completion() to run every 30s
- Prevents cascading failures when workflows hang (e.g., mysql-setup-workflow)
- Allows ArgoCD to automatically recreate workflows after deletion
- Add second SecurityGroupIngressRule referencing EKS cluster SG
- Extract cluster_security_group_id from VPC secret
- Add clusterSecurityGroupId to Crossplane EnvironmentConfig
- Ensures reliable pod-to-RDS connectivity without manual SG fixes
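The workflow-recovery commit above describes detecting workflows stuck for more than 15 minutes so they can be deleted and recreated by ArgoCD. The selection step might look roughly like the sketch below. It is not the PR's `recover_stuck_workflows()` implementation: the helper name `find_stuck_workflows`, the `NAME PHASE STARTED_EPOCH` input format, and the `WORKFLOW_STUCK_SECONDS` constant are assumptions chosen to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper name, input format, and constant are assumptions.
WORKFLOW_STUCK_SECONDS=900   # 15 minutes, per the commit message

# find_stuck_workflows NOW_EPOCH
# Reads "NAME PHASE STARTED_EPOCH" lines on stdin and prints the names of
# workflows still Running after more than WORKFLOW_STUCK_SECONDS.
find_stuck_workflows() {
  local now="$1" name phase started
  while read -r name phase started; do
    # Only workflows that are still Running can be stuck.
    [[ "$phase" == "Running" ]] || continue
    # Ignore rows without a numeric start time.
    [[ "$started" =~ ^[0-9]+$ ]] || continue
    if (( now - started > WORKFLOW_STUCK_SECONDS )); then
      echo "$name"
    fi
  done
}
```

A caller running every 30 seconds, as the commit describes, could pipe the current workflow list into this helper and delete each printed name (for instance with `kubectl delete workflow <name> -n <namespace>`, assuming the Argo Workflows CRD is installed), leaving ArgoCD to recreate it on the next sync.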
Signed-off-by: Workshop User <[email protected]>
…yment scripts

- Add EKS cluster Terraform configuration updates
- Add Identity Center Terraform module for IAM integration
- Update deployment scripts with force-unlock helper
- Add hub-config.yaml changes for EKS capabilities
- Update ArgoCD and secrets Terraform configurations
- Add EKS Capabilities ArgoCD setup documentation
…yment scripts

- Add EKS cluster Terraform configuration updates
- Add Identity Center Terraform module for IAM integration
- Update deployment scripts with force-unlock helper and state lock management
- Add hub-config.yaml changes for EKS capabilities
- Update ArgoCD and secrets Terraform configurations
- Add EKS Capabilities ArgoCD setup documentation
- Add multi-account EKS capabilities RBAC and IAM role selectors
- Keep spark_operator enabled
- Disable ArgoCD client creation (using EKS Marina managed ArgoCD)
- Remove ARGOCD_SESSION_TOKEN from keycloak-clients secret
- Add AWS Secrets Manager ClusterSecretStore for platform secrets
- Add Keycloak split-brain detector CronJob for cluster health monitoring
- Change destination server from hardcoded kubernetes.default.svc to {{server}} template
- Enables proper multi-cluster fleet management
@allamand allamand marked this pull request as ready for review February 6, 2026 15:28
allamand and others added 14 commits February 6, 2026 17:14
Signed-off-by: Sébastien Allamand <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Sébastien Allamand <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
    ARGOCD_URL="$ARGOCD_SERVER_URL"
    print_info "Using EKS-managed ArgoCD URL: $ARGOCD_URL"
else
    ARGOCD_URL="https://$DOMAIN_NAME/argocd"
Collaborator

@allamand Are we still setting up ArgoCD during the initial bootstrap? Do we still need this condition when EKS capabilities are enabled?

Workshop User and others added 7 commits February 11, 2026 13:49
Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>
…if 1 nginx pod or if 2 are in same AZ

Signed-off-by: Workshop User <[email protected]>
Signed-off-by: Workshop User <[email protected]>