- Select the correct AWS credentials to use, with sufficient privileges to spin up the infrastructure, e.g. `export AWS_PROFILE=admin`
- Create a backend to store terraform state if required. An example is provided under `examples/backend_init` that creates the S3 bucket to store terraform state and the DynamoDB table to store the terraform state lock.
- To create a full Open Data Cube (ODC) EKS infrastructure on the AWS platform, execute each module defined under `examples/stage/<module>`. Please note that the number prefix, e.g. `01_`, represents the correct order of execution to manage dependencies.
- You need to adjust some of the configuration parameters, such as `owner`, `namespace`, `environment`, `region` and the terraform backend, so they are unique to your organisation.
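For reference, a minimal backend setup run might look like the sketch below; it assumes `examples/backend_init` behaves like a standard Terraform root module and that your AWS profile is already exported.

```
# Sketch: create the S3 state bucket and DynamoDB lock table
cd examples/backend_init
terraform init
terraform plan    # review the bucket and lock table that will be created
terraform apply
```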
Once you have created a terraform backend and updated the configuration parameters, you can perform the following steps to set up a new ODC cluster environment:
- Change directory to `examples/stage/01_odc_eks/`
- Modify `data_providers.tf`:
  - Change `region`, `owner`, `namespace`, and `environment` to the values from `backend_init`.
  - If you do not want a database, change `db_enabled` to `false`.
- Modify `main.tf`:
  - `bucket`: `<namespace>-<environment>-backend-tfstate`
  - `region`: `<region>`
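The `bucket` and `region` values above feed into the Terraform S3 backend configuration in `main.tf`. As a rough sketch only (the `key` and `dynamodb_table` values here are assumptions; check the actual block in the example code):

```
# Sketch of a typical S3 backend block; attribute values other than bucket and
# region are assumed and may differ in the example code.
terraform {
  backend "s3" {
    bucket         = "<namespace>-<environment>-backend-tfstate"
    key            = "01_odc_eks/terraform.tfstate"
    region         = "<region>"
    dynamodb_table = "<namespace>-<environment>-backend-terraform-lock"
  }
}
```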
- Run `terraform init` to initialize Terraform state tracking
- Run `terraform plan` to do a dry run and validate the examples and the interaction of modules
- Run `terraform apply` to spin up the infrastructure (a new ODC EKS cluster); this can take up to 15-20 minutes
- Validate that a fresh Kubernetes cluster has been created by adding a new kubernetes context and getting the cluster info:

```
aws eks update-kubeconfig --name <cluster-id>
kubectl cluster-info
```

- Change directory to `examples/stage/02_odc_k8s/`
- Modify `data_providers.tf` with the following:
  - Modify the `bucket` and `region` such that:
    - `bucket`: `<namespace>-<environment>-backend-tfstate`
    - `region`: `<region>`
- Modify `main.tf`:
  - `bucket`: `<namespace>-<environment>-backend-tfstate`
  - `region`: `<region>`
- Run `terraform init`, `terraform plan`, `terraform apply` as above to deploy flux, helm, etc. to the live k8s cluster
- If you enabled the database, you now need to create the AWS SSM parameter for database access. The name of the new parameter should be `/<namespace>-<environment>-eks/ows_ro/db.creds` and its value should be `<db_username>:<db_password>`
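One way to create that parameter is with the AWS CLI, as sketched below; storing it as a `SecureString` is an assumption, so match whatever parameter type the modules expect.

```
# Sketch: store the database credentials in SSM (SecureString type is assumed)
aws ssm put-parameter \
  --name "/<namespace>-<environment>-eks/ows_ro/db.creds" \
  --type SecureString \
  --value "<db_username>:<db_password>"
```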
- Get the pods across all namespaces to verify that services such as flux and helm were deployed:

```
kubectl get pods --all-namespaces
```

- FluxCD is configured (in the `02_odc_k8s` module) to monitor the flux-odc-sample repo, which defines the Helm Releases for the ODC cluster. Create your own live repo and update the flux configuration (a sketch of that change follows).
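Updating the flux configuration typically means pointing the module at your own git repository. The variable names in the sketch below (`flux_git_repo_url`, `flux_git_branch`) are hypothetical; check the `02_odc_k8s` module's variables for the actual input names.

```
# Hypothetical input names: confirm against the 02_odc_k8s module's variables
module "odc_k8s" {
  # ... existing module configuration ...

  flux_git_repo_url = "git@github.com:<your-org>/<your-live-repo>.git"  # assumed name
  flux_git_branch   = "master"                                          # assumed name
}
```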
- Execute the example modules `examples/stage/03_odc_apps_k8s` and `examples/stage/04_odc_k8s_sandbox`.
- For each of the stages `03` and `04` you need to modify the following files (the sketch after this list shows one way to run both stages):
  - Modify `data_providers.tf` with the following:
    - Modify the `bucket` and `region` such that:
      - `bucket`: `<namespace>-<environment>-backend-tfstate`
      - `region`: `<region>`
  - Modify `main.tf`:
    - `bucket`: `<namespace>-<environment>-backend-tfstate`
    - `region`: `<region>`

  This will set up a full sandbox/jupyterhub environment and the ows web service, and also install the necessary admin & monitoring components, roles, kubernetes secrets, etc. to manage your cluster.
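As a convenience, both stages can be applied in sequence with a small shell loop; this is only a sketch and assumes the same init/plan/apply workflow used for the earlier stages.

```
# Sketch: apply stages 03 and 04 in order, after editing their .tf files as above
for stage in 03_odc_apps_k8s 04_odc_k8s_sandbox; do
  (cd "examples/stage/${stage}" && terraform init && terraform plan && terraform apply)
done
```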
- Fetch the flux deployment key and copy it to the repo's SSH public key or deploy keys section:

```
kubectl -n flux logs deployment/flux | grep identity.pub | cut -d '"' -f2
```

- Make sure all the releases are deployed. Verify using:

```
kubectl get hr --all-namespaces
```

- To access Grafana (prometheus), get the password using the command below; the secret stores it base64 encoded, so it is decoded here.

```
kubectl get secret prometheus-operator-grafana -n monitoring -o yaml | grep "admin-password:" | sed 's/admin-password: //' | base64 -d -i
```
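If Grafana is not already exposed through an ingress in your environment, a port-forward is one way to reach it; the service name below mirrors the secret name above but is an assumption, so confirm it with `kubectl get svc -n monitoring`.

```
# Sketch: forward the (assumed) Grafana service to http://localhost:3000
kubectl -n monitoring port-forward svc/prometheus-operator-grafana 3000:80
```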
- Remove the `flux` deploy ssh key. This will stop flux from being able to read from and write to your live repo.
- Delete the helm releases (HRs) in the following order, assuming that you have a setup similar to the one defined in the example:
  - First delete all the apps under the `sandbox`, `processing` and `web` kubernetes namespaces:

```
kubectl delete hr --all=true -n sandbox
kubectl delete hr --all=true -n processing
kubectl delete hr --all=true -n web
```

  - Delete the prometheus-operator HR, its CRDs and everything under the monitoring namespace:

```
kubectl delete hr prometheus-operator -n monitoring
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete hr --all=true -n monitoring
```

  - Delete all the apps under the `admin` namespace. Also make sure external DNS has cleared all the DNS entries in Route53:

```
kubectl delete hr --all=true -n admin
```

- Destroy all the terraform infrastructure in reverse order: run `terraform init --upgrade` and `terraform destroy` in each module directory (a combined sketch follows the list below):
  - Destroy jupyterhub tf infra - `04_odc_k8s_sandbox`
  - Destroy apps tf infra - `03_odc_apps_k8s`
  - Destroy k8s tf infra - `02_odc_k8s`
  - Destroy eks tf infra - `01_odc_eks`
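The reverse-order teardown can also be expressed as a shell loop; this sketch assumes the module directory names above and the standard destroy workflow.

```
# Sketch: destroy the stages in reverse dependency order
for stage in 04_odc_k8s_sandbox 03_odc_apps_k8s 02_odc_k8s 01_odc_eks; do
  (cd "examples/stage/${stage}" && terraform init --upgrade && terraform destroy)
done
```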
- Known issue when deleting the `01_odc_eks` module:
  - The VPC module destroy may fail to clean up after itself properly:

```
Error: Error deleting VPC: DependencyViolation: The vpc 'vpc-09a9c52f9a2a53a4e' has dependencies and cannot be deleted.
    status code: 400, request id: 4548f188-8f1e-4cb5-a8e8-c92ba56e9401
```

  Resolve: destroy the VPC manually.
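One way to find and remove the leftover dependencies before deleting the VPC is sketched below; the VPC ID comes from your error message, and the specific leftovers (network interfaces, security groups, load balancers) vary by deployment.

```
# Sketch: inspect what is still attached to the VPC, clean it up, then delete the VPC
aws ec2 describe-network-interfaces --filters "Name=vpc-id,Values=<vpc-id>"
aws ec2 describe-security-groups --filters "Name=vpc-id,Values=<vpc-id>"
# ...delete whatever the above reveals, then:
aws ec2 delete-vpc --vpc-id <vpc-id>
```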