|
| 1 | +# Amazon EKS cluster monitoring |
| 2 | + |
| 3 | +This example demonstrates how to monitor your Amazon Elastic Kubernetes Service |
| 4 | +(Amazon EKS) cluster with the Observability Accelerator's EKS |
| 5 | +[infrastructure module](https://github.com/aws-observability/terraform-aws-observability-accelerator/tree/main/modules/workloads/infra). |
| 6 | + |
| 7 | +Monitoring Amazon EKS falls into two categories: |
| 8 | +the control plane and the Amazon EKS nodes (with Kubernetes objects). |
| 9 | +The Amazon EKS control plane consists of control plane nodes that run the Kubernetes software, |
| 10 | +such as etcd and the Kubernetes API server. To learn more about the components of an Amazon EKS cluster, |
| 11 | +see the [service documentation](https://docs.aws.amazon.com/eks/latest/userguide/clusters.html). |
| 12 | + |
| 13 | +The Amazon EKS infrastructure Terraform module focuses on metrics collection to Amazon |
| 14 | +Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. |
| 15 | +Additionally, it provides default dashboards that give comprehensive visibility into the health of nodes, |
| 16 | +namespaces, pods, and kubelet operations. Finally, you get curated Prometheus recording rules |
| 17 | +and alerts to operate your cluster. |
| 18 | + |
| 19 | +## Prerequisites |
| 20 | + |
| 21 | +Make sure to complete the [prerequisites section](/terraform-aws-observability-accelerator/concepts/#prerequisites) |
| 22 | +before proceeding. |
| 23 | + |
| 24 | +## Setup |
| 25 | + |
| 26 | +### 1. Download sources and initialize Terraform |
| 27 | + |
| 28 | +```bash |
| 29 | +git clone https://github.com/aws-observability/terraform-aws-observability-accelerator.git |
| 30 | +cd terraform-aws-observability-accelerator/examples/existing-cluster-with-base-and-infra |
| 31 | +terraform init |
| 32 | +``` |
| 33 | + |
| 34 | +### 2. AWS Region |
| 35 | + |
| 36 | +Specify the AWS Region where the resources will be deployed: |
| 37 | + |
| 38 | +```bash |
| 39 | +export TF_VAR_aws_region=xxx |
| 40 | +``` |
| 41 | + |
| 42 | +### 3. Amazon EKS Cluster |
| 43 | + |
| 44 | +To run this example, you need to provide your EKS cluster name. If you don't |
| 45 | +have a cluster ready, visit [this example](/terraform-aws-observability-accelerator/helpers/new-eks-cluster.md) |
| 46 | +first to create a new one. |
| 47 | + |
| 48 | +Specify your cluster name: |
| 49 | + |
| 50 | +```bash |
| 51 | +export TF_VAR_eks_cluster_id=xxx |
| 52 | +``` |
| 53 | + |
| 54 | +### 4. Amazon Managed Service for Prometheus workspace (optional) |
| 55 | + |
| 56 | +By default, we create an Amazon Managed Service for Prometheus workspace for you. |
| 57 | +However, if you have an existing workspace you want to reuse, edit and run: |
| 58 | + |
| 59 | +```bash |
| 60 | +export TF_VAR_managed_prometheus_workspace_id=ws-xxx |
| 61 | +``` |
| 62 | + |
| 63 | +To create a workspace outside of Terraform's state, simply run: |
| 64 | + |
| 65 | +```bash |
| 66 | +aws amp create-workspace --alias observability-accelerator --query 'workspaceId' --output text |
| 67 | +``` |
| 68 | + |
| 69 | +### 5. Amazon Managed Grafana workspace |
| 70 | + |
| 71 | +To run this example you need an Amazon Managed Grafana workspace. If you have an existing workspace, edit and run: |
| 72 | + |
| 73 | +```bash |
| 74 | +export TF_VAR_managed_grafana_workspace_id=g-xxx |
| 75 | +``` |
| 76 | + |
| 77 | +To create a new one, within this example's Terraform state (sharing the same lifecycle with all the |
| 78 | +other resources created by Terraform): |
| 79 | + |
| 80 | +- Edit `main.tf` and set `enable_managed_grafana = true` |
| 81 | +- Run: |
| 82 | + |
| 83 | +```bash |
| 84 | +terraform init |
| 85 | +terraform apply -target "module.eks_observability_accelerator.module.managed_grafana[0].aws_grafana_workspace.this[0]" |
| 86 | +export TF_VAR_managed_grafana_workspace_id=$(terraform output --raw managed_grafana_workspace_id) |
| 87 | +``` |
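
As a quick sanity check on the values exported so far, the sketch below tests that an ID carries the expected prefix. It assumes that Amazon Managed Service for Prometheus workspace IDs start with `ws-` and Amazon Managed Grafana workspace IDs start with `g-`, as the placeholders above suggest. `has_prefix` is a hypothetical helper, not part of the accelerator.

```bash
# Hypothetical helper: succeed if $2 looks like "<prefix>-<something>",
# where the prefix is given as $1 (for example "ws" or "g").
has_prefix() {
  case "$2" in
    "$1"-?*) return 0 ;;  # prefix, a dash, then at least one character
    *)       return 1 ;;
  esac
}
```

For example, `has_prefix g "$TF_VAR_managed_grafana_workspace_id" || echo "unexpected Grafana workspace ID" >&2` would flag a Prometheus workspace ID pasted into the Grafana variable by mistake.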
| 88 | + |
| 89 | +### 6. Grafana API Key |
| 90 | + |
| 91 | +Amazon Managed Grafana provides a control plane API for generating Grafana API keys. |
| 92 | +As a security best practice, we provide Terraform with a short-lived API key to |
| 93 | +run the `apply` or `destroy` command. |
| 94 | + |
| 95 | +Ensure you have the necessary IAM permissions (`CreateWorkspaceApiKey`, `DeleteWorkspaceApiKey`). |
| 96 | + |
| 97 | +```bash |
| 98 | +export TF_VAR_grafana_api_key=$(aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id "$TF_VAR_managed_grafana_workspace_id" --query key --output text) |
| 99 | +``` |
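
Because the key above is created with `--seconds-to-live 1200`, it expires 20 minutes after creation. The sketch below computes how many seconds remain, so you can tell whether a refresh is needed before `apply` or `destroy`. `key_seconds_left` and the `GRAFANA_KEY_CREATED` variable mentioned afterwards are hypothetical names used for illustration only.

```bash
# Hypothetical helper: print the seconds left before a key expires.
#   $1 = creation time (epoch seconds)
#   $2 = time-to-live in seconds (1200 in the command above)
#   $3 = current time (epoch seconds); defaults to now
key_seconds_left() {
  created=$1 ttl=$2 now=${3:-$(date +%s)}
  left=$(( created + ttl - now ))
  # Clamp at zero: an expired key has no time left.
  [ "$left" -gt 0 ] || left=0
  echo "$left"
}
```

One way to use it, assuming you record `GRAFANA_KEY_CREATED=$(date +%s)` right after creating the key, is `key_seconds_left "$GRAFANA_KEY_CREATED" 1200`. A result of `0` means the key has expired and should be recreated.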
| 100 | + |
| 101 | +## Deploy |
| 102 | + |
| 103 | +Run this command to deploy the example: |
| 104 | + |
| 105 | +```bash |
| 106 | +terraform apply |
| 107 | +``` |
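
Since the deployment depends on the variables exported during setup, a small preflight check can catch a missing value before running `terraform apply`. This is a minimal sketch: `check_tf_vars` is a hypothetical helper, and the list of names simply mirrors the non-optional `TF_VAR_*` exports from the setup steps.

```bash
# Hypothetical helper: report every required TF_VAR_* variable that is unset
# or empty, and return non-zero if any is missing.
check_tf_vars() {
  missing=0
  for name in TF_VAR_aws_region TF_VAR_eks_cluster_id \
              TF_VAR_managed_grafana_workspace_id TF_VAR_grafana_api_key; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible usage is `check_tf_vars && terraform apply`, which only starts the deployment when all four variables are set.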
| 108 | + |
| 109 | +## Visualization |
| 110 | + |
| 111 | +1. Prometheus datasource on Grafana |
| 112 | + |
| 113 | +Open your Grafana workspace. Under Configuration -> Data sources, you should see `aws-observability-accelerator`. Open it and click `Save & test`. You should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana. |
| 114 | + |
| 115 | +2. Grafana dashboards |
| 116 | + |
| 117 | +Go to the Dashboards panel of your Grafana workspace. You should see a list of dashboards under the `Observability Accelerator Dashboards` folder. |
| 118 | + |
| 119 | +<img width="1540" alt="image" src="https://user-images.githubusercontent.com/10175027/190000716-29e16698-7c90-49d6-8c37-79ca1790e2cc.png"> |
| 120 | + |
| 121 | +Open a specific dashboard and you should be able to view its visualization. |
| 122 | + |
| 123 | +<img width="2056" alt="cluster headlines" src="https://user-images.githubusercontent.com/10175027/199110753-9bc7a9b7-1b45-4598-89d3-32980154080e.png"> |
| 124 | + |
| 125 | +3. Amazon Managed Service for Prometheus rules and alerts |
| 126 | + |
| 127 | +Open the Amazon Managed Service for Prometheus console and view the details of your workspace. Under the `Rules management` tab, you should find new rules deployed. |
| 128 | + |
| 129 | +<img width="1629" alt="image" src="https://user-images.githubusercontent.com/10175027/189301297-4865e75d-2d71-434f-b5d0-9750b3533632.png"> |
| 130 | + |
| 131 | + |
| 132 | +To set up your alert receiver with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html). |
| 133 | + |
| 134 | + |
| 135 | +## Destroy resources |
| 136 | + |
| 137 | +If you leave this stack running, you will continue to incur charges. To remove all resources |
| 138 | +created by Terraform, [refresh your Grafana API key](#6-grafana-api-key) and run the command below. |
| 139 | + |
| 140 | +Be careful: this command removes everything created by Terraform. If you wish |
| 141 | +to keep your Amazon Managed Grafana or Amazon Managed Service for Prometheus workspaces, remove them |
| 142 | +from your Terraform state before running the destroy command. |
| 143 | + |
| 144 | +```bash |
| 145 | +terraform destroy |
| 146 | +``` |
| 147 | + |
| 148 | +To remove resources from your Terraform state, run: |
| 149 | + |
| 150 | +```bash |
| 151 | +# grafana workspace |
| 152 | +terraform state rm "module.eks_observability_accelerator.module.managed_grafana[0].aws_grafana_workspace.this[0]" |
| 153 | + |
| 154 | +# prometheus workspace |
| 155 | +terraform state rm "module.eks_observability_accelerator.aws_prometheus_workspace.this[0]" |
| 156 | +``` |
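
Before running `terraform destroy`, it can help to confirm that the workspaces you want to keep are no longer tracked. The sketch below filters a list of state addresses down to Grafana and Prometheus workspace resources. `workspace_addresses` is a hypothetical helper, and the pattern assumes the resource types shown in the commands above (`aws_grafana_workspace`, `aws_prometheus_workspace`).

```bash
# Hypothetical helper: read Terraform state addresses on stdin and print only
# those that refer to a Grafana or Prometheus workspace resource.
workspace_addresses() {
  # "|| true" keeps an empty result (nothing matched) from counting as an error.
  grep -E 'aws_(grafana|prometheus)_workspace\.' || true
}
```

Piping the state listing through it, as in `terraform state list | workspace_addresses`, should print nothing once both workspaces have been removed from the state.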
| 157 | + |
| 158 | + |
| 159 | +> **Note:** To view all the features provided by this module, visit the [module documentation](https://github.com/aws-observability/terraform-aws-observability-accelerator/tree/main/modules/workloads/infra). |