# Amazon EKS cluster metrics

This example demonstrates how to monitor your Amazon Elastic Kubernetes Service
(Amazon EKS) cluster with the Observability Accelerator's EKS
[infrastructure module](https://github.com/aws-observability/terraform-aws-observability-accelerator/tree/main/modules/workloads/infra).

Monitoring Amazon Elastic Kubernetes Service (Amazon EKS) for metrics has two categories:
the control plane and the Amazon EKS nodes (with Kubernetes objects).
The Amazon EKS control plane consists of control plane nodes that run the Kubernetes software,
such as etcd and the Kubernetes API server. To read more about the components of an Amazon EKS cluster,
see the [service documentation](https://docs.aws.amazon.com/eks/latest/userguide/clusters.html).

The Amazon EKS infrastructure Terraform module focuses on metrics collection to Amazon
Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. It deploys the [node exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) in your cluster.

It provides default dashboards that give you comprehensive visibility into your nodes,
namespaces, pods, and kubelet operations health. Finally, you get curated Prometheus recording rules
and alerts to operate your cluster.
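As an illustrative sketch only, wiring this infrastructure module into a root `main.tf` could look like the following. The module name matches the example discussed later in this page, but the input variable names (`eks_cluster_id`, `managed_prometheus_workspace_id`, `managed_grafana_workspace_id`) and placeholder values are assumptions; check the module's documented inputs before use.

```hcl
# Hypothetical wiring sketch: the input names below are assumptions,
# not the module's confirmed interface.
module "workloads_infra" {
  source = "github.com/aws-observability/terraform-aws-observability-accelerator//modules/workloads/infra"

  eks_cluster_id                  = "my-eks-cluster" # your cluster name
  managed_prometheus_workspace_id = "ws-xxxxxxxx"    # Amazon Managed Service for Prometheus workspace
  managed_grafana_workspace_id    = "g-xxxxxxxx"     # Amazon Managed Grafana workspace
}
```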

Additionally, you can optionally collect custom Prometheus metrics from your applications running
on your EKS cluster.

## Prerequisites

Make sure to complete the [prerequisites section](https://aws-observability.github.io/terraform-aws-observability-accelerator/concepts/#prerequisites)
To set up your alert receiver with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html).
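As a hedged sketch of what that setup can look like in Terraform: this creates an SNS topic and registers an alertmanager definition on an Amazon Managed Service for Prometheus workspace. The topic name, the `aws_prometheus_workspace.this` reference, and the region are placeholder assumptions; the receiver format follows the documentation linked above.

```hcl
# Sketch with placeholder names: adapt the topic name, workspace
# reference, and region to your environment.
resource "aws_sns_topic" "alerts" {
  name = "eks-observability-alerts"
}

resource "aws_prometheus_alert_manager_definition" "this" {
  workspace_id = aws_prometheus_workspace.this.id

  definition = <<-EOT
  alertmanager_config: |
    route:
      receiver: 'sns'
    receivers:
      - name: 'sns'
        sns_configs:
          - topic_arn: ${aws_sns_topic.alerts.arn}
            sigv4:
              region: us-east-1
  EOT
}
```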

## Custom metrics collection

In addition to the cluster metrics, if you are interested in collecting Prometheus
metrics from your pods, you can set up custom metrics collection.
This instructs the ADOT collector to scrape your application metrics based
on the configuration you provide. You can also exclude some metrics to save costs.

Using the example, you can edit `examples/existing-cluster-with-base-and-infra/main.tf`.
In the `module "workloads_infra"` block, add the following configuration (make sure the values match your use case):

```hcl
enable_custom_metrics = true

custom_metrics_config = {
  # list of application ports (example)
  ports = [8000, 8080]

  # list of series prefixes you want to discard from ingestion
  dropped_series_prefix = ["go_gcc"]
}
```

After applying Terraform, you can query Prometheus from Grafana for your application metrics,
create alerts, and build your own dashboards. In the Explore section of Grafana, the
following query shows the containers exposing metrics that matched the custom metrics
collection, grouped by cluster and node:

```promql
sum(up{job="custom-metrics"}) by (container_name, cluster, nodename)
```

<img width="2560" alt="Screenshot 2023-01-31 at 11 16 21" src="https://user-images.githubusercontent.com/10175027/215869004-e05f557d-c81a-41fb-a452-ede9f986cb27.png">