# Amazon EKS cluster metrics

This example demonstrates how to monitor your Amazon Elastic Kubernetes Service
(Amazon EKS) cluster with the Observability Accelerator's EKS
[infrastructure module](https://github.com/aws-observability/terraform-aws-observability-accelerator/tree/main/modules/workloads/infra).

Monitoring Amazon Elastic Kubernetes Service (Amazon EKS) for metrics has two categories:
the control plane and the Amazon EKS nodes (with Kubernetes objects).
The Amazon EKS control plane consists of control plane nodes that run the Kubernetes software,
such as etcd and the Kubernetes API server. To read more about the components of an Amazon EKS cluster,
see the [service documentation](https://docs.aws.amazon.com/eks/latest/userguide/clusters.html).

The Amazon EKS infrastructure Terraform module focuses on metrics collection to Amazon
Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. It deploys the [node exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) in your cluster.

It provides default dashboards that give you comprehensive visibility into your nodes,
namespaces, pods, and kubelet operations health. Finally, you get curated Prometheus recording rules
and alerts to operate your cluster.
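As an illustrative sketch only, wiring this infrastructure module into a root `main.tf` could look like the following. The module name matches the example discussed later in this page, but the input variable names (`eks_cluster_id`, `managed_prometheus_workspace_id`, `managed_grafana_workspace_id`) and placeholder values are assumptions; check the module's documented inputs before use.

```hcl
# Hypothetical wiring sketch: the input names below are assumptions,
# not the module's confirmed interface.
module "workloads_infra" {
  source = "github.com/aws-observability/terraform-aws-observability-accelerator//modules/workloads/infra"

  eks_cluster_id                  = "my-eks-cluster" # your cluster name
  managed_prometheus_workspace_id = "ws-xxxxxxxx"    # Amazon Managed Service for Prometheus workspace
  managed_grafana_workspace_id    = "g-xxxxxxxx"     # Amazon Managed Grafana workspace
}
```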

Additionally, you can optionally collect custom Prometheus metrics from your applications running
on your EKS cluster.

## Prerequisites

Make sure to complete the [prerequisites section](https://aws-observability.github.io/terraform-aws-observability-accelerator/concepts/#prerequisites)
To set up your alert receiver with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html).
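As a hedged sketch of what that setup can look like in Terraform: this creates an SNS topic and registers an alertmanager definition on an Amazon Managed Service for Prometheus workspace. The topic name, the `aws_prometheus_workspace.this` reference, and the region are placeholder assumptions; the receiver format follows the documentation linked above.

```hcl
# Sketch with placeholder names: adapt the topic name, workspace
# reference, and region to your environment.
resource "aws_sns_topic" "alerts" {
  name = "eks-observability-alerts"
}

resource "aws_prometheus_alert_manager_definition" "this" {
  workspace_id = aws_prometheus_workspace.this.id

  definition = <<-EOT
  alertmanager_config: |
    route:
      receiver: 'sns'
    receivers:
      - name: 'sns'
        sns_configs:
          - topic_arn: ${aws_sns_topic.alerts.arn}
            sigv4:
              region: us-east-1
  EOT
}
```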

## Custom metrics collection

In addition to the cluster metrics, if you are interested in collecting Prometheus
metrics from your pods, you can set up custom metrics collection.
This instructs the ADOT collector to scrape your application metrics based
on the configuration you provide. You can also exclude some metrics to save costs.

Using the example, you can edit `examples/existing-cluster-with-base-and-infra/main.tf`.
In the `module "workloads_infra"` block, add the following configuration (make sure the values match your use case):

```hcl
enable_custom_metrics = true

custom_metrics_config = {
  # list of application ports (example)
  ports = [8000, 8080]

  # list of series prefixes you want to discard from ingestion
  dropped_series_prefix = ["go_gcc"]
}
```

After applying Terraform, you can query Prometheus from Grafana for your application metrics,
create alerts, and build your own dashboards. In the Explore section of Grafana, the
following query shows the containers exposing metrics that matched the custom metrics
collection, grouped by cluster and node:

```promql
sum(up{job="custom-metrics"}) by (container_name, cluster, nodename)
```

<img width="2560" alt="Screenshot 2023-01-31 at 11 16 21" src="https://user-images.githubusercontent.com/10175027/215869004-e05f557d-c81a-41fb-a452-ede9f986cb27.png">