Commit 6ca939d

Adot health monitoring (#212)
* Added adot health metric collection
* Corrected alignment-related error in OpenTelemetry config
* Reverted the links
* pre-commit
* Update dashboards.tf: split adothealth dashboard
* Split dashboards
* Changed dashboard link
* Corrected flux_kustomization_path
* Pre-commit
1 parent 36751a4 commit 6ca939d

7 files changed (+94 additions, -0 deletions)

modules/eks-monitoring/README.md

Lines changed: 3 additions & 0 deletions
@@ -55,6 +55,7 @@ See examples using this Terraform modules in the **Amazon EKS** section of [this
 | [helm_release.grafana_operator](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
 | [helm_release.kube_state_metrics](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
 | [helm_release.prometheus_node_exporter](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
+| [kubectl_manifest.adothealth_monitoring_dashboards](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
 | [kubectl_manifest.api_server_dashboards](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
 | [kubectl_manifest.flux_gitrepository](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
 | [kubectl_manifest.flux_kustomization](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
@@ -68,9 +69,11 @@ See examples using this Terraform modules in the **Amazon EKS** section of [this
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
 | <a name="input_adot_loglevel"></a> [adot\_loglevel](#input\_adot\_loglevel) | Verbosity level for ADOT collector logs. This accepts (detailed\|normal\|basic); see https://aws-otel.github.io/docs/components/misc-exporters for more info. | `string` | `"normal"` | no |
+| <a name="input_adothealth_monitoring_config"></a> [adothealth\_monitoring\_config](#input\_adothealth\_monitoring\_config) | Config object for ADOT health monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> grafana_adothealth_dashboard_url = string<br> })<br> })</pre> | `null` | no |
 | <a name="input_apiserver_monitoring_config"></a> [apiserver\_monitoring\_config](#input\_apiserver\_monitoring\_config) | Config object for API server monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> basic = string<br> advanced = string<br> troubleshooting = string<br> })<br> })</pre> | `null` | no |
 | <a name="input_custom_metrics_config"></a> [custom\_metrics\_config](#input\_custom\_metrics\_config) | Configuration object to enable custom metrics collection | <pre>map(object({<br> enableBasicAuth = bool<br> path = string<br> basicAuthUsername = string<br> basicAuthPassword = string<br> ports = string<br> droppedSeriesPrefixes = string<br> }))</pre> | `null` | no |
 | <a name="input_eks_cluster_id"></a> [eks\_cluster\_id](#input\_eks\_cluster\_id) | EKS Cluster Id | `string` | n/a | yes |
+| <a name="input_enable_adotcollector_metrics"></a> [enable\_adotcollector\_metrics](#input\_enable\_adotcollector\_metrics) | Enables collection of ADOT collector metrics | `bool` | `true` | no |
 | <a name="input_enable_alerting_rules"></a> [enable\_alerting\_rules](#input\_enable\_alerting\_rules) | Enables or disables Managed Prometheus alerting rules | `bool` | `true` | no |
 | <a name="input_enable_amazon_eks_adot"></a> [enable\_amazon\_eks\_adot](#input\_enable\_amazon\_eks\_adot) | Enables the ADOT Operator on the EKS Cluster | `bool` | `true` | no |
 | <a name="input_enable_apiserver_monitoring"></a> [enable\_apiserver\_monitoring](#input\_enable\_apiserver\_monitoring) | Enable EKS kube-apiserver monitoring, alerting and dashboards | `bool` | `true` | no |

modules/eks-monitoring/dashboards.tf

Lines changed: 26 additions & 0 deletions
@@ -74,3 +74,29 @@ YAML
 count = var.enable_apiserver_monitoring ? 1 : 0
 depends_on = [module.external_secrets]
 }
+
+# adot health dashboards
+resource "kubectl_manifest" "adothealth_monitoring_dashboards" {
+yaml_body = <<YAML
+apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
+kind: Kustomization
+metadata:
+name: ${local.adothealth_monitoring_config.flux_kustomization_name}
+namespace: flux-system
+spec:
+interval: 1m0s
+path: ${local.adothealth_monitoring_config.flux_kustomization_path}
+prune: true
+sourceRef:
+kind: GitRepository
+name: ${local.adothealth_monitoring_config.flux_gitrepository_name}
+postBuild:
+substitute:
+AMG_AWS_REGION: ${var.managed_prometheus_workspace_region}
+AMP_ENDPOINT_URL: ${var.managed_prometheus_workspace_endpoint}
+AMG_ENDPOINT_URL: ${var.grafana_url}
+GRAFANA_ADOTHEALTH_DASH_URL: ${local.adothealth_monitoring_config.dashboards.grafana_adothealth_dashboard_url}
+YAML
+count = var.enable_adotcollector_metrics ? 1 : 0
+depends_on = [module.external_secrets]
+}

modules/eks-monitoring/locals.tf

Lines changed: 13 additions & 0 deletions
@@ -119,4 +119,17 @@ locals {
 troubleshooting = "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-troubleshooting.json"
 }
 }
+
+adothealth_monitoring_config = {
+# can be overridden by providing a config
+flux_gitrepository_name = "aws-observability-accelerator"
+flux_gitrepository_url = "https://github.com/aws-observability/aws-observability-accelerator"
+flux_gitrepository_branch = "main"
+flux_kustomization_name = "grafana-dashboards-adothealth"
+flux_kustomization_path = "./artifacts/grafana-operator-manifests/eks/adot"
+
+dashboards = {
+grafana_adothealth_dashboard_url = "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/adot/adothealth.json"
+}
+}
 }
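As committed, `local.adothealth_monitoring_config` is a literal object and `dashboards.tf` reads only that local, so a value passed through `var.adothealth_monitoring_config` is never consulted despite the "can be overridden" comment. The following is a minimal sketch of one way to honor the override, assuming the committed `adothealth_monitoring_config` entry is removed from `locals.tf` first (Terraform rejects duplicate local names). `adothealth_default_config` is a hypothetical name, not part of the module.

```hcl
locals {
  # Hypothetical local: the defaults committed in locals.tf, values unchanged.
  adothealth_default_config = {
    flux_gitrepository_name   = "aws-observability-accelerator"
    flux_gitrepository_url    = "https://github.com/aws-observability/aws-observability-accelerator"
    flux_gitrepository_branch = "main"
    flux_kustomization_name   = "grafana-dashboards-adothealth"
    flux_kustomization_path   = "./artifacts/grafana-operator-manifests/eks/adot"

    dashboards = {
      grafana_adothealth_dashboard_url = "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/adot/adothealth.json"
    }
  }

  # Prefer the caller's object when one is supplied; otherwise use the defaults.
  # dashboards.tf can keep reading local.adothealth_monitoring_config as-is.
  adothealth_monitoring_config = (
    var.adothealth_monitoring_config != null
    ? var.adothealth_monitoring_config
    : local.adothealth_default_config
  )
}
```

Both branches of the conditional have the same attribute names and string types, so Terraform can unify them into one object type. Because the variable declares every attribute without `optional()`, a caller still has to pass the complete object; partial overrides would need a different design.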

modules/eks-monitoring/main.tf

Lines changed: 5 additions & 0 deletions
@@ -180,7 +180,12 @@ module "helm_addon" {
 {
 name = "istioPrometheusMetricsEndpoint"
 value = try(var.istio_config.prometheus_metrics_endpoint, local.istio_pattern_config.prometheus_metrics_endpoint)
+},
+{
+name = "enableAdotcollectorMetrics"
+value = var.enable_adotcollector_metrics
 }
+
 ]
 
 irsa_config = {
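A single flag, `enable_adotcollector_metrics`, drives both halves of this feature: it sets `count` on `kubectl_manifest.adothealth_monitoring_dashboards` in `dashboards.tf`, and it is passed to the chart as the `enableAdotcollectorMetrics` Helm value that guards the `prometheus/1` receiver, the `metrics/1` pipeline, and the `telemetry` block in the collector template. A sketch of opting out from a root module, assuming a relative module path and a placeholder cluster name (both hypothetical):

```hcl
module "eks_monitoring" {
  # Hypothetical relative path to this module; adjust to your repository layout.
  source = "../../modules/eks-monitoring"

  # Placeholder value; eks_cluster_id is marked required in the README table.
  eks_cluster_id = "my-eks-cluster"

  # false: count = 0 for the adothealth dashboards Kustomization, and the
  # collector template omits the prometheus/1 self-scrape receiver, the
  # metrics/1 pipeline, and the telemetry block.
  enable_adotcollector_metrics = false

  # Other inputs your deployment needs (Managed Prometheus workspace,
  # Grafana URL, and so on) are left out of this sketch.
}
```

The variable defaults to `true`, so existing callers pick up the self-scrape and the dashboard without any change to their module block.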

modules/eks-monitoring/otel-config/templates/opentelemetrycollector.yaml

Lines changed: 21 additions & 0 deletions
@@ -26,6 +26,15 @@ spec:
 endpoint: {{ .Values.otlpGrpcEndpoint }}
 http:
 endpoint: {{ .Values.otlpHttpEndpoint }}
+{{ if .Values.enableAdotcollectorMetrics }}
+prometheus/1:
+config:
+scrape_configs:
+- job_name: otel-collector-metrics
+scrape_interval: 10s
+static_configs:
+- targets: ['localhost:8888']
+{{ end }}
 prometheus:
 config:
 global:
@@ -1607,9 +1616,21 @@ spec:
 receivers: [prometheus, otlp]
 processors: [batch/metrics, attributes/metrics]
 exporters: [logging, prometheusremotewrite]
+{{ if .Values.enableAdotcollectorMetrics }}
+metrics/1:
+receivers: [prometheus/1]
+processors: []
+exporters: [prometheusremotewrite]
+{{ end }}
 {{ if .Values.enableTracing }}
 traces:
 receivers: [otlp]
 processors: [batch/traces]
 exporters: [logging, awsxray]
 {{ end }}
+{{ if .Values.enableAdotcollectorMetrics }}
+telemetry:
+metrics:
+address: 0.0.0.0:8888
+level: basic
+{{ end }}

modules/eks-monitoring/otel-config/values.yaml

Lines changed: 2 additions & 0 deletions
@@ -30,3 +30,5 @@ istioScrapeSampleLimit: ${istio_scrape_sample_limit}
 istioPrometheusMetricsEndpoint: ${istio_prometheus_metrics_endpoint}
 
 adotLoglevel: ${adot_loglevel}
+
+enableAdotcollectorMetrics: ${enable_adotcollector_metrics}

modules/eks-monitoring/variables.tf

Lines changed: 24 additions & 0 deletions
@@ -507,3 +507,27 @@ variable "target_secret_namespace" {
 type = string
 default = "grafana-operator"
 }
+
+variable "enable_adotcollector_metrics" {
+description = "Enables collection of ADOT collector metrics"
+type = bool
+default = true
+}
+
+variable "adothealth_monitoring_config" {
+description = "Config object for ADOT health monitoring"
+type = object({
+flux_gitrepository_name = string
+flux_gitrepository_url = string
+flux_gitrepository_branch = string
+flux_kustomization_name = string
+flux_kustomization_path = string
+
+dashboards = object({
+grafana_adothealth_dashboard_url = string
+})
+})
+
+# defaults are pre-computed in locals.tf; provide a full definition to override
+default = null
+}
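A sketch of overriding `adothealth_monitoring_config` to load the dashboard from a fork. The organization name `example-org`, the module path, and the cluster name are hypothetical. Two caveats follow from the diff itself. First, with `locals.tf` as committed, this object is ignored because `dashboards.tf` reads only `local.adothealth_monitoring_config`. Second, the Kustomization references its `GitRepository` by `flux_gitrepository_name` alone, so `flux_gitrepository_url` and `flux_gitrepository_branch` are not consumed anywhere in this commit.

```hcl
module "eks_monitoring" {
  # Hypothetical relative path to this module; adjust to your repository layout.
  source = "../../modules/eks-monitoring"

  eks_cluster_id = "my-eks-cluster" # placeholder

  # The variable's object type has no optional() attributes, so every field
  # must be supplied, including the nested dashboards object.
  adothealth_monitoring_config = {
    flux_gitrepository_name   = "aws-observability-accelerator"
    flux_gitrepository_url    = "https://github.com/example-org/aws-observability-accelerator" # hypothetical fork
    flux_gitrepository_branch = "main"
    flux_kustomization_name   = "grafana-dashboards-adothealth"
    flux_kustomization_path   = "./artifacts/grafana-operator-manifests/eks/adot"

    dashboards = {
      # Intended for the Kustomization's GRAFANA_ADOTHEALTH_DASH_URL substitution.
      grafana_adothealth_dashboard_url = "https://raw.githubusercontent.com/example-org/aws-observability-accelerator/main/artifacts/grafana-dashboards/adot/adothealth.json" # hypothetical fork
    }
  }
}
```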
