Commit 86cc948

fix(eks-monitoring): kube-proxy scrape config and alert rule (#221)
* changes to just OTEL kube-proxy scrape job
* agree with koffir@ to implement kube-proxy by default
* re-added KubeProxyDown Prometheus alert rule
* added kube-proxy URL related stuff
* corrected kube-proxy stuff according to new structure
* Use SEMVER for flux now
* Run pre-commit
* Bump OTEL chart version
* Substitute URL path with SEMVER
* Run pre-commit
* Added kube-proxy config object
* Run pre-commit
* Use kube-proxy config object for dashboards

---------

Co-authored-by: Jens-Uwe Walther <[email protected]>
1 parent 72e0d87 commit 86cc948

7 files changed: +120 -137 lines changed

modules/eks-monitoring/README.md

Lines changed: 11 additions & 8 deletions
@@ -59,6 +59,7 @@ See examples using this Terraform modules in the **Amazon EKS** section of [this
| [kubectl_manifest.api_server_dashboards](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [kubectl_manifest.flux_gitrepository](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [kubectl_manifest.flux_kustomization](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
+ | [kubectl_manifest.kubeproxy_monitoring_dashboard](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_eks_cluster.eks_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -69,7 +70,7 @@ See examples using this Terraform modules in the **Amazon EKS** section of [this
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_adot_loglevel"></a> [adot\_loglevel](#input\_adot\_loglevel) | Verbosity level for ADOT collector logs. This accepts (detailed\|normal\|basic), see https://aws-otel.github.io/docs/components/misc-exporters for mor infos. | `string` | `"normal"` | no |
- | <a name="input_adothealth_monitoring_config"></a> [adothealth\_monitoring\_config](#input\_adothealth\_monitoring\_config) | Config object for API server monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> grafana_adothealth_dashboard_url = string<br> })<br> })</pre> | `null` | no |
+ | <a name="input_adothealth_monitoring_config"></a> [adothealth\_monitoring\_config](#input\_adothealth\_monitoring\_config) | Config object for ADOT health monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> grafana_adothealth_dashboard_url = string<br> })<br> })</pre> | `null` | no |
| <a name="input_apiserver_monitoring_config"></a> [apiserver\_monitoring\_config](#input\_apiserver\_monitoring\_config) | Config object for API server monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> basic = string<br> advanced = string<br> troubleshooting = string<br> })<br> })</pre> | `null` | no |
| <a name="input_custom_metrics_config"></a> [custom\_metrics\_config](#input\_custom\_metrics\_config) | Configuration object to enable custom metrics collection | <pre>map(object({<br> enableBasicAuth = bool<br> path = string<br> basicAuthUsername = string<br> basicAuthPassword = string<br> ports = string<br> droppedSeriesPrefixes = string<br> }))</pre> | `null` | no |
| <a name="input_eks_cluster_id"></a> [eks\_cluster\_id](#input\_eks\_cluster\_id) | EKS Cluster Id | `string` | n/a | yes |
@@ -92,26 +93,28 @@ See examples using this Terraform modules in the **Amazon EKS** section of [this
| <a name="input_enable_recording_rules"></a> [enable\_recording\_rules](#input\_enable\_recording\_rules) | Enables or disables Managed Prometheus recording rules | `bool` | `true` | no |
| <a name="input_enable_tracing"></a> [enable\_tracing](#input\_enable\_tracing) | Enables tracing with OTLP traces receiver to X-Ray | `bool` | `true` | no |
| <a name="input_flux_config"></a> [flux\_config](#input\_flux\_config) | FluxCD configuration | <pre>object({<br> create_namespace = bool<br> k8s_namespace = string<br> helm_chart_name = string<br> helm_chart_version = string<br> helm_release_name = string<br> helm_repo_url = string<br> helm_settings = map(string)<br> helm_values = map(any)<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart_name": "flux2",<br> "helm_chart_version": "2.7.0",<br> "helm_release_name": "observability-fluxcd-addon",<br> "helm_repo_url": "https://fluxcd-community.github.io/helm-charts",<br> "helm_settings": {},<br> "helm_values": {},<br> "k8s_namespace": "flux-system"<br>}</pre> | no |
- | <a name="input_flux_gitrepository_branch"></a> [flux\_gitrepository\_branch](#input\_flux\_gitrepository\_branch) | Flux GitRepository Branch | `string` | `"main"` | no |
+ | <a name="input_flux_gitrepository_branch"></a> [flux\_gitrepository\_branch](#input\_flux\_gitrepository\_branch) | Flux GitRepository Branch | `string` | `"v0.2.0"` | no |
| <a name="input_flux_gitrepository_name"></a> [flux\_gitrepository\_name](#input\_flux\_gitrepository\_name) | Flux GitRepository name | `string` | `"aws-observability-accelerator"` | no |
| <a name="input_flux_gitrepository_url"></a> [flux\_gitrepository\_url](#input\_flux\_gitrepository\_url) | Flux GitRepository URL | `string` | `"https://github.com/aws-observability/aws-observability-accelerator"` | no |
| <a name="input_flux_kustomization_name"></a> [flux\_kustomization\_name](#input\_flux\_kustomization\_name) | Flux Kustomization name | `string` | `"grafana-dashboards-infrastructure"` | no |
| <a name="input_flux_kustomization_path"></a> [flux\_kustomization\_path](#input\_flux\_kustomization\_path) | Flux Kustomization Path | `string` | `"./artifacts/grafana-operator-manifests/eks/infrastructure"` | no |
| <a name="input_go_config"></a> [go\_config](#input\_go\_config) | Grafana Operator configuration | <pre>object({<br> create_namespace = bool<br> helm_chart = string<br> helm_name = string<br> k8s_namespace = string<br> helm_release_name = string<br> helm_chart_version = string<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart": "oci://ghcr.io/grafana-operator/helm-charts/grafana-operator",<br> "helm_chart_version": "v5.0.0-rc3",<br> "helm_name": "grafana-operator",<br> "helm_release_name": "grafana-operator",<br> "k8s_namespace": "grafana-operator"<br>}</pre> | no |
| <a name="input_grafana_api_key"></a> [grafana\_api\_key](#input\_grafana\_api\_key) | Grafana API key for the Amazon Managed Grafana workspace. Required if `enable_external_secrets = true` | `string` | `""` | no |
- | <a name="input_grafana_cluster_dashboard_url"></a> [grafana\_cluster\_dashboard\_url](#input\_grafana\_cluster\_dashboard\_url) | Dashboard URL for Cluster Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json"` | no |
- | <a name="input_grafana_kubelet_dashboard_url"></a> [grafana\_kubelet\_dashboard\_url](#input\_grafana\_kubelet\_dashboard\_url) | Dashboard URL for Kubelet Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json"` | no |
- | <a name="input_grafana_namespace_workloads_dashboard_url"></a> [grafana\_namespace\_workloads\_dashboard\_url](#input\_grafana\_namespace\_workloads\_dashboard\_url) | Dashboard URL for Namespace Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json"` | no |
- | <a name="input_grafana_node_exporter_dashboard_url"></a> [grafana\_node\_exporter\_dashboard\_url](#input\_grafana\_node\_exporter\_dashboard\_url) | Dashboard URL for Node Exporter Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json"` | no |
- | <a name="input_grafana_nodes_dashboard_url"></a> [grafana\_nodes\_dashboard\_url](#input\_grafana\_nodes\_dashboard\_url) | Dashboard URL for Nodes Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json"` | no |
+ | <a name="input_grafana_cluster_dashboard_url"></a> [grafana\_cluster\_dashboard\_url](#input\_grafana\_cluster\_dashboard\_url) | Dashboard URL for Cluster Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/cluster.json"` | no |
+ | <a name="input_grafana_kubelet_dashboard_url"></a> [grafana\_kubelet\_dashboard\_url](#input\_grafana\_kubelet\_dashboard\_url) | Dashboard URL for Kubelet Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json"` | no |
+ | <a name="input_grafana_kubeproxy_dashboard_url"></a> [grafana\_kubeproxy\_dashboard\_url](#input\_grafana\_kubeproxy\_dashboard\_url) | Dashboard URL for kube-proxy Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/kube-proxy/kube-proxy.json"` | no |
+ | <a name="input_grafana_namespace_workloads_dashboard_url"></a> [grafana\_namespace\_workloads\_dashboard\_url](#input\_grafana\_namespace\_workloads\_dashboard\_url) | Dashboard URL for Namespace Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json"` | no |
+ | <a name="input_grafana_node_exporter_dashboard_url"></a> [grafana\_node\_exporter\_dashboard\_url](#input\_grafana\_node\_exporter\_dashboard\_url) | Dashboard URL for Node Exporter Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json"` | no |
+ | <a name="input_grafana_nodes_dashboard_url"></a> [grafana\_nodes\_dashboard\_url](#input\_grafana\_nodes\_dashboard\_url) | Dashboard URL for Nodes Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/nodes.json"` | no |
| <a name="input_grafana_url"></a> [grafana\_url](#input\_grafana\_url) | Endpoint URL of Amazon Managed Grafana workspace. Required if `enable_grafana_operator = true` | `string` | `""` | no |
- | <a name="input_grafana_workloads_dashboard_url"></a> [grafana\_workloads\_dashboard\_url](#input\_grafana\_workloads\_dashboard\_url) | Dashboard URL for Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json"` | no |
+ | <a name="input_grafana_workloads_dashboard_url"></a> [grafana\_workloads\_dashboard\_url](#input\_grafana\_workloads\_dashboard\_url) | Dashboard URL for Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/infrastructure/workloads.json"` | no |
| <a name="input_helm_config"></a> [helm\_config](#input\_helm\_config) | Helm Config for Prometheus | `any` | `{}` | no |
| <a name="input_irsa_iam_permissions_boundary"></a> [irsa\_iam\_permissions\_boundary](#input\_irsa\_iam\_permissions\_boundary) | IAM permissions boundary for IRSA roles | `string` | `null` | no |
| <a name="input_irsa_iam_role_path"></a> [irsa\_iam\_role\_path](#input\_irsa\_iam\_role\_path) | IAM role path for IRSA roles | `string` | `"/"` | no |
| <a name="input_istio_config"></a> [istio\_config](#input\_istio\_config) | Configuration object for ISTIO monitoring | <pre>object({<br> enable_alerting_rules = bool<br> enable_recording_rules = bool<br> enable_dashboards = bool<br> scrape_sample_limit = number<br><br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> managed_prometheus_workspace_id = string<br> prometheus_metrics_endpoint = string<br><br> dashboards = object({<br> cp = string<br> mesh = string<br> performance = string<br> service = string<br> })<br> })</pre> | `null` | no |
| <a name="input_java_config"></a> [java\_config](#input\_java\_config) | Configuration object for Java/JMX monitoring | <pre>object({<br> enable_alerting_rules = bool<br> enable_recording_rules = bool<br> enable_dashboards = bool<br> scrape_sample_limit = number<br><br><br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> grafana_dashboard_url = string<br><br> prometheus_metrics_endpoint = string<br> })</pre> | `null` | no |
| <a name="input_ksm_config"></a> [ksm\_config](#input\_ksm\_config) | Kube State metrics configuration | <pre>object({<br> create_namespace = bool<br> k8s_namespace = string<br> helm_chart_name = string<br> helm_chart_version = string<br> helm_release_name = string<br> helm_repo_url = string<br> helm_settings = map(string)<br> helm_values = map(any)<br><br> scrape_interval = string<br> scrape_timeout = string<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart_name": "kube-state-metrics",<br> "helm_chart_version": "4.24.0",<br> "helm_release_name": "kube-state-metrics",<br> "helm_repo_url": "https://prometheus-community.github.io/helm-charts",<br> "helm_settings": {},<br> "helm_values": {},<br> "k8s_namespace": "kube-system",<br> "scrape_interval": "60s",<br> "scrape_timeout": "15s"<br>}</pre> | no |
+ | <a name="input_kubeproxy_monitoring_config"></a> [kubeproxy\_monitoring\_config](#input\_kubeproxy\_monitoring\_config) | Config object for kube-proxy monitoring | <pre>object({<br> flux_gitrepository_name = string<br> flux_gitrepository_url = string<br> flux_gitrepository_branch = string<br> flux_kustomization_name = string<br> flux_kustomization_path = string<br><br> dashboards = object({<br> grafana_kubeproxy_dashboard_url = string<br> })<br> })</pre> | `null` | no |
| <a name="input_logs_config"></a> [logs\_config](#input\_logs\_config) | Configuration object for logs collection | <pre>object({<br> cw_log_retention_days = number<br> })</pre> | <pre>{<br> "cw_log_retention_days": 90<br>}</pre> | no |
| <a name="input_managed_prometheus_workspace_endpoint"></a> [managed\_prometheus\_workspace\_endpoint](#input\_managed\_prometheus\_workspace\_endpoint) | Amazon Managed Prometheus Workspace Endpoint | `string` | `""` | no |
| <a name="input_managed_prometheus_workspace_id"></a> [managed\_prometheus\_workspace\_id](#input\_managed\_prometheus\_workspace\_id) | Amazon Managed Prometheus Workspace ID | `string` | `null` | no |
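
As a rough illustration of the new `kubeproxy_monitoring_config` input documented above, a caller might override it as sketched below. This is a minimal sketch, not taken from the commit: the module `source` path, the cluster ID, and the kustomization name and path are assumptions chosen for illustration, and the module may require other inputs not shown here. The object shape and the dashboard URL follow the README rows in this diff.

```hcl
module "eks_monitoring" {
  # Assumed relative path; point this at wherever the module lives in your setup.
  source = "./modules/eks-monitoring"

  # Hypothetical cluster ID; eks_cluster_id is listed as required in the README.
  eks_cluster_id = "my-eks-cluster"

  # The object type declares no optional attributes, so every attribute must be set.
  kubeproxy_monitoring_config = {
    flux_gitrepository_name   = "aws-observability-accelerator"
    flux_gitrepository_url    = "https://github.com/aws-observability/aws-observability-accelerator"
    flux_gitrepository_branch = "v0.2.0"

    # Hypothetical values; the real defaults are not visible in this diff.
    flux_kustomization_name = "grafana-dashboards-kubeproxy"
    flux_kustomization_path = "./artifacts/grafana-operator-manifests/eks/kube-proxy"

    dashboards = {
      # URL taken from the grafana_kubeproxy_dashboard_url default in the README table.
      grafana_kubeproxy_dashboard_url = "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/v0.2.0/artifacts/grafana-dashboards/eks/kube-proxy/kube-proxy.json"
    }
  }
}
```

Leaving the input at its `null` default presumably falls back to the module's own values; that fallback logic is not part of the files shown here.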

modules/eks-monitoring/alerts.tf

Lines changed: 8 additions & 0 deletions
@@ -262,6 +262,14 @@ groups:
    annotations:
      description: Kubelet has disappeared from Prometheus target discovery.
      summary: Target disappeared from Prometheus target discovery.
+ - alert: KubeProxyDown
+   expr: absent(up{job="kube-proxy"} == 1)
+   for: 15m
+   labels:
+     severity: critical
+   annotations:
+     description: KubeProxy has disappeared from Prometheus target discovery.
+     summary: Target disappeared from Prometheus target discovery.
  - alert: KubeVersionMismatch
    expr: count by(cluster) (count by(git_version, cluster) (label_replace(kubernetes_build_info{job!~"kube-dns|coredns"}, "git_version", "$1", "git_version", "(v[0-9]*.[0-9]*).*"))) > 1
    for: 15m

modules/eks-monitoring/dashboards.tf

Lines changed: 22 additions & 0 deletions
@@ -94,3 +94,25 @@ YAML
    count = var.enable_adotcollector_metrics ? 1 : 0
    depends_on = [module.external_secrets]
  }
+
+ resource "kubectl_manifest" "kubeproxy_monitoring_dashboard" {
+   yaml_body = <<YAML
+ apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
+ kind: Kustomization
+ metadata:
+   name: ${local.kubeproxy_monitoring_config.flux_kustomization_name}
+   namespace: flux-system
+ spec:
+   interval: 1m0s
+   path: ${local.kubeproxy_monitoring_config.flux_kustomization_path}
+   prune: true
+   sourceRef:
+     kind: GitRepository
+     name: ${local.kubeproxy_monitoring_config.flux_gitrepository_name}
+   postBuild:
+     substitute:
+       GRAFANA_KUBEPROXY_DASH_URL: ${local.kubeproxy_monitoring_config.dashboards.default}
+ YAML
+   count = var.enable_dashboards ? 1 : 0
+   depends_on = [module.external_secrets]
+ }
