Commit c726f50

Tracing support with custom metrics collection (#67)
* Init tracing support
* Update xray exporter
* Remove duplicate metrics only on daemon set
* Add support for custom metrics collection
* Remove experimental feature
* Rename config items
* Update requirements for tracing support
1 parent e73b4b9 commit c726f50

6 files changed: 236 additions and 42 deletions

.github/workflows/docbuild.yml

Lines changed: 3 additions & 4 deletions
```diff
@@ -1,8 +1,8 @@
-name: ci
+name: ci
 on:
 push:
 branches:
-- master
+- master
 - main
 permissions:
 contents: write
@@ -14,6 +14,5 @@ jobs:
 - uses: actions/setup-python@v4
 with:
 python-version: 3.x
-- run: pip install mkdocs-material
+- run: pip install mkdocs-material
 - run: mkdocs gh-deploy --force
-
```
modules/workloads/infra/README.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -57,13 +57,16 @@ This module is inspired from the open source [kube-prometheus-stack](https://git
 
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
+| <a name="input_custom_metrics_config"></a> [custom\_metrics\_config](#input\_custom\_metrics\_config) | Configuration object to enable custom metrics collection | <pre>object({<br> ports = list(number)<br> # paths = optional(list(string), ["/metrics"])<br> # list of samples to be dropped by label prefix, ex: go_ -> discards go_.*<br> dropped_series_prefixes = list(string)<br> })</pre> | <pre>{<br> "dropped_series_prefixes": [<br> "unspecified"<br> ],<br> "ports": []<br>}</pre> | no |
 | <a name="input_dashboards_folder_id"></a> [dashboards\_folder\_id](#input\_dashboards\_folder\_id) | Grafana folder ID for automatic dashboards | `string` | n/a | yes |
 | <a name="input_eks_cluster_id"></a> [eks\_cluster\_id](#input\_eks\_cluster\_id) | EKS Cluster Id | `string` | n/a | yes |
 | <a name="input_enable_alerting_rules"></a> [enable\_alerting\_rules](#input\_enable\_alerting\_rules) | Enables or disables Managed Prometheus alerting rules | `bool` | `true` | no |
+| <a name="input_enable_custom_metrics"></a> [enable\_custom\_metrics](#input\_enable\_custom\_metrics) | Allows additional metrics collection for config elements in the `custom_metrics_config` config object. Automatic dashboards are not included | `bool` | `false` | no |
 | <a name="input_enable_dashboards"></a> [enable\_dashboards](#input\_enable\_dashboards) | Enables or disables curated dashboards | `bool` | `true` | no |
 | <a name="input_enable_kube_state_metrics"></a> [enable\_kube\_state\_metrics](#input\_enable\_kube\_state\_metrics) | Enables or disables Kube State metrics exporter. Disabling this might affect some data in the dashboards | `bool` | `true` | no |
 | <a name="input_enable_node_exporter"></a> [enable\_node\_exporter](#input\_enable\_node\_exporter) | Enables or disables Node exporter. Disabling this might affect some data in the dashboards | `bool` | `true` | no |
 | <a name="input_enable_recording_rules"></a> [enable\_recording\_rules](#input\_enable\_recording\_rules) | Enables or disables Managed Prometheus recording rules. Disabling this might affect some data in the dashboards | `bool` | `true` | no |
+| <a name="input_enable_tracing"></a> [enable\_tracing](#input\_enable\_tracing) | (Experimental) Enables tracing with AWS X-Ray. This changes the deploy mode of the collector to daemon set. Requirement: adot add-on <= 0.58-build.0 | `bool` | `false` | no |
 | <a name="input_helm_config"></a> [helm\_config](#input\_helm\_config) | Helm Config for Prometheus | `any` | `{}` | no |
 | <a name="input_irsa_iam_permissions_boundary"></a> [irsa\_iam\_permissions\_boundary](#input\_irsa\_iam\_permissions\_boundary) | IAM permissions boundary for IRSA roles | `string` | `""` | no |
 | <a name="input_irsa_iam_role_path"></a> [irsa\_iam\_role\_path](#input\_irsa\_iam\_role\_path) | IAM role path for IRSA roles | `string` | `"/"` | no |
```

modules/workloads/infra/main.tf

Lines changed: 28 additions & 1 deletion
```diff
@@ -73,14 +73,41 @@ module "helm_addon" {
 name = "accountId"
 value = local.context.aws_caller_identity_account_id
 },
+{
+name = "enableTracing"
+value = var.enable_tracing
+},
+{
+name = "otlpHttpEndpoint"
+value = "0.0.0.0:4318"
+},
+{
+name = "otlpGrpcEndpoint"
+value = "0.0.0.0:4317"
+},
+{
+name = "enableCustomMetrics"
+value = var.enable_custom_metrics
+},
+{
+name = "customMetricsPorts"
+value = format(".*:(%s)$", join("|", var.custom_metrics_config.ports))
+},
+{
+name = "customMetricsDroppedSeriesPrefixes"
+value = format("(%s.*)$", join(".*|", var.custom_metrics_config.dropped_series_prefixes))
+}
 ]
 
 irsa_config = {
 create_kubernetes_namespace = true
 kubernetes_namespace = local.namespace
 create_kubernetes_service_account = true
 kubernetes_service_account = try(var.helm_config.service_account, local.name)
-irsa_iam_policies = ["arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonPrometheusRemoteWriteAccess"]
+irsa_iam_policies = [
+"arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonPrometheusRemoteWriteAccess",
+"arn:${data.aws_partition.current.partition}:iam::aws:policy/AWSXrayWriteOnlyAccess"
+]
 }
 
 addon_context = local.context
```
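Both `customMetrics*` values are plain strings. Terraform assembles them with `format` and `join` before Helm substitutes them into the collector config. The minimal Go sketch below reproduces the same string construction so the rendered regexes can be inspected. The helper names `portsRegex` and `droppedPrefixesRegex` and the sample inputs (ports `8080`, `9404`; prefixes `go_`, `process_`) are hypothetical and are not part of the module.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portsRegex mirrors the Terraform expression
// format(".*:(%s)$", join("|", var.custom_metrics_config.ports)).
// Hypothetical helper for illustration only.
func portsRegex(ports []int) string {
	parts := make([]string, len(ports))
	for i, p := range ports {
		parts[i] = strconv.Itoa(p) // join() needs strings; Terraform converts numbers implicitly
	}
	return fmt.Sprintf(".*:(%s)$", strings.Join(parts, "|"))
}

// droppedPrefixesRegex mirrors
// format("(%s.*)$", join(".*|", var.custom_metrics_config.dropped_series_prefixes)).
// Hypothetical helper for illustration only.
func droppedPrefixesRegex(prefixes []string) string {
	return fmt.Sprintf("(%s.*)$", strings.Join(prefixes, ".*|"))
}

func main() {
	// Assumed example config.
	fmt.Println(portsRegex([]int{8080, 9404}))                     // .*:(8080|9404)$
	fmt.Println(droppedPrefixesRegex([]string{"go_", "process_"})) // (go_.*|process_.*)$

	// Module defaults: ports = [] and dropped_series_prefixes = ["unspecified"].
	fmt.Println(portsRegex(nil))                               // .*:()$
	fmt.Println(droppedPrefixesRegex([]string{"unspecified"})) // (unspecified.*)$
}
```

With the module defaults, the port pattern collapses to `.*:()$`. That pattern only matches an address ending in a bare colon, so enabling custom metrics without listing ports scrapes nothing in practice. Prefixes are inserted verbatim, so any regex metacharacters in them are interpreted as regex syntax.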

modules/workloads/infra/otel-config/templates/opentelemetrycollector.yaml

Lines changed: 169 additions & 2 deletions
```diff
@@ -3,11 +3,42 @@ kind: OpenTelemetryCollector
 metadata:
 name: adot
 spec:
-image: public.ecr.aws/aws-observability/aws-otel-collector:v0.21.0
+image: public.ecr.aws/aws-observability/aws-otel-collector:v0.22.1
+{{ if .Values.enableTracing }}
+mode: daemonset
+hostNetwork: true
+ports:
+- port: 4317
+name: "otlpgrpc"
+- port: 4318
+name: "oltphttp"
+{{ else }}
 mode: deployment
+{{ end }}
 serviceAccount: adot-collector-kubeprometheus
+env:
+- name: "K8S_NODE_NAME"
+valueFrom:
+fieldRef:
+fieldPath: "spec.nodeName"
+- name: "K8S_POD_NAME"
+valueFrom:
+fieldRef:
+fieldPath: "metadata.name"
+- name: "K8S_NAMESPACE"
+valueFrom:
+fieldRef:
+fieldPath: "metadata.namespace"
 config: |
 receivers:
+{{ if .Values.enableTracing }}
+otlp:
+protocols:
+grpc:
+endpoint: {{ .Values.otlpGrpcEndpoint }}
+http:
+endpoint: {{ .Values.otlpHttpEndpoint }}
+{{ end }}
 prometheus:
 config:
 global:
```
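Helm renders this template with Go's `text/template`, extended with Sprig functions that this excerpt does not need. That means the new deploy-mode switch can be exercised on its own. The minimal sketch below assumes a trimmed, hypothetical excerpt that keeps only the `mode` lines rather than the full chart. It renders the excerpt for `true`, `false`, and the string `"false"`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// modeTmpl is a trimmed, hypothetical excerpt of the chart template that keeps
// only the deploy-mode switch added in this commit.
const modeTmpl = `{{ if .Values.enableTracing }}mode: daemonset{{ else }}mode: deployment{{ end }}`

// renderMode executes the excerpt with .Values.enableTracing set to v,
// using nested maps roughly the way Helm exposes .Values to templates.
func renderMode(v any) (string, error) {
	t, err := template.New("mode").Parse(modeTmpl)
	if err != nil {
		return "", err
	}
	data := map[string]any{"Values": map[string]any{"enableTracing": v}}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, v := range []any{true, false, "false"} {
		out, err := renderMode(v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%#v -> %s\n", v, out)
	}
}
```

The third case renders `mode: daemonset`, because `text/template` treats any non-empty string as true. The switch therefore relies on `enableTracing` reaching the chart as a boolean. Helm's default `--set` parsing produces a boolean for the literals `true` and `false`. Passing the value as a string instead, for example with `--set-string`, would enable daemon set mode regardless of the Terraform setting.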
```diff
@@ -37,6 +68,11 @@ spec:
 regex: (.+)
 target_label: __metrics_path__
 replacement: /api/v1/nodes/$${1}/proxy/metrics
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_node_name]
+{{ end }}
 - job_name: 'kubelet'
 scheme: https
 tls_config:
@@ -54,6 +90,11 @@ spec:
 regex: (.+)
 target_label: __metrics_path__
 replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_node_name]
+{{ end }}
 - job_name: serviceMonitor/default/kube-prometheus-stack-prometheus-node-exporter/0
 honor_timestamps: true
 scrape_interval: {{ .Values.globalScrapeInterval }}
@@ -148,6 +189,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -250,6 +296,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -351,6 +402,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -468,6 +524,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -585,6 +646,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -701,6 +767,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -805,6 +876,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -911,6 +987,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1017,6 +1098,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1123,6 +1209,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1229,6 +1320,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1335,6 +1431,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1437,6 +1538,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
@@ -1539,6 +1645,11 @@ spec:
 regex: "0"
 replacement: $$1
 action: keep
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
 kubernetes_sd_configs:
 - role: endpoints
 kubeconfig_file: ""
```
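In daemon set mode, every collector pod loads the same scrape configuration. Each added `keep` rule therefore restricts a pod to targets on its own node, which is how duplicate scrapes are avoided when tracing is enabled. The collector substitutes `$K8S_NODE_NAME` from the pod environment, which the `env` block at the top of this file fills from `spec.nodeName`. This substitution is also why literal dollar signs elsewhere in the Prometheus config are escaped as `$$`.

The Go sketch below is a simplified model of those two steps, not the collector's code. `os.Expand` stands in for environment substitution, and the regex is anchored the way Prometheus anchors relabel regexes. The node names and addresses are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// target is a hypothetical discovered scrape target.
type target struct {
	addr string
	node string // value of __meta_kubernetes_endpoint_node_name
}

// keepOnNode models the added rule
//
//	- action: keep
//	  regex: $K8S_NODE_NAME
//	  source_labels: [__meta_kubernetes_endpoint_node_name]
//
// Step 1: substitute environment variables in the regex (done by the collector).
// Step 2: fully anchor the regex and keep only targets whose node label matches.
func keepOnNode(rawRegex string, getenv func(string) string, targets []target) []target {
	expanded := os.Expand(rawRegex, getenv)
	re := regexp.MustCompile("^(?:" + expanded + ")$")
	var kept []target
	for _, t := range targets {
		if re.MatchString(t.node) {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// Hypothetical environment of the collector pod running on one node.
	env := map[string]string{"K8S_NODE_NAME": "ip-10-0-1-23.ec2.internal"}
	targets := []target{
		{addr: "10.0.1.50:9100", node: "ip-10-0-1-23.ec2.internal"},
		{addr: "10.0.2.71:9100", node: "ip-10-0-2-71.ec2.internal"},
	}
	kept := keepOnNode("$K8S_NODE_NAME", func(k string) string { return env[k] }, targets)
	for _, t := range kept {
		fmt.Println("scraped by this pod:", t.addr)
	}
}
```

In this model, an unset `K8S_NODE_NAME` expands to an empty pattern that keeps no targets at all. That suggests the downward-API `env` entries are essential in daemon set mode.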
```diff
@@ -1562,13 +1673,53 @@ spec:
 - action: replace
 source_labels: [__meta_kubernetes_endpoint_node_name]
 target_label: nodename
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_endpoint_node_name]
+{{ end }}
+- job_name: "custom-metrics"
+kubernetes_sd_configs:
+- role: pod
+relabel_configs:
+- source_labels: [ __address__ ]
+action: keep
+regex: '{{ .Values.customMetricsPorts }}'
+- action: replace
+source_labels: [__meta_kubernetes_pod_node_name]
+target_label: nodename
+- action: replace
+source_labels: [__meta_kubernetes_namespace]
+target_label: namespace
+- action: replace
+source_labels: [__meta_kubernetes_pod_name]
+target_label: pod_name
+- action: replace
+source_labels: [__meta_kubernetes_pod_container_name]
+target_label: container_name
+- action: replace
+source_labels: [__meta_kubernetes_pod_controller_kind]
+target_label: pod_controller_kind
+{{ if .Values.enableTracing }}
+- action: keep
+regex: $K8S_NODE_NAME
+source_labels: [__meta_kubernetes_pod_node_name]
+{{ end }}
+metric_relabel_configs:
+- source_labels: [ __name__ ]
+regex: '{{ .Values.customMetricsDroppedSeriesPrefixes }}'
+action: drop
 exporters:
+{{ if .Values.enableTracing }}
+awsxray:
+region: {{ .Values.region }}
+{{ end }}
 prometheusremotewrite:
 endpoint: {{ .Values.ampurl }}
 auth:
 authenticator: sigv4auth
 logging:
-loglevel: info
+loglevel: warn
 extensions:
 sigv4auth:
 region: {{ .Values.region }}
```
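The `custom-metrics` job applies two regexes rendered by Terraform. A `keep` rule on `__address__` selects pod targets by port, and a `drop` rule in `metric_relabel_configs` discards series by metric-name prefix. Prometheus anchors relabel regexes at both ends, which is why the port pattern needs its leading `.*`.

The Go sketch below applies both rules with that anchoring. The patterns are what the Terraform expressions would produce for an assumed config of ports `[8080, 9404]` and prefixes `["go_", "process_"]`. The addresses and metric names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored compiles a relabel regex the way Prometheus does: ^(?:re)$.
func anchored(re string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + re + ")$")
}

func main() {
	// Values the Terraform expressions would render for the assumed config.
	keepPorts := anchored(`.*:(8080|9404)$`)
	dropPrefixes := anchored(`(go_.*|process_.*)$`)

	// relabel_configs: keep only pod targets whose __address__ uses a listed port.
	for _, addr := range []string{"10.0.3.14:8080", "10.0.3.15:9090"} {
		fmt.Printf("%s scraped=%v\n", addr, keepPorts.MatchString(addr))
	}

	// metric_relabel_configs: drop series whose __name__ matches a listed prefix.
	for _, name := range []string{"http_requests_total", "go_goroutines", "process_cpu_seconds_total"} {
		fmt.Printf("%s kept=%v\n", name, !dropPrefixes.MatchString(name))
	}
}
```

For the `pod` role, Prometheus builds one target per declared container port. Metrics served on a port that is not declared in the container spec will therefore not be matched by the `keep` rule.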
```diff
@@ -1578,9 +1729,25 @@ spec:
 endpoint: :1888
 zpages:
 endpoint: :55679
+processors:
+batch/metrics:
+timeout: 30s
+send_batch_size: 500
+{{ if .Values.enableTracing }}
+batch/traces:
+timeout: 10s
+send_batch_size: 50
+{{ end }}
 service:
 extensions: [pprof, zpages, health_check, sigv4auth]
 pipelines:
 metrics:
 receivers: [prometheus]
+processors: [batch/metrics]
 exporters: [logging, prometheusremotewrite]
+{{ if .Values.enableTracing }}
+traces:
+receivers: [otlp]
+processors: [batch/traces]
+exporters: [logging, awsxray]
+{{ end }}
```
