Commit 18e5589

Initial commit

1 parent cb9ba19 commit 18e5589

14 files changed: +987 -58 lines changed

.github/settings.yml

Lines changed: 2 additions & 6 deletions

```diff
@@ -1,11 +1,7 @@
 # Upstream changes from _extends are only recognized when modifications are made to this file in the default branch.
 _extends: .github
 repository:
-  name: template
-  description: Template for Terraform Components
+  name: aws-eks-datadog-agent
+  description: This component installs the `datadog-agent` for EKS clusters
   homepage: https://cloudposse.com/accelerate
   topics: terraform, terraform-component
-
-
-
-
```

CHANGELOG.md

Lines changed: 66 additions & 0 deletions
## PR [#814](https://github.com/cloudposse/terraform-aws-components/pull/814)

### Possible Breaking Change

Removed inputs `iam_role_enabled` and `iam_policy_statements`, because the Datadog agent does not need an IAM (IRSA) role or any special AWS permissions: it works solely within the Kubernetes environment. (Datadog has AWS integrations to handle monitoring that requires AWS permissions.)

This is only a breaking change if you were setting these inputs. If you were, simply remove them from your configuration.

### Possible Breaking Change

Previously, this component directly created the Kubernetes namespace for the agent when `create_namespace` was set to `true`. Now this component delegates that responsibility to the `helm-release` module, which better coordinates the destruction of resources at destruction time (for example, ensuring that the Helm release is completely destroyed and finalizers run before the namespace is deleted).

Generally, the simplest upgrade path is to destroy the Helm release, then destroy the namespace, then apply the new configuration. Alternatively, you can use `terraform state mv` to move the existing namespace to the new Terraform "address", which will preserve the existing deployment and reduce the possibility of the destroy failing and leaving the Kubernetes cluster in a bad state.
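For the `terraform state mv` route, the commands look roughly like the following. The resource addresses here are hypothetical examples, not taken from this PR; confirm the real addresses with `terraform state list` before moving anything.

```
# Find the namespace resource's current address (addresses below are examples).
terraform state list | grep kubernetes_namespace

# Move it from the old component-managed address to the address inside the
# helm-release module, preserving the existing deployment.
terraform state mv \
  'kubernetes_namespace.default[0]' \
  'module.datadog_agent.kubernetes_namespace.default[0]'
```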
### Cluster Agent Redundancy

In this PR we have defaulted the number of Cluster Agents to 2. This is because when there are no Cluster Agents, all cluster metrics are lost. Having 2 agents makes it possible to keep 1 agent running at all times, even when the other is on a node being drained.

### DNS Resolution Enhancement

If Datadog processes are looking for where to send data and are configured to look up `datadog.monitoring.svc.cluster.local`, by default the cluster will make a DNS query for each of the following:

1. `datadog.monitoring.svc.cluster.local.monitoring.svc.cluster.local`
2. `datadog.monitoring.svc.cluster.local.svc.cluster.local`
3. `datadog.monitoring.svc.cluster.local.cluster.local`
4. `datadog.monitoring.svc.cluster.local.ec2.internal`
5. `datadog.monitoring.svc.cluster.local`

due to the DNS resolver's [search path](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#namespaces-of-services). Because this lookup happens so frequently (several times a second in a production environment), it can cause a lot of unnecessary work, even if the DNS query is cached.

In this PR we have set `ndots: 2` in the agent and cluster agent configuration so that only the 5th query is made. (In Kubernetes, the default value for `ndots` is 5. DNS queries having fewer than `ndots` dots in them will be attempted using each component of the search path in turn until a match is found, while those with more dots, or with a final dot, are looked up as is.)

Alternatively, where you are setting the host name to be resolved, you can add a final dot at the end so that the search path is not used, e.g. `datadog.monitoring.svc.cluster.local.`
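The `ndots` behavior can be sketched as follows. This is a simplified illustration of the resolver's search-path rule, not Datadog or Kubernetes code:

```python
def queries_attempted(name, search_domains, ndots):
    """List the DNS queries a resolver tries, in order, under the ndots rule.

    Simplified sketch: a real resolver stops at the first query that
    resolves, so usually only the first entry is actually sent.
    """
    if name.endswith("."):  # a trailing dot bypasses the search path entirely
        return [name.rstrip(".")]
    expansions = [f"{name}.{d}" for d in search_domains]
    if name.count(".") >= ndots:  # "absolute enough": try the name as-is first
        return [name] + expansions
    return expansions + [name]  # otherwise walk the search path first

search = ["monitoring.svc.cluster.local", "svc.cluster.local",
          "cluster.local", "ec2.internal"]
name = "datadog.monitoring.svc.cluster.local"

# Kubernetes default ndots=5: the name has only 4 dots, so all four
# search-path expansions are tried before the name itself (5 queries).
print(queries_attempted(name, search, ndots=5))

# With ndots=2 (this PR), the name has 4 dots >= 2, so it is tried as-is
# first and, since it resolves, the expansions are never attempted.
print(queries_attempted(name, search, ndots=2)[0])  # datadog.monitoring.svc.cluster.local
```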
### Note for Bottlerocket users

If you are using Bottlerocket, you will want to uncomment the following from `values.yaml` or add it to your `values` input:

```yaml
criSocketPath: /run/dockershim.sock # Bottlerocket Only
env: # Bottlerocket Only
  - name: DD_AUTOCONFIG_INCLUDE_FEATURES # Bottlerocket Only
    value: "containerd" # Bottlerocket Only
```

See the [Datadog documentation](https://docs.datadoghq.com/containers/kubernetes/distributions/?tab=helm#EKS) for details.

README.yaml

Lines changed: 290 additions & 48 deletions
Large diffs are not rendered by default.
Lines changed: 6 additions & 0 deletions

```yaml
http_check.yaml:
  cluster_check: true
  init_config:
  instances:
    - name: "[${stage}] Echo Server"
      url: "https://echo.${stage}.acme.com"
```
Lines changed: 4 additions & 0 deletions

```yaml
http_check.yaml:
  instances:
    - name: "[${stage}] Custom Dev App"
      url: "https://my-custom-dev-app.acme.com"
```

src/helm-variables.tf

Lines changed: 63 additions & 0 deletions

```hcl
variable "description" {
  type        = string
  description = "Release description attribute (visible in the history)"
  default     = null
}

variable "chart" {
  type        = string
  description = "Chart name to be installed. The chart name can be local path, a URL to a chart, or the name of the chart if `repository` is specified. It is also possible to use the `<repository>/<chart>` format here if you are running Terraform on a system that the repository has been added to with `helm repo add` but this is not recommended"
}

variable "repository" {
  type        = string
  description = "Repository URL where to locate the requested chart"
  default     = null
}

variable "chart_version" {
  type        = string
  description = "Specify the exact chart version to install. If this is not specified, the latest version is installed"
  default     = null
}

variable "kubernetes_namespace" {
  type        = string
  description = "Kubernetes namespace to install the release into"
}

variable "timeout" {
  type        = number
  description = "Time in seconds to wait for any individual kubernetes operation (like Jobs for hooks). Defaults to `300` seconds"
  default     = null
}

variable "cleanup_on_fail" {
  type        = bool
  description = "Allow deletion of new resources created in this upgrade when upgrade fails"
  default     = true
}

variable "atomic" {
  type        = bool
  description = "If set, installation process purges chart on fail. The wait flag will be set automatically if atomic is used"
  default     = true
}

variable "wait" {
  type        = bool
  description = "Will wait until all resources are in a ready state before marking the release as successful. It will wait for as long as `timeout`. Defaults to `true`"
  default     = null
}

variable "create_namespace" {
  type        = bool
  description = "Create the Kubernetes namespace if it does not yet exist"
  default     = true
}

variable "verify" {
  type        = bool
  description = "Verify the package before installing it. Helm uses a provenance file to verify the integrity of the chart; this must be hosted alongside the chart"
  default     = false
}
```

src/main.tf

Lines changed: 139 additions & 0 deletions

```hcl
locals {
  enabled = module.this.enabled

  tags = module.this.tags

  datadog_api_key = module.datadog_configuration.datadog_api_key
  datadog_app_key = module.datadog_configuration.datadog_app_key
  datadog_site    = module.datadog_configuration.datadog_site

  # Combine context tags with passed-in datadog_tags.
  # Skip "name", since that won't be relevant for each metric.
  datadog_tags = toset(distinct(concat([
    for k, v in module.this.tags : "${lower(k)}:${v}" if lower(k) != "name"
  ], tolist(var.datadog_tags))))

  cluster_checks_enabled = local.enabled && var.cluster_checks_enabled

  context_tags = {
    for k, v in module.this.tags :
    lower(k) => v
  }

  deep_map_merge = local.cluster_checks_enabled ? module.datadog_cluster_check_yaml_config[0].map_configs : {}

  datadog_cluster_checks = {
    for k, v in local.deep_map_merge :
    k => merge(v, {
      instances = [
        for key, val in v.instances :
        merge(val, {
          tags = [
            for tag, tag_value in local.context_tags :
            format("%s:%s", tag, tag_value)
            if contains(var.datadog_cluster_check_auto_added_tags, tag)
          ]
        })
      ]
    })
  }

  set_datadog_cluster_checks = [
    for cluster_check_key, cluster_check_value in local.datadog_cluster_checks : {
      # Since we are using JSON pathing to set deep YAML values, and the key we
      # want to set is `something.yaml`, we need to escape the key of the cluster check.
      name  = format("clusterAgent.confd.%s", replace(cluster_check_key, ".", "\\."))
      type  = "auto"
      value = yamlencode(cluster_check_value)
    }
  ]
}
```
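The `datadog_tags` expression can be sketched in Python. This is a hypothetical illustration, not part of the component; note that Terraform's `toset` is unordered, while this sketch sorts for a deterministic result:

```python
def build_datadog_tags(context_tags, extra_tags):
    """Mirror the `datadog_tags` local: context tags become lowercase
    "key:value" strings, "Name" is skipped, user tags are merged in,
    and duplicates are removed."""
    derived = [f"{k.lower()}:{v}" for k, v in context_tags.items()
               if k.lower() != "name"]
    # sorted() stands in for toset(); Terraform sets have no defined order.
    return sorted(set(derived + list(extra_tags)))

tags = build_datadog_tags(
    {"Namespace": "acme", "Stage": "dev", "Name": "datadog"},  # example context tags
    ["team:platform"],                                         # example var.datadog_tags
)
print(tags)  # ['namespace:acme', 'stage:dev', 'team:platform']
```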

The locals feed the module calls below:

```hcl
module "datadog_configuration" {
  source  = "../../datadog-configuration/modules/datadog_keys"
  context = module.this.context
}

module "datadog_cluster_check_yaml_config" {
  count = local.cluster_checks_enabled ? 1 : 0

  source  = "cloudposse/config/yaml"
  version = "1.0.2"

  map_config_local_base_path = path.module
  map_config_paths           = var.datadog_cluster_check_config_paths

  append_list_enabled = true

  parameters = merge(
    var.datadog_cluster_check_config_parameters,
    local.context_tags
  )

  context = module.this.context
}

module "values_merge" {
  source  = "cloudposse/config/yaml//modules/deepmerge"
  version = "1.0.2"

  # Merge in order: datadog values, var.values
  maps = [
    yamldecode(
      file("${path.module}/values.yaml")
    ),
    var.values,
  ]
}

module "datadog_agent" {
  source  = "cloudposse/helm-release/aws"
  version = "0.10.0"

  name          = module.this.name
  chart         = var.chart
  description   = var.description
  repository    = var.repository
  chart_version = var.chart_version

  kubernetes_namespace             = var.kubernetes_namespace
  create_namespace_with_kubernetes = var.create_namespace

  verify          = var.verify
  wait            = var.wait
  atomic          = var.atomic
  cleanup_on_fail = var.cleanup_on_fail
  timeout         = var.timeout

  eks_cluster_oidc_issuer_url = module.eks.outputs.eks_cluster_identity_oidc_issuer

  values = [
    yamlencode(module.values_merge.merged)
  ]

  set_sensitive = [
    {
      name  = "datadog.apiKey"
      type  = "string"
      value = local.datadog_api_key
    },
    {
      name  = "datadog.appKey"
      type  = "string"
      value = local.datadog_app_key
    },
    {
      name  = "datadog.site"
      type  = "string"
      value = local.datadog_site
    }
  ]

  set = concat([
    {
      name  = "datadog.tags"
      type  = "auto"
      value = yamlencode(local.datadog_tags)
    },
    {
      name  = "datadog.clusterName"
      type  = "string"
      value = module.eks.outputs.eks_cluster_id
    },
  ], local.set_datadog_cluster_checks)

  iam_role_enabled = false

  context = module.this.context
}
```
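The key-escaping step in `set_datadog_cluster_checks` can be illustrated in Python (a sketch of the same transformation the HCL performs with `replace` and `format`):

```python
def escaped_confd_key(cluster_check_key: str) -> str:
    """Mirror: format("clusterAgent.confd.%s", replace(key, ".", "\\.")).

    A literal "." in a key like "http_check.yaml" would otherwise be
    parsed as a path separator when setting deep YAML values, so it is
    escaped with a backslash.
    """
    return "clusterAgent.confd." + cluster_check_key.replace(".", "\\.")

print(escaped_confd_key("http_check.yaml"))
# clusterAgent.confd.http_check\.yaml
```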

src/outputs.tf

Lines changed: 8 additions & 3 deletions

```diff
@@ -1,4 +1,9 @@
-output "mock" {
-  description = "Mock output example for the Cloud Posse Terraform component template"
-  value       = local.enabled ? "hello ${basename(abspath(path.module))}" : ""
+output "metadata" {
+  value       = local.enabled ? module.datadog_agent.metadata : null
+  description = "Block status of the deployed release"
+}
+
+output "cluster_checks" {
+  value       = local.datadog_cluster_checks
+  description = "Cluster Checks for the cluster"
 }
```
