Commit 0392f5b

elamaran11 and bonclay7 authored
Using GitOps with Grafana Operator to create Grafana Dashboard with Flux. (#164)
* Grafana With GitOps Feature
* Grafana With GitOps Feature
* Grafana With GitOps Feature
* Fix setup logs retention policy (#169)
* Fixing GitOps Repo
* Commenting out the NodeExp Dash
* Commenting out the NodeExp Dash
* Adding all Grafana Dashboards
* Adding all Grafana Dashboards
* Fixing Grafana Operator Version and cleaning full boards
* Fixing Grafana Operator Version and cleaning full boards
* Fixing Grafana Operator Version and cleaning full boards
* Fixing Grafana Operator Version and cleaning full boards
* Fixing Grafana Operator Version and cleaning full boards and PR Issues
* Fixing Grafana Operator Version and cleaning full boards and PR Issues
* Fixing Grafana Operator Version and cleaning full boards and PR Issues
* Fixing Grafana Operator Version and cleaning full boards and PR Issues
* Fixing Grafana Operator Version and cleaning full boards and PR Issues
* Fixing Grafana Operator Version and cleaning full boards and PR Issues

---------

Co-authored-by: Rodrigue Koffi <[email protected]>
1 parent 0d1c2ef commit 0392f5b

14 files changed, 198 additions and 12975 deletions

examples/eks-multicluster/main.tf

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ module "eks_cluster_1_monitoring" {
   managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id
   managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
   managed_prometheus_workspace_region = module.aws_observability_accelerator.managed_prometheus_workspace_region
+  grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint
 
   java_config = {
     enable_alerting_rules = true
@@ -76,6 +77,7 @@ module "eks_cluster_2_monitoring" {
   managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id
   managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
   managed_prometheus_workspace_region = module.aws_observability_accelerator.managed_prometheus_workspace_region
+  grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint
 
   java_config = {
     enable_alerting_rules = false # addressed while setting up the eks_cluster_1 EKS cluster

examples/existing-cluster-java/main.tf

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ module "eks_monitoring" {
   grafana_api_key = var.grafana_api_key
   target_secret_name = "grafana-admin-credentials"
   target_secret_namespace = "grafana-operator"
+  grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint
 
   eks_cluster_id = var.eks_cluster_id

examples/existing-cluster-nginx/main.tf

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ module "eks_monitoring" {
   grafana_api_key = var.grafana_api_key
   target_secret_name = "grafana-admin-credentials"
   target_secret_namespace = "grafana-operator"
+  grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint
 
   # control the publishing of dashboards by specifying the boolean value for the variable 'enable_dashboards', default is 'true'
   # the intention to publish is overruled depending upon whether grafana dashboard folder is created by the observability accelerator

examples/existing-cluster-with-base-and-infra/main.tf

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ module "eks_monitoring" {
   grafana_api_key = var.grafana_api_key
   target_secret_name = "grafana-admin-credentials"
   target_secret_namespace = "grafana-operator"
+  grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint
 
   # control the publishing of dashboards by specifying the boolean value for the variable 'enable_dashboards', default is 'true'
   # the intention to publish is overruled depending upon whether grafana dashboard folder is created by the observability accelerator

modules/eks-monitoring/README.md

Lines changed: 80 additions & 8 deletions
@@ -29,8 +29,8 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
 | Name | Version |
 |------|---------|
 | <a name="provider_aws"></a> [aws](#provider\_aws) | >= 4.0.0 |
-| <a name="provider_grafana"></a> [grafana](#provider\_grafana) | >= 1.25.0 |
 | <a name="provider_helm"></a> [helm](#provider\_helm) | >= 2.4.1 |
+| <a name="provider_kubectl"></a> [kubectl](#provider\_kubectl) | >= 1.14 |
 
 ## Modules
 
@@ -49,16 +49,12 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
 |------|------|
 | [aws_prometheus_rule_group_namespace.alerting_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource |
 | [aws_prometheus_rule_group_namespace.recording_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource |
-| [grafana_dashboard.cluster](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
-| [grafana_dashboard.kubelet](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
-| [grafana_dashboard.nodeexp_nodes](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
-| [grafana_dashboard.nodes](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
-| [grafana_dashboard.nsworkload](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
-| [grafana_dashboard.workloads](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
 | [helm_release.fluxcd](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
 | [helm_release.grafana_operator](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
 | [helm_release.kube_state_metrics](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
 | [helm_release.prometheus_node_exporter](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
+| [kubectl_manifest.flux_gitrepository](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
+| [kubectl_manifest.flux_kustomization](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
 | [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
 | [aws_eks_cluster.eks_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source |
 | [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -87,8 +83,19 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
 | <a name="input_enable_recording_rules"></a> [enable\_recording\_rules](#input\_enable\_recording\_rules) | Enables or disables Managed Prometheus recording rules | `bool` | `true` | no |
 | <a name="input_enable_tracing"></a> [enable\_tracing](#input\_enable\_tracing) | (Experimental) Enables tracing with AWS X-Ray. This changes the deploy mode of the collector to daemon set. Requirement: adot add-on <= 0.58-build.0 | `bool` | `false` | no |
 | <a name="input_flux_config"></a> [flux\_config](#input\_flux\_config) | FluxCD configuration | <pre>object({<br> create_namespace = bool<br> k8s_namespace = string<br> helm_chart_name = string<br> helm_chart_version = string<br> helm_release_name = string<br> helm_repo_url = string<br> helm_settings = map(string)<br> helm_values = map(any)<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart_name": "flux2",<br> "helm_chart_version": "2.7.0",<br> "helm_release_name": "observability-fluxcd-addon",<br> "helm_repo_url": "https://fluxcd-community.github.io/helm-charts",<br> "helm_settings": {},<br> "helm_values": {},<br> "k8s_namespace": "flux-system"<br>}</pre> | no |
+| <a name="input_flux_gitrepository_branch"></a> [flux\_gitrepository\_branch](#input\_flux\_gitrepository\_branch) | Flux GitRepository Branch | `string` | `"feature/allDashboards"` | no |
+| <a name="input_flux_gitrepository_url"></a> [flux\_gitrepository\_url](#input\_flux\_gitrepository\_url) | Flux GitRepository URL | `string` | `"https://github.com/aws-observability/aws-observability-accelerator"` | no |
+| <a name="input_flux_kustomization_path"></a> [flux\_kustomization\_path](#input\_flux\_kustomization\_path) | Flux Kustomization Path | `string` | `"./artifacts/grafana-operator-manifests"` | no |
+| <a name="input_flux_name"></a> [flux\_name](#input\_flux\_name) | Flux GitRepository and Kustomization Name | `string` | `"grafana-dashboards"` | no |
 | <a name="input_go_config"></a> [go\_config](#input\_go\_config) | Grafana Operator configuration | <pre>object({<br> create_namespace = bool<br> helm_chart = string<br> helm_name = string<br> k8s_namespace = string<br> helm_release_name = string<br> helm_chart_version = string<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart": "oci://ghcr.io/grafana-operator/helm-charts/grafana-operator",<br> "helm_chart_version": "v5.0.0-rc1",<br> "helm_name": "grafana-operator",<br> "helm_release_name": "grafana-operator",<br> "k8s_namespace": "grafana-operator"<br>}</pre> | no |
 | <a name="input_grafana_api_key"></a> [grafana\_api\_key](#input\_grafana\_api\_key) | Grafana API key for the Amazon Managed Grafana workspace | `string` | n/a | yes |
+| <a name="input_grafana_cluster_dashboard_url"></a> [grafana\_cluster\_dashboard\_url](#input\_grafana\_cluster\_dashboard\_url) | Dashboard URL for Cluster Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/cluster.json"` | no |
+| <a name="input_grafana_kubelet_dashboard_url"></a> [grafana\_kubelet\_dashboard\_url](#input\_grafana\_kubelet\_dashboard\_url) | Dashboard URL for Kubelet Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/kubelet.json"` | no |
+| <a name="input_grafana_namespace_workloads_dashboard_url"></a> [grafana\_namespace\_workloads\_dashboard\_url](#input\_grafana\_namespace\_workloads\_dashboard\_url) | Dashboard URL for Namespace Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/namespace-workloads.json"` | no |
+| <a name="input_grafana_node_exporter_dashboard_url"></a> [grafana\_node\_exporter\_dashboard\_url](#input\_grafana\_node\_exporter\_dashboard\_url) | Dashboard URL for Node Exporter Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/nodeexporter-nodes.json"` | no |
+| <a name="input_grafana_nodes_dashboard_url"></a> [grafana\_nodes\_dashboard\_url](#input\_grafana\_nodes\_dashboard\_url) | Dashboard URL for Nodes Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/nodes.json"` | no |
+| <a name="input_grafana_url"></a> [grafana\_url](#input\_grafana\_url) | Endpoint URL of Amazon Managed Grafana workspace | `string` | n/a | yes |
+| <a name="input_grafana_workloads_dashboard_url"></a> [grafana\_workloads\_dashboard\_url](#input\_grafana\_workloads\_dashboard\_url) | Dashboard URL for Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/workloads.json"` | no |
 | <a name="input_helm_config"></a> [helm\_config](#input\_helm\_config) | Helm Config for Prometheus | `any` | `{}` | no |
 | <a name="input_irsa_iam_permissions_boundary"></a> [irsa\_iam\_permissions\_boundary](#input\_irsa\_iam\_permissions\_boundary) | IAM permissions boundary for IRSA roles | `string` | `null` | no |
 | <a name="input_irsa_iam_role_path"></a> [irsa\_iam\_role\_path](#input\_irsa\_iam\_role\_path) | IAM role path for IRSA roles | `string` | `"/"` | no |
@@ -117,7 +124,7 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
 
 ## Troubleshooting
 
-When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.
+1. When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.
 
 ```bash
 Error: cannot patch "prometheus-node-exporter" with kind DaemonSet: DaemonSet.apps "prometheus-node-exporter" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"prometheus-node-exporter", "app.kubernetes.io/name":"prometheus-node-exporter"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
@@ -129,3 +136,68 @@ This is due to the upgrade of the node-exporter chart from v2 to v4. Manually de
 kubectl -n prometheus-node-exporter delete daemonset -l app=prometheus-node-exporter
 terraform apply
 ```
+
+2. If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your grafana-operator pod using the commands below:
+
+```bash
+kubectl get pods -n grafana-operator
+```
+
+Output:
+
+```
+NAME                                READY   STATUS    RESTARTS   AGE
+grafana-operator-866d4446bb-nqq5c   1/1     Running   0          3h17m
+```
+
+```bash
+kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
+```
+
+Output:
+
+```
+1.6857285045556655e+09 ERROR error reconciling datasource {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
+github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
+```
+
+If you see the above `Expired API key` error in the logs, your Grafana API key has expired. Use the following procedure to update your `grafana-api-key`:
+
+- First, let's create a new Grafana API key.
+
+```bash
+export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
+  --key-name "grafana-operator-key-new" \
+  --key-role "ADMIN" \
+  --seconds-to-live 432000 \
+  --workspace-id <YOUR_WORKSPACE_ID> \
+  --query key \
+  --output text)
+```
+
+- Next, let's grab the Grafana API key secret name from AWS Secrets Manager. The secret name should start with `terraform-..`
+
+```bash
+aws secretsmanager list-secrets
+```
+
+- Finally, update the Grafana API key secret in AWS Secrets Manager with the new Grafana API key created above:
+
+```bash
+aws secretsmanager update-secret \
+  --secret-id <Your Secret Name> \
+  --secret-string "${GO_AMG_API_KEY}" \
+  --region <Your AWS Region>
+```
+
+## Grafana datasources and dashboards via GitOps on Amazon Managed Grafana
+
+We have upgraded our solution to use [grafana-operator](https://github.com/grafana-operator/grafana-operator#:~:text=The%20grafana%2Doperator%20is%20a,an%20easy%20and%20scalable%20way.) and [Flux](https://fluxcd.io/) to create Grafana datasources, folders and dashboards via GitOps on Amazon Managed Grafana.
+
+The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside and outside Kubernetes. It lets you create and manage Grafana dashboards, datasources etc. declaratively across multiple instances in an easy and scalable way. With grafana-operator, you can add AWS data sources such as Amazon Managed Service for Prometheus, Amazon CloudWatch and AWS X-Ray to Amazon Managed Grafana, and create Grafana dashboards on Amazon Managed Grafana from your Amazon EKS cluster. This enables us to use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes native way, and ultimately to apply GitOps mechanisms using CNCF projects such as Flux to those resources.
+
+GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that offers you the ability to manage the state of multiple Kubernetes clusters leveraging the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, Bitbucket). Now, with grafana-operator supporting the management of external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms using CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.
+
+We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and a [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) using Flux to sync our GitHub repository and add Grafana datasources, folders and dashboards to Amazon Managed Grafana using Grafana Operator. GitRepository defines a Source to produce an Artifact for a Git repository revision. Kustomization defines a pipeline for fetching, decrypting, building, validating and applying Kustomize overlays or plain Kubernetes manifests. We are also using [Flux Post build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to dynamically render variables such as `AMG_AWS_REGION`, `AMP_ENDPOINT_URL`, `AMG_ENDPOINT_URL` and `GRAFANA_NODEEXP_DASH_URL` in the YAML manifests at deployment time, to avoid hardcoding them in the YAML manifests stored in the Git repository.
+
+We have placed our declarative code snippets to create an Amazon Managed Service for Prometheus datasource and Grafana dashboards in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub Repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have set up a `GitRepository` that points to the AWS Observability Accelerator GitHub Repository and a `Kustomization` for Flux to sync the Git repository with the artifacts in the `./artifacts/grafana-operator-manifests` path of that repository. You can use this extension of our solution to point to your own Kubernetes manifests and create Grafana datasources and personalized Grafana dashboards of your choice using GitOps with Grafana Operator and Flux in a Kubernetes native way, without altering and redeploying this solution for changes to Grafana resources.
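
As a rough illustration of the setup described above, the `GitRepository` and `Kustomization` could be declared in Terraform along the following lines. This is a minimal sketch, not the module's actual implementation. The resource names, `flux_*` inputs, `grafana_url`, `grafana_node_exporter_dashboard_url` and the four substitution keys come from this README. Everything else is an assumption: the `flux-system` namespace for the Flux objects, the `v1beta2` API versions, the sync intervals, `prune: true`, and the mapping of the `managed_prometheus_workspace_*` inputs onto `AMG_AWS_REGION` and `AMP_ENDPOINT_URL`.

```hcl
# Illustrative sketch only; not the module's actual implementation.
# Assumes the var.* inputs referenced below are declared in the module
# and that the Flux controllers run in the "flux-system" namespace.

resource "kubectl_manifest" "flux_gitrepository" {
  yaml_body = <<-YAML
    # Assumption: use the API version served by your installed Flux release.
    apiVersion: source.toolkit.fluxcd.io/v1beta2
    kind: GitRepository
    metadata:
      name: ${var.flux_name}
      namespace: flux-system
    spec:
      interval: 5m0s # assumed polling interval
      url: ${var.flux_gitrepository_url}
      ref:
        branch: ${var.flux_gitrepository_branch}
  YAML
}

resource "kubectl_manifest" "flux_kustomization" {
  yaml_body = <<-YAML
    # Assumption: use the API version served by your installed Flux release.
    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
    kind: Kustomization
    metadata:
      name: ${var.flux_name}
      namespace: flux-system
    spec:
      interval: 1m0s # assumed reconcile interval
      path: ${var.flux_kustomization_path}
      prune: true
      sourceRef:
        kind: GitRepository
        name: ${var.flux_name}
      # Post-build substitution fills these placeholders in the synced
      # manifests, so nothing is hardcoded in the Git repository.
      postBuild:
        substitute:
          AMG_AWS_REGION: ${var.managed_prometheus_workspace_region}
          AMP_ENDPOINT_URL: ${var.managed_prometheus_workspace_endpoint}
          AMG_ENDPOINT_URL: ${var.grafana_url}
          GRAFANA_NODEEXP_DASH_URL: ${var.grafana_node_exporter_dashboard_url}
  YAML

  # The Kustomization needs its source to exist first.
  depends_on = [kubectl_manifest.flux_gitrepository]
}
```

To sync your own manifests instead, the README's inputs suggest overriding `flux_gitrepository_url`, `flux_gitrepository_branch` and `flux_kustomization_path` when calling the module, rather than editing resources like these directly.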
