Commit 1f16bec

Update docs (#172)
1 parent 0392f5b commit 1f16bec

7 files changed

+92
-303
lines changed

README.md

Lines changed: 3 additions & 2 deletions
@@ -25,7 +25,7 @@ To explore the complete project documentation, please visit our [documentation s
2525

2626
## Getting started
2727

28-
To quickstart with a complete workflow and view Amazon EKS infrastructure dashboards,
28+
To quick start with a complete workflow and view Amazon EKS infrastructure dashboards,
2929
visit the [Amazon EKS cluster monitoring documentation](https://aws-observability.github.io/terraform-aws-observability-accelerator/eks/)
3030

3131
## How it works
@@ -39,8 +39,9 @@ v2+ releases introduces couple of breaking changes compared to previous versions
3939

4040
- `modules/workloads/infra` module moves to `modules/eks-monitoring`
4141
- All EKS configuration options move from the base module to the `eks-monitoring` module
42-
- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visiblity
42+
- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visibility
4343
- All examples have been updated to reflect these changes
44+
- Introducing GitOps for Grafana contents (Dashboards, Folders and Data sources) with [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/)
4445

4546
### Base Module
4647

docs/concepts.md

Lines changed: 13 additions & 0 deletions
@@ -31,6 +31,19 @@ you need to track changes as part of a Git repository or CI/CD pipeline.
3131
!!! warning
3232
When using `tfvars` files, always be careful to not store and commit any secrets (keys, passwords, ...)
3333

34+
## Grafana contents via GitOps on Amazon Managed Grafana
35+
36+
We have upgraded our solution to use [grafana-operator](https://github.com/grafana-operator/grafana-operator) and [Flux](https://fluxcd.io/) to create Grafana data sources, folders and dashboards via GitOps on Amazon Managed Grafana.
37+
38+
The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside and outside Kubernetes. Grafana Operator makes it possible for you to manage and create Grafana dashboards, datasources etc. declaratively between multiple instances in an easy and scalable way. Using grafana-operator it will be possible to add AWS data sources such as Amazon Managed Service for Prometheus, Amazon CloudWatch, AWS X-Ray to Amazon Managed Grafana and create Grafana dashboards on Amazon Managed Grafana from your Amazon EKS cluster. This enables us to use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes native way. This ultimately enables us to use GitOps mechanisms using CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.
39+
40+
GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that offers you the ability to manage the state of multiple Kubernetes clusters leveraging the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, BitBucket). Now, with grafana-operator supporting the management of external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms using CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.
41+
42+
We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and a [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) using Flux to sync our GitHub repository and add Grafana data sources, folders and dashboards to Amazon Managed Grafana using Grafana Operator. GitRepository defines a Source to produce an Artifact for a Git repository revision. Kustomization defines a pipeline for fetching, decrypting, building, validating and applying Kustomize overlays or plain Kubernetes manifests. We also use [Flux post-build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to render variables such as `AMG_AWS_REGION`, `AMP_ENDPOINT_URL`, `AMG_ENDPOINT_URL` and `GRAFANA_NODEEXP_DASH_URL` into the YAML manifests at deployment time, which avoids hardcoding these values in the manifests stored in the Git repository.
43+
44+
We have placed our declarative code snippets to create an Amazon Managed Service for Prometheus data source and Grafana dashboard in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub Repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have set up a GitRepository that points to the AWS Observability Accelerator GitHub Repository and a `Kustomization` for Flux to sync the Git repository with the artifacts in the `./artifacts/grafana-operator-manifests` path of that repository. You can use this extension of our solution to point to your own Kubernetes manifests and create Grafana data sources and personalized Grafana dashboards of your choice using GitOps with Grafana Operator and Flux in a Kubernetes-native way with altering and redeploying this solution for changes to Grafana resources.
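
As a rough illustration of the setup described above, the sketch below writes a GitRepository and Kustomization pair to a local file. It is not the accelerator's actual configuration: the resource name `grafana-contents`, the `flux-system` namespace, the `5m` interval, the output file name and all substituted values are assumptions chosen for the example, and the `v1` API versions assume Flux v2.0 or later. Only the repository URL, the `main` branch, the manifests path and the variable names come from the text above.

```bash
set -eu

# Output file for the illustrative manifests (hypothetical name).
MANIFEST="${MANIFEST:-grafana-gitops-sketch.yaml}"

# Write the two Flux resources; the quoted heredoc keeps the content literal.
cat > "${MANIFEST}" <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: grafana-contents          # hypothetical name
  namespace: flux-system          # assumed namespace
spec:
  interval: 5m                    # assumed sync interval
  url: https://github.com/aws-observability/aws-observability-accelerator
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: grafana-contents          # hypothetical name
  namespace: flux-system
spec:
  interval: 5m
  prune: true
  path: ./artifacts/grafana-operator-manifests
  sourceRef:
    kind: GitRepository
    name: grafana-contents
  postBuild:
    substitute:                   # placeholder values, replace with your own
      AMG_AWS_REGION: "<your-region>"
      AMP_ENDPOINT_URL: "<your-amp-endpoint-url>"
      AMG_ENDPOINT_URL: "<your-amg-endpoint-url>"
      GRAFANA_NODEEXP_DASH_URL: "<your-dashboard-json-url>"
EOF

echo "wrote ${MANIFEST}"
```

After replacing the placeholders, a file like this could be applied with `kubectl apply -f grafana-gitops-sketch.yaml` on a cluster where Flux is installed. Manifests under the synced path can then reference the values as `${AMG_AWS_REGION}` and so on.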
45+
46+
3447

3548
## v2.x changes
3649

docs/eks/index.md

Lines changed: 70 additions & 7 deletions
@@ -14,7 +14,7 @@ The Amazon EKS infrastructure Terraform modules focuses on metrics collection to
1414
Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. It deploys the [node exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) in your cluster.
1515

1616
It provides default dashboards to get a comprehensible visibility on your nodes,
17-
namespaces, pods, and kubelet operations health. Finally, you get curated Prometheus recording rules
17+
namespaces, pods, and Kubelet operations health. Finally, you get curated Prometheus recording rules
1818
and alerts to operate your cluster.
1919

2020
Additionally, you can optionally collect custom Prometheus metrics from your applications running
@@ -72,9 +72,9 @@ aws amp create-workspace --alias observability-accelerator --query '.workspaceId
7272

7373
#### 5. Amazon Managed Grafana workspace
7474

75-
To run this example you need an Amazon Managed Grafana workspace. If you have
75+
To visualize metrics collected, you need an Amazon Managed Grafana workspace. If you have
7676
an existing workspace, create an environment variable as described below.
77-
To create a new workspace, visit our supporting example for Grafana.
77+
To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/)
7878

7979
!!! note
8080
For the URL `https://g-xyz.grafana-workspace.eu-central-1.amazonaws.com`, the workspace ID would be `g-xyz`
@@ -91,8 +91,14 @@ run the `apply` or `destroy` command.
9191

9292
Ensure you have necessary IAM permissions (`CreateWorkspaceApiKey, DeleteWorkspaceApiKey`)
9393

94+
!!! note
95+
Starting with v2.5.x, we use Grafana Operator and External Secrets to
96+
manage Grafana contents. Your API Key will be stored securely on AWS Secrets Manager
97+
and the Grafana Operator will use it to sync dashboards, folders and data sources.
98+
Read more [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/concepts/).
99+
94100
```bash
95-
export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
101+
export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 7200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
96102
```
97103

98104
## Deploy
@@ -105,10 +111,10 @@ terraform apply
105111

106112
## Visualization
107113

108-
#### 1. Prometheus datasource on Grafana
114+
#### 1. Prometheus data source on Grafana
109115

110116
Make sure to open the link in the output. After a successful deployment, this will open
111-
the Prometheus datasource configuration on Grafana.
117+
the Prometheus data source configuration on Grafana.
112118
Click `Save & test` and you should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana.
113119

114120
```bash
@@ -135,7 +141,7 @@ Open the Amazon Managed Service for Prometheus console and view the details of y
135141
To setup your alert receiver, with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html)
136142

137143

138-
## Custom metrics collection
144+
## Custom Prometheus metrics collection
139145

140146
In addition to the cluster metrics, if you are interested in collecting Prometheus
141147
metrics from your pods, you can set up `custom metrics collection`.
@@ -170,6 +176,63 @@ sum(up{job="custom-metrics"}) by (container_name, cluster, nodename)
170176

171177
## Troubleshooting
172178

179+
### 1. Grafana dashboards missing or Grafana API key expired
180+
181+
If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your Grafana Operator pod using the commands below:
182+
183+
```bash
kubectl get pods -n grafana-operator
```
186+
187+
Output:
188+
189+
```console
NAME                                READY   STATUS    RESTARTS   AGE
grafana-operator-866d4446bb-nqq5c   1/1     Running   0          3h17m
```
193+
194+
```bash
kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
```
197+
198+
Output:
199+
200+
```console
1.6857285045556655e+09 ERROR error reconciling datasource {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
```
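
The signal to look for in this output is the `Expired API key` message returned with status 401. A small filter can check for it without reading the whole log. This is only a sketch: the function name `has_expired_api_key` is hypothetical, and the deployment name in the usage comment is an assumption inferred from the pod name shown above.

```bash
# Succeed if the log text on stdin reports an expired Grafana API key.
# has_expired_api_key is an illustrative helper, not part of the project.
has_expired_api_key() {
  grep -q 'Expired API key'
}

# Usage against a live cluster (deployment name assumed):
#   kubectl logs deploy/grafana-operator -n grafana-operator | has_expired_api_key \
#     && echo "Grafana API key expired, rotate it"
```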
204+
205+
If you see the above `Expired API key` error in the logs, your Grafana API key has expired. Use the following procedure to update your `grafana-api-key`:
206+
207+
- First, let's create a new Grafana API key.
208+
209+
```bash
export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
  --key-name "grafana-operator-key-new" \
  --key-role "ADMIN" \
  --seconds-to-live 432000 \
  --workspace-id <YOUR_WORKSPACE_ID> \
  --query key \
  --output text)
```
218+
219+
- Next, let's grab the Grafana API key secret name from AWS Secrets Manager. The secret name should start with `terraform-..`
220+
221+
```bash
aws secretsmanager list-secrets
```
224+
225+
- Finally, update the Grafana API key secret in AWS Secrets Manager using the above new Grafana API key:
226+
227+
```bash
aws secretsmanager update-secret \
  --secret-id <Your Secret Name> \
  --secret-string "${GO_AMG_API_KEY}" \
  --region <Your AWS Region>
```
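
The `--seconds-to-live` value used in the first step, 432000, corresponds to five days. A small helper can make the intended lifetime explicit when you create a key. This is a sketch only: the function name `days_to_seconds` and the variable `KEY_TTL_SECONDS` are hypothetical and not part of the project.

```bash
# Convert a key lifetime in days into the seconds expected by --seconds-to-live.
# days_to_seconds is an illustrative helper, not part of the accelerator.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

KEY_TTL_SECONDS="$(days_to_seconds 5)"
echo "${KEY_TTL_SECONDS}"   # prints 432000, the value used in the first step
```

You could then pass `--seconds-to-live "${KEY_TTL_SECONDS}"` to `aws grafana create-workspace-api-key` instead of a hardcoded number.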
233+
234+
### 2. Upgrade from 2.1.0 or earlier
235+
173236
When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.
174237

175238
```bash

docs/index.md

Lines changed: 2 additions & 3 deletions
@@ -25,9 +25,8 @@ traces collection, dashboards and alerts for monitoring:
2525
- NGINX workloads (running on Amazon EKS)
2626
- Java/JMX workloads (running on Amazon EKS)
2727
- Amazon Managed Service for Prometheus workspaces with Amazon CloudWatch
28-
- Installs Grafana Operator to add AWS data sources and create Grafana Dashboards to Amazon Managed Grafana.
29-
- Installs FluxCD to perform GitOps sync of a Git Repo to EKS Cluster. We will use this later for creating Grafana Dashboards and AWS datasources to Amazon Managed Grafana.
30-
- Installs External Secrets Operator to retrieve and Sync the Grafana API keys.
28+
- [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/) to manage Grafana contents (AWS data sources, Grafana Dashboards) with GitOps
29+
- External Secrets Operator to retrieve and Sync the Grafana API keys
3130

3231
These modules can be directly configured in your existing Terraform
3332
configurations or ready to be deployed in our packaged

examples/eks-cluster-with-vpc/README.md

Lines changed: 1 addition & 90 deletions
@@ -8,93 +8,4 @@ This example deploys the following Basic EKS Cluster with VPC
88
- Creates Internet gateway for Public Subnets and NAT Gateway for Private Subnets
99
- Creates EKS Cluster Control plane with one managed node group
1010

11-
## How to Deploy
12-
13-
### Prerequisites
14-
15-
Ensure that you have installed the following tools in your Mac or Windows Laptop before start working with this module and run Terraform Plan and Apply
16-
17-
1. [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
18-
2. [Kubectl](https://Kubernetes.io/docs/tasks/tools/)
19-
3. [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
20-
21-
### Minimum IAM Policy
22-
23-
> **Note**: The policy resource is set as `*` to allow all resources, this is not a recommended practice.
24-
25-
You can find the policy [here](min-iam-policy.json)
26-
27-
28-
### Deployment Steps
29-
30-
#### Step 1: Clone the repo using the command below
31-
32-
```sh
33-
git clone https://github.com/aws-observability/terraform-aws-observability-accelerator.git
34-
```
35-
36-
#### Step 2: Run Terraform INIT
37-
38-
Initialize a working directory with configuration files
39-
40-
```sh
41-
cd examples/eks-cluster-with-vpc/
42-
terraform init
43-
```
44-
45-
#### Step 3: Run Terraform PLAN
46-
47-
Verify the resources created by this execution
48-
49-
```sh
50-
export TF_VAR_aws_region=<ENTER YOUR REGION> # Select your own region
51-
export TF_VAR_cluster_name=<ENTER YOUR CLUSTER NAME> # Enter your cluster name
52-
terraform plan
53-
```
54-
55-
#### Step 4: Finally, Terraform APPLY
56-
57-
**Deploy the pattern**
58-
59-
```sh
60-
terraform apply
61-
```
62-
63-
Enter `yes` to apply.
64-
65-
### Configure `kubectl` and test cluster
66-
67-
EKS Cluster details can be extracted from terraform output or from AWS Console to get the name of cluster.
68-
This following command used to update the `kubeconfig` in your local machine where you run kubectl commands to interact with your EKS Cluster.
69-
70-
#### Step 5: Run `update-kubeconfig` command
71-
72-
`~/.kube/config` file gets updated with cluster details and certificate from the below command
73-
74-
aws eks --region <enter-your-region> update-kubeconfig --name <cluster-name>
75-
76-
#### Step 6: List all the worker nodes by running the command below
77-
78-
kubectl get nodes
79-
80-
#### Step 7: List all the pods running in `kube-system` namespace
81-
82-
kubectl get pods -n kube-system
83-
84-
## Cleanup
85-
86-
To clean up your environment, destroy the Terraform modules in reverse order.
87-
88-
Destroy the Kubernetes Add-ons, EKS cluster with Node groups and VPC
89-
90-
```sh
91-
terraform destroy -target="module.eks_blueprints_kubernetes_addons" -auto-approve
92-
terraform destroy -target="module.eks_blueprints" -auto-approve
93-
terraform destroy -target="module.vpc" -auto-approve
94-
```
95-
96-
Finally, destroy any additional resources that are not in the above modules
97-
98-
```sh
99-
terraform destroy -auto-approve
100-
```
11+
You can view the full documentation for this example [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/new-eks-cluster/)
