Update docs (#172)

bonclay7 · web-flow · commit 1f16bec27f20 · 2023-06-06T08:19:22.000+02:00
diff --git a/README.md b/README.md
@@ -25,7 +25,7 @@ To explore the complete project documentation, please visit our [documentation s
 
 ## Getting started
 
-To quickstart with a complete workflow and view Amazon EKS infrastructure dashboards,
+To get started quickly with a complete workflow and view Amazon EKS infrastructure dashboards,
 visit the [Amazon EKS cluster monitoring documentation](https://aws-observability.github.io/terraform-aws-observability-accelerator/eks/)
 
 ## How it works
@@ -39,8 +39,9 @@ v2+ releases introduces couple of breaking changes compared to previous versions
 
 - `modules/workloads/infra` module moves to `modules/eks-monitoring`
 - All EKS configuration options moves from the base  module to the `eks-monitoring` module
-- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visiblity
+- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns); see [examples](./examples) for a more complete view
 - All examples have been updated to reflect these changes
+- Introducing GitOps for Grafana contents (Dashboards, Folders and Data sources) with [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/)
 
 ### Base Module
 
diff --git a/docs/concepts.md b/docs/concepts.md
@@ -31,6 +31,19 @@ you need to track changes as part of a Git repository or CI/CD pipeline.
 !!! warning
     When using `tfvars` files, always be careful to not store and commit any secrets (keys,     passwords, ...)
 
+## Grafana contents via GitOps on Amazon Managed Grafana
+
+We have upgraded our solution to use [grafana-operator](https://github.com/grafana-operator/grafana-operator) and [Flux](https://fluxcd.io/) to create Grafana data sources, folders, and dashboards via GitOps on Amazon Managed Grafana.
+
+The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside and outside Kubernetes. It lets you create and manage Grafana dashboards, data sources, and other resources declaratively across multiple instances in an easy and scalable way. With grafana-operator, you can add AWS data sources such as Amazon Managed Service for Prometheus, Amazon CloudWatch, and AWS X-Ray to Amazon Managed Grafana, and create Grafana dashboards on Amazon Managed Grafana from your Amazon EKS cluster. This lets us use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes-native way, and ultimately to do so through GitOps mechanisms built on CNCF projects such as Flux.
+
+GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that lets you manage the state of multiple Kubernetes clusters using the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, Bitbucket). Now that grafana-operator supports managing external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms built on CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.
+
+We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and a [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) with Flux to sync our GitHub repository and add Grafana data sources, folders, and dashboards to Amazon Managed Grafana using Grafana Operator. A GitRepository defines a Source to produce an Artifact for a Git repository revision. A Kustomization defines a pipeline for fetching, decrypting, building, validating, and applying Kustomize overlays or plain Kubernetes manifests. We also use [Flux post-build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to render variables such as `AMG_AWS_REGION`, `AMP_ENDPOINT_URL`, `AMG_ENDPOINT_URL`, and `GRAFANA_NODEEXP_DASH_URL` into the YAML manifests at deployment time, which avoids hardcoding these values in the manifests stored in the Git repository.
+
+We have placed our declarative code snippets for creating an Amazon Managed Service for Prometheus data source and Grafana dashboard in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have set up a `GitRepository` that points to the AWS Observability Accelerator GitHub repository and a `Kustomization` so that Flux syncs the artifacts under the `./artifacts/grafana-operator-manifests` path of that repository. You can extend this solution by pointing it at your own Kubernetes manifests to create Grafana data sources and customized Grafana dashboards of your choice using GitOps with Grafana Operator and Flux in a Kubernetes-native way, without altering and redeploying this solution for changes to Grafana resources.
+
+
 
 ## v2.x changes
 
diff --git a/docs/eks/index.md b/docs/eks/index.md
@@ -14,7 +14,7 @@ The Amazon EKS infrastructure Terraform modules focuses on metrics collection to
 Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. It deploys the [node exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) in your cluster.
 
 It provides default dashboards to get a comprehensible visibility on your nodes,
-namespaces, pods, and kubelet operations health. Finally, you get curated Prometheus recording rules
+namespaces, pods, and Kubelet operations health. Finally, you get curated Prometheus recording rules
 and alerts to operate your cluster.
 
 Additionally, you can optionally collect custom Prometheus metrics from your applications running
@@ -72,9 +72,9 @@ aws amp create-workspace --alias observability-accelerator --query '.workspaceId
 
 #### 5. Amazon Managed Grafana workspace
 
-To run this example you need an Amazon Managed Grafana workspace. If you have
+To visualize the collected metrics, you need an Amazon Managed Grafana workspace. If you have
 an existing workspace, create an environment variable as described below.
-To create a new workspace, visit our supporting example for Grafana.
+To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/).
 
 !!! note
     For the URL `https://g-xyz.grafana-workspace.eu-central-1.amazonaws.com`, the workspace ID would be `g-xyz`
@@ -91,8 +91,14 @@ run the `apply` or `destroy` command.
 
 Ensure you have necessary IAM permissions (`CreateWorkspaceApiKey, DeleteWorkspaceApiKey`)
 
+!!! note
+    Starting with v2.5.x, we use Grafana Operator and External Secrets to
+    manage Grafana contents. Your API key is stored securely in AWS Secrets Manager,
+    and the Grafana Operator uses it to sync dashboards, folders, and data sources.
+    Read more [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/concepts/).
+
 ```bash
-export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
+export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 7200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
 ```
 
 ## Deploy
@@ -105,10 +111,10 @@ terraform apply
 
 ## Visualization
 
-#### 1. Prometheus datasource on Grafana
+#### 1. Prometheus data source on Grafana
 
 Make sure to open the link in the output. After a successful deployment, this will open
-the Prometheus datasource configuration on Grafana.
+the Prometheus data source configuration on Grafana.
 Click `Save & test` and you should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana.
 
 ```bash
@@ -135,7 +141,7 @@ Open the Amazon Managed Service for Prometheus console and view the details of y
     To setup your alert receiver, with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html)
 
 
-## Custom metrics collection
+## Custom Prometheus metrics collection
 
 In addition to the cluster metrics, if you are interested in collecting Prometheus
 metrics from your pods, you can use setup `custom metrics collection`.
@@ -170,6 +176,63 @@ sum(up{job="custom-metrics"}) by (container_name, cluster, nodename)
 
 ## Troubleshooting
 
+### 1. Grafana dashboards missing or Grafana API key expired
+
+If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your Grafana Operator pod using the commands below:
+
+```bash
+kubectl get pods -n grafana-operator
+```
+
+Output:
+
+```console
+NAME                                READY   STATUS    RESTARTS   AGE
+grafana-operator-866d4446bb-nqq5c   1/1     Running   0          3h17m
+```
+
+```bash
+kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
+```
+
+Output:
+
+```console
+1.6857285045556655e+09	ERROR	error reconciling datasource	{"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
+github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
+```
+
+If you see the `Expired API key` error shown above in the logs, your Grafana API key has expired. Use the following procedure to update your `grafana-api-key`:
+
+- First, let's create a new Grafana API key.
+
+```bash
+export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
+  --key-name "grafana-operator-key-new" \
+  --key-role "ADMIN" \
+  --seconds-to-live 432000 \
+  --workspace-id <YOUR_WORKSPACE_ID> \
+  --query key \
+  --output text)
+```
+
+- Next, let's grab the name of the Grafana API key secret from AWS Secrets Manager. The secret name should start with `terraform-..`
+
+```bash
+aws secretsmanager list-secrets
+```
+
+- Finally, update the Grafana API key secret in AWS Secrets Manager with the new Grafana API key created above:
+
+```bash
+aws secretsmanager update-secret \
+    --secret-id  <Your Secret Name> \
+    --secret-string "${GO_AMG_API_KEY}" \
+    --region <Your AWS Region>
+```
+
+### 2. Upgrade from 2.1.0 or earlier
+
 When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.
 
 ```bash
diff --git a/docs/index.md b/docs/index.md
@@ -25,9 +25,8 @@ traces collection, dashboards and alerts for monitoring:
 - NGINX workloads (running on Amazon EKS)
 - Java/JMX workloads (running on Amazon EKS)
 - Amazon Managed Service for Prometheus workspaces with Amazon CloudWatch
-- Installs Grafana Operator to add AWS data sources and create Grafana Dashboards to Amazon Managed Grafana.
-- Installs FluxCD to perform GitOps sync of a Git Repo to EKS Cluster. We will use this later for creating Grafana Dashboards and AWS datasources to Amazon Managed Grafana.
-- Installs External Secrets Operator to retrieve and Sync the Grafana API keys.
+- [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/) to manage Grafana contents (AWS data sources, Grafana Dashboards) with GitOps
+- External Secrets Operator to retrieve and sync the Grafana API keys
 
 These modules can be directly configured in your existing Terraform
 configurations or ready to be deployed in our packaged
diff --git a/examples/eks-cluster-with-vpc/README.md b/examples/eks-cluster-with-vpc/README.md
@@ -8,93 +8,4 @@ This example deploys the following Basic EKS Cluster with VPC
 - Creates Internet gateway for Public Subnets and NAT Gateway for Private Subnets
 - Creates EKS Cluster Control plane with one managed node group
 
-## How to Deploy
-
-### Prerequisites
-
-Ensure that you have installed the following tools in your Mac or Windows Laptop before start working with this module and run Terraform Plan and Apply
-
-1. [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
-2. [Kubectl](https://Kubernetes.io/docs/tasks/tools/)
-3. [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)
-
-### Minimum IAM Policy
-
-> **Note**: The policy resource is set as `*` to allow all resources, this is not a recommended practice.
-
-You can find the policy [here](min-iam-policy.json)
-
-
-### Deployment Steps
-
-#### Step 1: Clone the repo using the command below
-
-```sh
-git clone https://github.com/aws-observability/terraform-aws-observability-accelerator.git
-```
-
-#### Step 2: Run Terraform INIT
-
-Initialize a working directory with configuration files
-
-```sh
-cd examples/eks-cluster-with-vpc/
-terraform init
-```
-
-#### Step 3: Run Terraform PLAN
-
-Verify the resources created by this execution
-
-```sh
-export TF_VAR_aws_region=<ENTER YOUR REGION>           # Select your own region
-export TF_VAR_cluster_name=<ENTER YOUR CLUSTER NAME>   # Enter your cluster name
-terraform plan
-```
-
-#### Step 4: Finally, Terraform APPLY
-
-**Deploy the pattern**
-
-```sh
-terraform apply
-```
-
-Enter `yes` to apply.
-
-### Configure `kubectl` and test cluster
-
-EKS Cluster details can be extracted from terraform output or from AWS Console to get the name of cluster.
-This following command used to update the `kubeconfig` in your local machine where you run kubectl commands to interact with your EKS Cluster.
-
-#### Step 5: Run `update-kubeconfig` command
-
-`~/.kube/config` file gets updated with cluster details and certificate from the below command
-
-    aws eks --region <enter-your-region> update-kubeconfig --name <cluster-name>
-
-#### Step 6: List all the worker nodes by running the command below
-
-    kubectl get nodes
-
-#### Step 7: List all the pods running in `kube-system` namespace
-
-    kubectl get pods -n kube-system
-
-## Cleanup
-
-To clean up your environment, destroy the Terraform modules in reverse order.
-
-Destroy the Kubernetes Add-ons, EKS cluster with Node groups and VPC
-
-```sh
-terraform destroy -target="module.eks_blueprints_kubernetes_addons" -auto-approve
-terraform destroy -target="module.eks_blueprints" -auto-approve
-terraform destroy -target="module.vpc" -auto-approve
-```
-
-Finally, destroy any additional resources that are not in the above modules
-
-```sh
-terraform destroy -auto-approve
-```
+You can view the full documentation for this example [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/new-eks-cluster/)
diff --git a/examples/existing-cluster-with-base-and-infra/README.md b/examples/existing-cluster-with-base-and-infra/README.md
diff --git a/modules/eks-monitoring/README.md b/modules/eks-monitoring/README.md