
Commit 59ea52d

Merge pull request #36 from zjaco13/fix-dashboards
Add graviton pattern
2 parents fbb9dcc + 822567b commit 59ea52d

11 files changed

+432
-6
lines changed
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
import { configureApp } from '../lib/common/construct-utils';
2+
import SingleNewEksGravitonOpenSourceObservabilityConstruct from '../lib/single-new-eks-opensource-observability-construct/graviton-index';
3+
4+
const app = configureApp();
5+
6+
new SingleNewEksGravitonOpenSourceObservabilityConstruct(app, 'single-new-eks-graviton-opensource');

bin/single-new-eks-opensource-observability.ts

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ import { configureApp } from '../lib/common/construct-utils';
33

44
const app = configureApp();
55

6-
new SingleNewEksOpenSourceobservabilityConstruct(app, 'single-new-eks-opensource');
6+
new SingleNewEksOpenSourceobservabilityConstruct(app, 'single-new-eks-opensource');
Lines changed: 337 additions & 0 deletions
@@ -0,0 +1,337 @@
1+
# Single New EKS Graviton Cluster Open Source Observability Accelerator
2+
3+
## Architecture
4+
5+
The following figure illustrates the architecture of the Single EKS Cluster Open Source Observability on Graviton pattern, which uses open source tooling such as AWS Distro for OpenTelemetry (ADOT), Amazon Managed Service for Prometheus (AMP), and Amazon Managed Grafana:
6+
7+
![Architecture](../images/CDK_Architecture_graviton_diagram.png)
8+
9+
Monitoring Amazon Elastic Kubernetes Service (Amazon EKS) for metrics has two categories:
10+
the control plane and the Amazon EKS nodes (with Kubernetes objects).
11+
The Amazon EKS control plane consists of control plane nodes that run the Kubernetes software,
12+
such as etcd and the Kubernetes API server. To read more on the components of an Amazon EKS cluster,
13+
please read the [service documentation](https://docs.aws.amazon.com/eks/latest/userguide/clusters.html).
14+
15+
### Graviton
16+
17+
[AWS Graviton](https://aws.amazon.com/ec2/graviton/) processors are designed by AWS to deliver the best price performance for your cloud workloads running in Amazon EC2. They are Arm-based chips that implement the 64-bit aarch64 architecture. These processors feature key capabilities, such as the [AWS Nitro System](https://aws.amazon.com/ec2/nitro/), that allow you to securely run cloud native applications at scale.
18+
19+
Visit our [EKS Blueprints docs](https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/docs/addons/index.md) for a list of supported addons on Graviton.
20+
21+
## Objective
22+
23+
- Deploys one production grade Amazon EKS cluster running on a Graviton3 Processor
24+
- AWS Distro For OpenTelemetry Operator and Collector for Metrics and Traces
25+
- Logs with [AWS for FluentBit](https://github.com/aws/aws-for-fluent-bit)
26+
- Installs Grafana Operator to add AWS data sources and create Grafana dashboards in Amazon Managed Grafana.
27+
- Installs FluxCD to perform GitOps sync of a Git repository to the EKS cluster. We will use this later to create Grafana dashboards and AWS data sources in Amazon Managed Grafana. You can also use your own Git repository to sync your own Grafana resources, such as dashboards and data sources. See our One Observability module [GitOps with Amazon Managed Grafana](https://catalog.workshops.aws/observability/en-US/aws-managed-oss/gitops-with-amg) to learn more.
28+
- Installs External Secrets Operator to retrieve and sync the Grafana API keys.
29+
- Amazon Managed Grafana Dashboard and data source
30+
- Alerts and recording rules with Amazon Managed Service for Prometheus
31+
32+
## Prerequisites
33+
34+
Ensure that you have installed the following tools on your machine.
35+
36+
1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
37+
2. [kubectl](https://Kubernetes.io/docs/tasks/tools/)
38+
3. [cdk](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install)
39+
4. [npm](https://docs.npmjs.com/cli/v8/commands/npm-install)
40+
41+
## Deploying
42+
43+
1. Clone your forked repository
44+
45+
```sh
46+
git clone https://github.com/aws-observability/cdk-aws-observability-accelerator.git
47+
```
48+
49+
2. Install the AWS CDK Toolkit globally on your machine using
50+
51+
```bash
52+
npm install -g aws-cdk
53+
```
54+
55+
3. Amazon Managed Grafana workspace: To visualize metrics collected, you need an Amazon Managed Grafana workspace. If you have an existing workspace, create an environment variable as described below. To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/)
56+
57+
!!! note
58+
For the URL `https://g-xyz.grafana-workspace.us-east-1.amazonaws.com`, the workspace ID would be `g-xyz`
59+
60+
```bash
61+
export AWS_REGION=<YOUR AWS REGION>
62+
export COA_AMG_WORKSPACE_ID=g-xyz
63+
export COA_AMG_ENDPOINT_URL=https://g-xyz.grafana-workspace.us-east-1.amazonaws.com
64+
```
65+
66+
!!! warning
67+
Setting up environment variables `COA_AMG_ENDPOINT_URL` and `AWS_REGION` is mandatory for successful execution of this pattern.
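
As a rough sketch of the two points above, the workspace ID can be derived from the endpoint URL instead of being typed separately, and the mandatory variables can be checked before deploying. The helper names `workspace_id_from_url` and `missing_required_vars` are hypothetical and not part of this repository.

```bash
# Derive the workspace ID from an Amazon Managed Grafana endpoint URL.
# Example: https://g-xyz.grafana-workspace.us-east-1.amazonaws.com -> g-xyz
workspace_id_from_url() {
  host="${1#https://}"          # drop the scheme
  host="${host%%/*}"            # drop any path
  printf '%s\n' "${host%%.*}"   # keep the first DNS label
}

# Print the name of each mandatory variable that is unset or empty.
missing_required_vars() {
  for name in AWS_REGION COA_AMG_ENDPOINT_URL; do
    eval "value=\${$name:-}"    # indirect lookup that also works in POSIX sh
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Usage with the placeholder URL from this guide.
COA_AMG_ENDPOINT_URL="https://g-xyz.grafana-workspace.us-east-1.amazonaws.com"
COA_AMG_WORKSPACE_ID="$(workspace_id_from_url "$COA_AMG_ENDPOINT_URL")"
echo "workspace id: $COA_AMG_WORKSPACE_ID"
```

Running `missing_required_vars` before the deploy step prints nothing when both variables are set, which makes it easy to stop early with a clear message.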
68+
69+
4. GRAFANA API KEY: Amazon Managed Grafana provides a control plane API for generating Grafana API keys.
70+
71+
```bash
72+
export AMG_API_KEY=$(aws grafana create-workspace-api-key \
73+
--key-name "grafana-operator-key" \
74+
--key-role "ADMIN" \
75+
--seconds-to-live 432000 \
76+
--workspace-id $COA_AMG_WORKSPACE_ID \
77+
--query key \
78+
--output text)
79+
```
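
For orientation, `--seconds-to-live 432000` means the key expires after five days, which is why the Troubleshooting section covers key rotation. A minimal sketch of the arithmetic, using a hypothetical helper name:

```bash
# Convert a --seconds-to-live value into whole days.
ttl_days() {
  echo $(( $1 / 86400 ))   # 86400 seconds per day
}

ttl_days 432000   # prints 5
```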
80+
81+
5. AWS Secrets Manager for GRAFANA API KEY: Create the Grafana API key secret in AWS Secrets Manager using the new Grafana API key from the previous step. The Grafana Operator deployment of this solution references the secret to access Amazon Managed Grafana from the Amazon EKS cluster.
82+
83+
```bash
84+
aws secretsmanager create-secret \
85+
--name grafana-api-key \
86+
--description "API Key of your Grafana Instance" \
87+
--secret-string "${AMG_API_KEY}" \
88+
--region $AWS_REGION \
89+
--query ARN \
90+
--output text
91+
```
92+
93+
6. Install project dependencies by running `npm install` in the main folder of this cloned repository.
94+
95+
7. The dashboard URLs are expected to be specified in the CDK context, which is typically defined in the `cdk.json` file of the current directory or in `~/.cdk.json` in your home directory.
96+
97+
Example settings: Update the context in `cdk.json` file located in `cdk-eks-blueprints-patterns` directory
98+
99+
```
100+
"context": {
101+
"cluster.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json",
102+
"kubelet.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json",
103+
"namespaceworkloads.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json",
104+
"nodeexporter.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json",
105+
"nodes.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json",
106+
"workloads.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json"
107+
}
108+
```
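
A misspelled context key (for example `nodexporter.dashboard.url` instead of `nodeexporter.dashboard.url`) is easy to miss by eye. The following sketch lists any of the six keys above that are absent from a file. The function name `missing_dashboard_keys` is hypothetical, and the check is a plain text search rather than a JSON parse.

```bash
# Print every expected dashboard context key that is absent from the given file.
missing_dashboard_keys() {
  for key in cluster kubelet namespaceworkloads nodeexporter nodes workloads; do
    # -F matches the quoted key literally, so "nodes..." cannot match "nodeexporter...".
    grep -qF "\"$key.dashboard.url\"" "$1" || printf '%s\n' "$key.dashboard.url"
  done
}
```

Calling `missing_dashboard_keys cdk.json` from the directory that holds the file prints nothing when all six keys are present.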
109+
110+
8. Once all prerequisites are set, you are ready to deploy the pattern. Run the following commands from the root of this repository to deploy the stack:
111+
112+
```bash
113+
make build
114+
make pattern single-new-eks-graviton-opensource-observability deploy
115+
```
116+
117+
## Verify the resources
118+
119+
Run the `update-kubeconfig` command. You can copy it from the CDK output message.
120+
121+
```bash
122+
aws eks update-kubeconfig --name single-new-eks-graviton-opensource-observability-accelerator --region <your region> --role-arn arn:aws:iam::xxxxxxxxx:role/single-new-eks-gravitonop-singleneweksgravitonopens-82N8N3BMJYYI
123+
```
124+
125+
Let’s verify the resources created by the steps above.
126+
127+
```bash
128+
kubectl get nodes -o wide
129+
```
130+
Output:
131+
132+
```console
133+
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
134+
ip-10-0-104-200.us-west-2.compute.internal Ready <none> 2d1h v1.27.1-eks-2f008fe 10.0.104.200 <none> Amazon Linux 2 5.10.179-168.710.amzn2.aarch64 containerd://1.6.19
135+
```
136+
137+
Next, let's verify the namespaces in the cluster:
138+
139+
```bash
140+
kubectl get ns # Output shows all namespaces
141+
```
142+
143+
Output:
144+
145+
```console
146+
NAME STATUS AGE
147+
cert-manager Active 2d1h
148+
default Active 2d1h
149+
external-secrets Active 2d1h
150+
flux-system Active 2d1h
151+
grafana-operator Active 2d1h
152+
kube-node-lease Active 2d1h
153+
kube-public Active 2d1h
154+
kube-system Active 2d1h
155+
opentelemetry-operator-system Active 2d1h
156+
prometheus-node-exporter Active 2d1h
157+
```
158+
159+
Next, let's verify all resources in the `grafana-operator` namespace:
160+
161+
```bash
162+
kubectl get all --namespace=grafana-operator
163+
```
164+
165+
Output:
166+
167+
```console
168+
NAME READY STATUS RESTARTS AGE
169+
pod/grafana-operator-866d4446bb-g5srl 1/1 Running 0 2d1h
170+
171+
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
172+
service/grafana-operator-metrics-service ClusterIP 172.20.223.125 <none> 9090/TCP 2d1h
173+
174+
NAME READY UP-TO-DATE AVAILABLE AGE
175+
deployment.apps/grafana-operator 1/1 1 1 2d1h
176+
177+
NAME DESIRED CURRENT READY AGE
178+
replicaset.apps/grafana-operator-866d4446bb 1 1 1 2d1h
179+
```
180+
181+
## Visualization
182+
183+
#### 1. Grafana dashboards
184+
185+
Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a list of dashboards under `Observability Accelerator Dashboards`.
186+
187+
![Dashboard](../images/All-Dashboards.png)
188+
189+
Open the `Node Exporter` dashboard and you should be able to view its visualization as shown below:
190+
191+
![NodeExporter_Dashboard](../images/Node-Exporter.png)
192+
193+
194+
Open the `Kubelet` dashboard and you should be able to view its visualization as shown below:
195+
196+
![Kubelet_Dashboard](../images/Kubelet.png)
197+
198+
To view all dashboards as Kubernetes objects in the cluster, run:
199+
200+
```bash
201+
kubectl get grafanadashboards -A
202+
```
203+
204+
```console
205+
NAMESPACE NAME AGE
206+
grafana-operator cluster-grafanadashboard 138m
207+
grafana-operator java-grafanadashboard 143m
208+
grafana-operator kubelet-grafanadashboard 13h
209+
grafana-operator namespace-workloads-grafanadashboard 13h
210+
grafana-operator nginx-grafanadashboard 134m
211+
grafana-operator node-exporter-grafanadashboard 13h
212+
grafana-operator nodes-grafanadashboard 13h
213+
grafana-operator workloads-grafanadashboard 13h
214+
```
215+
216+
You can inspect more details per dashboard using this command:
217+
218+
```bash
219+
kubectl describe grafanadashboards cluster-grafanadashboard -n grafana-operator
220+
```
221+
222+
Grafana Operator and Flux always work together to synchronize your dashboards with Git. If you delete your dashboards by accident, they will be re-provisioned automatically.
223+
224+
## Viewing Logs
225+
226+
By default, we deploy a FluentBit daemon set in the cluster to collect worker logs for all namespaces. Logs are collected and exported to Amazon CloudWatch Logs, which enables you to centralize the logs from all of your systems, applications,
227+
and AWS services that you use, in a single, highly scalable service.
228+
229+
## Using CloudWatch Logs as data source in Grafana
230+
231+
Follow [the documentation](https://docs.aws.amazon.com/grafana/latest/userguide/using-amazon-cloudwatch-in-AMG.html)
232+
to enable Amazon CloudWatch as a data source. Make sure to provide permissions.
233+
234+
All logs are delivered to a CloudWatch log group that follows this naming pattern:
235+
`/aws/eks/single-new-eks-opensource-observability-accelerator`.
236+
Log streams follow `{container-name}.{pod-name}`. In Grafana, querying and analyzing logs is done with [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html)
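
To illustrate the `{container-name}.{pod-name}` convention, here is a small sketch that splits a stream name back into its two parts. Kubernetes container names cannot contain dots, so the first dot is treated as the separator. The helper names and the sample stream name are hypothetical.

```bash
# Split a log stream name of the form {container-name}.{pod-name}.
stream_container() { printf '%s\n' "${1%%.*}"; }   # text before the first dot
stream_pod()       { printf '%s\n' "${1#*.}"; }    # text after the first dot

stream="otel-collector.adot-collector-7d9f8-abcde"   # hypothetical stream name
echo "container=$(stream_container "$stream") pod=$(stream_pod "$stream")"
```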
237+
238+
### Example - ADOT collector logs
239+
240+
Select one or more log groups and run the following query. The example below
241+
queries AWS Distro for OpenTelemetry (ADOT) logs
242+
243+
```console
244+
fields @timestamp, log
245+
| order @timestamp desc
246+
| limit 100
247+
```
248+
249+
![logs-1](../images/logs-1.png)
250+
251+
### Example - Using time series visualizations
252+
253+
[CloudWatch Logs syntax](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html)
254+
provides powerful functions to extract data from your logs. The `stats()`
255+
function allows you to calculate aggregate statistics with log field values.
256+
This is useful for visualizing non-metric data from your applications.
257+
258+
In the example below, we use the following query to graph the number of metrics
259+
collected by the ADOT collector
260+
261+
```console
262+
fields @timestamp, log
263+
| parse log /"#metrics": (?<metrics_count>\d+)}/
264+
| stats avg(metrics_count) by bin(5m)
265+
| limit 100
266+
```
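
To see what the `parse` step in the query above extracts, the same pattern can be tried locally. This is only a sketch: the log line is invented to match the pattern, not copied from a real collector, and since POSIX `sed` has no `\d` or named groups, `[0-9][0-9]*` stands in for `\d+`.

```bash
# Extract the number that follows "#metrics": in a collector log line.
metrics_count() {
  printf '%s\n' "$1" | sed -n 's/.*"#metrics": \([0-9][0-9]*\)}.*/\1/p'
}

# Hypothetical log line, shaped only to match the pattern in the query.
line='info MetricsExporter {"kind": "exporter", "#metrics": 42}'
metrics_count "$line"   # prints 42
```

Lines that do not contain the pattern produce no output, which mirrors how Logs Insights leaves `metrics_count` empty for non-matching events.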
267+
268+
!!! tip
269+
You can add logs in your dashboards with logs panel types or time series
270+
depending on your query results type.
271+
272+
![logs-2](../images/logs-2.png)
273+
274+
!!! warning
275+
Querying CloudWatch logs will incur costs per GB scanned. Use small time
276+
windows and limits in your queries. Check out the CloudWatch
277+
[pricing page](https://aws.amazon.com/cloudwatch/pricing/) for more information.
278+
279+
## Troubleshooting
280+
281+
### 1. Grafana dashboards missing or Grafana API key expired
282+
283+
If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your Grafana Operator pod. First, find the pod name using the command below:
284+
285+
```bash
286+
kubectl get pods -n grafana-operator
287+
```
288+
289+
Output:
290+
291+
```console
292+
NAME READY STATUS RESTARTS AGE
293+
grafana-operator-866d4446bb-nqq5c 1/1 Running 0 3h17m
294+
```
295+
296+
```bash
297+
kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
298+
```
299+
300+
Output:
301+
302+
```console
303+
1.6857285045556655e+09 ERROR error reconciling datasource {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
304+
github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
305+
```
306+
307+
If you see the `Expired API key` error shown above in the logs, your Grafana API key has expired. Use the following procedure to update your `grafana-api-key` secret:
308+
309+
- First, let's create a new Grafana API key.
310+
311+
```bash
312+
export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
313+
--key-name "grafana-operator-key-new" \
314+
--key-role "ADMIN" \
315+
--seconds-to-live 432000 \
316+
--workspace-id $COA_AMG_WORKSPACE_ID \
317+
--query key \
318+
--output text)
319+
```
320+
321+
- Finally, update the Grafana API key secret in AWS Secrets Manager using the above new Grafana API key:
322+
323+
```bash
324+
export API_KEY_SECRET_NAME="grafana-api-key"
325+
aws secretsmanager update-secret \
326+
--secret-id $API_KEY_SECRET_NAME \
327+
--secret-string "${GO_AMG_API_KEY}" \
328+
--region $AWS_REGION
329+
```
330+
331+
- If the issue persists, you can force the synchronization by deleting the `externalsecret` Kubernetes object.
332+
333+
```bash
334+
kubectl delete externalsecret/external-secrets-sm -n grafana-operator
335+
```
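
The log check in this troubleshooting procedure can also be scripted. The sketch below searches log text for the `Expired API key` message shown in the operator output earlier in this section. The function name `has_expired_api_key` is hypothetical.

```bash
# Return success (0) when the given log text contains the expired-key error.
has_expired_api_key() {
  printf '%s\n' "$1" | grep -q 'Expired API key'
}

# Sample taken from the error field of the operator log shown in this section.
sample='"error": "status: 401, body: {\"message\":\"Expired API key\"}\n"'
if has_expired_api_key "$sample"; then
  echo "Grafana API key expired: create a new key and update the secret"
fi
```

In practice the argument would be the captured output of `kubectl logs` for the Grafana Operator pod, assuming `kubectl` is already configured for the cluster.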
336+
337+

docs/patterns/single-new-eks-observability-accelerators/single-new-eks-opensource-observability.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
# Singe New EKS Cluster Open Source Observability Accelerator
1+
# Single New EKS Cluster Open Source Observability Accelerator
22

33
## Architecture
44

@@ -95,7 +95,7 @@ Example settings: Update the context in `cdk.json` file located in `cdk-eks-blue
9595
"cluster.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json",
9696
"kubelet.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json",
9797
"namespaceworkloads.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json",
98-
"nodexporter.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json",
98+
"nodeexporter.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json",
9999
"nodes.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json",
100100
"workloads.dashboard.url": "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json"
101101
}

lib/common/observability-builder.ts

Lines changed: 1 addition & 2 deletions
@@ -17,7 +17,6 @@ export class ObservabilityBuilder {
1717
new blueprints.addons.AwsLoadBalancerControllerAddOn(),
1818
new blueprints.addons.VpcCniAddOn(),
1919
new blueprints.addons.CoreDnsAddOn(),
20-
new blueprints.addons.KubeProxyAddOn(),
2120
new blueprints.addons.MetricsServerAddOn(),
2221
new blueprints.addons.ExternalsSecretsAddOn(),
2322
new blueprints.addons.CertManagerAddOn(),
@@ -44,4 +43,4 @@ export class UsageTrackingAddOn extends NestedStack {
4443
constructor(scope: Construct, id: string, props: NestedStackProps) {
4544
super(scope, id, utils.withUsageTracking(UsageTrackingAddOn.USAGE_ID, props));
4645
}
47-
}
46+
}

lib/single-new-eks-awsnative-observability-construct/index.ts

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ export default class SingleNewEksClusterAWSNativeobservabilityConstruct {
1111
const region = process.env.COA_AWS_REGION! || process.env.CDK_DEFAULT_REGION!;
1212

1313
const addOns: Array<blueprints.ClusterAddOn> = [
14+
new blueprints.addons.KubeProxyAddOn(),
1415
new blueprints.addons.CloudWatchLogsAddon({
1516
logGroupPrefix: `/aws/eks/${stackId}`,
1617
logRetentionDays: 30

lib/single-new-eks-mixed-observability-construct/index.ts

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ export default class SingleNewEksMixedobservabilityConstruct {
1919
});
2020

2121
const addOns: Array<blueprints.ClusterAddOn> = [
22+
new blueprints.addons.KubeProxyAddOn(),
2223
new blueprints.addons.CloudWatchLogsAddon({
2324
logGroupPrefix: `/aws/eks/${stackId}`,
2425
logRetentionDays: 30
