Commit 3e691fb

Merge pull request #1454 from pbk8s/main
Sentiment Analysis Learning Path
2 parents 313aafc + 2093816 commit 3e691fb

15 files changed: +427 −0 lines
Lines changed: 93 additions & 0 deletions
---
title: Cluster monitoring with Prometheus and Grafana in Amazon EKS
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## CPU and RAM usage statistics with Prometheus and Grafana

Prometheus is a monitoring and alerting tool used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. It collects essential metrics (for example, CPU and memory usage, pod counts, and request latency) that help you monitor the health and performance of Kubernetes clusters. Grafana is a visualization and analytics tool that uses Prometheus as a data source to create interactive dashboards for monitoring and analyzing Kubernetes metrics over time.

## Install Prometheus on an Arm-based EKS cluster

This learning path uses `helm` to install Prometheus on the Kubernetes cluster. Follow the [helm documentation](https://helm.sh/docs/intro/install/) to install it on your laptop.

Create a namespace in your EKS cluster to host the `prometheus` pods:

```console
kubectl create namespace prometheus
```

Add the helm repo for Prometheus:

```console
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```

Install `prometheus` on the cluster with the following command:

```console
helm install prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2" \
--set server.persistentVolume.storageClass="gp2"
```

Check that all pods are up and running:

```console
kubectl get pods -n prometheus
```
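Before wiring up Grafana, you can optionally confirm that Prometheus is collecting metrics by port-forwarding its server service and issuing a simple PromQL query. This is a sketch: `prometheus-server` is the default service name created by this helm chart, and `up` is a built-in metric reporting which scrape targets are healthy.

```console
kubectl port-forward -n prometheus svc/prometheus-server 9090:80 &
curl 'http://localhost:9090/api/v1/query?query=up'
```

A JSON response listing targets with value `1` indicates Prometheus is scraping successfully.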

## Install Grafana on an Arm-based EKS cluster

Add the helm repo for Grafana:

```console
helm repo add grafana https://grafana.github.io/helm-charts
```

Create a `grafana.yaml` file with the following contents:

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.prometheus.svc.cluster.local
      access: proxy
      isDefault: true
```

Create another namespace for the `grafana` pods:

```console
kubectl create namespace grafana
```

Install `grafana` on the cluster with the following command:

```console
helm install grafana grafana/grafana \
--namespace grafana \
--set persistence.storageClassName="gp2" \
--set persistence.enabled=true \
--set adminPassword='kubegrafana' \
--values grafana.yaml \
--set service.type=LoadBalancer
```

Check that all pods are up and running:

```console
kubectl get pods -n grafana
```
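Because the service was created with `--set service.type=LoadBalancer`, Kubernetes provisions an external address for it. You can retrieve that address with a `jsonpath` query (a sketch; the service name `grafana` matches the helm release name used above):

```console
kubectl get svc grafana -n grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```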

Log in to the Grafana dashboard using the LoadBalancer address and click `Dashboards` in the left navigation pane. Locate the `Kubernetes / Compute Resources / Node` dashboard and click on it. You should see a dashboard like the one below for your Kubernetes cluster:

![grafana #center](_images/grafana.png)
Lines changed: 78 additions & 0 deletions
---
title: Monitoring sentiments with Elasticsearch and Kibana
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Deploy Elasticsearch and Kibana on an Arm-based EC2 instance

Elasticsearch is a NoSQL database and search and analytics engine. It is designed to store, search, and analyze large amounts of data, and its real-time indexing capability is crucial for handling high-velocity data streams like tweets. Kibana is a dashboard and visualization tool that integrates seamlessly with Elasticsearch. It provides an interface to interact with Twitter data, apply filters, and receive alerts. There are multiple ways to install Elasticsearch and Kibana; one of them is shown below.

Before you begin, ensure that Docker and Docker Compose are installed on your laptop.

Create the following `docker-compose.yml` file:

```yml
version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.15.2
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - xpack.security.enabled=false
      - HTTP_ENABLE=true
    ports:
      - "9200:9200"
    networks:
      - elk

  kibana:
    image: kibana:8.15.2
    container_name: kibana
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - HTTP_ENABLE=true
    depends_on:
      - elasticsearch
    networks:
      - elk

networks:
  elk:
    driver: bridge
```

Use the following command to deploy Elasticsearch and the Kibana dashboard:

```console
docker-compose up
```

After the dashboard is up, use the public IP of your server on port 5601 to access the Kibana dashboard.
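Before opening Kibana, you can confirm that Elasticsearch itself is reachable. The cluster health endpoint is part of the standard Elasticsearch REST API, and works here because the compose file maps port 9200 to the host:

```console
curl http://localhost:9200/_cluster/health?pretty
```

A `status` of `green` or `yellow` means the single-node cluster is ready.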

![kibana #center](_images/kibana.png)

Now switch to Stack Management using the menu on the left side, as shown in the image below.

![kibana-data #center](_images/Kibana-data.png)

To make sure that you are receiving the data from the sentiment analysis application through Elasticsearch, check whether you have a Data View in Stack Management.

![kibana-sentiment #center](_images/Kibana-sentiment.png)

You can also check the types of attributes that are received in the Data Views. Now, switch to the dashboards in the left menu and start creating visualizations to analyze the data.

![kibana-dashboard1 #center](_images/Kibana-dashboard1.png)

One sample dashboard structure looks like the one below, showing the records of different sentiments.

![kibana-dashboard2 #center](_images/Kibana-dashboard2.png)

Similarly, you can design and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this learning path.

![kibana-dashboard3 #center](_images/Kibana-dashboard3.png)

Navigate to the `dashboards` directory in the cloned GitHub repository and locate the `sentiment_dashboard.ndjson` file. Import this file into the Kibana dashboard and you should see the dashboard shown in the previous step.
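If you prefer the command line over the Kibana UI, the same import can be done through Kibana's saved objects API. This is a sketch; adjust the host to your server's address, and note that the `kbn-xsrf` header is required by Kibana for API writes:

```console
curl -X POST "http://localhost:5601/api/saved_objects/_import?overwrite=true" \
  -H "kbn-xsrf: true" \
  --form file=@dashboards/sentiment_dashboard.ndjson
```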
Lines changed: 137 additions & 0 deletions
---
title: Deploy the sentiment analysis application on an Arm-based EKS cluster
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Before you begin

You will need an [AWS account](https://aws.amazon.com/). Create an account if needed.

Four tools are required on your local machine. Follow the links to install them:

* [Kubectl](/install-guides/kubectl/)
* [AWS CLI](/install-guides/aws-cli)
* [Docker](/install-guides/docker)
* [Terraform](/install-guides/terraform)

## Set up sentiment analysis

Clone this GitHub [repository](https://github.com/koleini/spark-sentiment-analysis) on your local workstation. Navigate to the `eks` directory and update the `variables.tf` file with your AWS region.

Execute the following commands to create the Amazon EKS cluster with pre-configured labels:

```console
terraform init
terraform apply --auto-approve
```

Update the `kubeconfig` file to access the deployed EKS cluster with the following command:

```console
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile <AWS_PROFILE_NAME>
```

Create a service account for Apache Spark:

```console
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
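You can verify that the new service account has the permissions Spark needs, for example the ability to create executor pods, with a quick `kubectl auth can-i` check:

```console
kubectl auth can-i create pods --as=system:serviceaccount:default:spark --namespace=default
```

The command prints `yes` if the cluster role binding took effect.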

## Build the sentiment analysis JAR file

Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer:

```console
cd sentiment_analysis
sbt assembly
```

You should see a JAR file created at the following location:

```console
sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar
```

## Create the Spark Docker container image

Create a repository in Amazon ECR to store the Docker images. You can also use Docker Hub.

The Spark repository contains a script to build the Docker image needed for running inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image.

In the current working directory, clone the Apache Spark GitHub repository prior to building the image:

```console
git clone https://github.com/apache/spark.git
cd spark
git checkout v3.4.3
```

Build and push the Docker container using the following commands:

```console
cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis build
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis push
```
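If you are pushing to Amazon ECR, the Docker client must be authenticated to the registry first. A sketch of the usual login flow, substituting your own region and registry address:

```console
aws ecr get-login-password --region <your-region> | \
  docker login --username AWS --password-stdin <your-docker-repository>
```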

## Run the Spark computation on the cluster

Execute the `spark-submit` command within the Spark folder to deploy the application. The following commands run the application with two executors, each with 12 cores, and allocate 24GB of memory for both the executor and driver pods.

Set the following variables before executing the `spark-submit` command:

```console
export MASTER_ADDRESS=<K8S_MASTER_ADDRESS>
export ES_ADDRESS=<IP_ADDRESS_OF_ELASTICSEARCH>
export CHECKPOINT_BUCKET=<BUCKET_NAME>
export EKS_ADDRESS=<EKS_REGISTRY_ADDRESS>
```

Execute the following command:

```console
bin/spark-submit \
--class bigdata.SentimentAnalysis \
--master k8s://$MASTER_ADDRESS:443 \
--deploy-mode cluster \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=$EKS_ADDRESS/spark:sentiment-analysis \
--conf spark.kubernetes.driver.pod.name="spark-twitter" \
--conf spark.kubernetes.namespace=default \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
--conf spark.executor.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \
--conf spark.executor.cores=12 \
--conf spark.driver.cores=12 \
--conf spark.driver.memory=24g \
--conf spark.executor.memory=24g \
--conf spark.memory.fraction=0.8 \
--name sparkTwitter \
local:///opt/spark/jars/bigdata-assembly-0.1.jar
```

Use `kubectl get pods` to check the status of the pods in the cluster.

```output
NAME                                        READY   STATUS    RESTARTS   AGE
sentimentanalysis-346f22932b484903-exec-1   1/1     Running   0          10m
sentimentanalysis-346f22932b484903-exec-2   1/1     Running   0          10m
spark-twitter                               1/1     Running   0          12m
```
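To follow the application's output while it runs, you can stream the driver pod's logs; the driver pod name `spark-twitter` was set via `spark.kubernetes.driver.pod.name` above:

```console
kubectl logs -f spark-twitter
```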

## Twitter sentiment analysis

Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and create a `bearer token`. Use the following script to fetch the tweets:

```console
export BEARER_TOKEN=<BEARER_TOKEN_FROM_X>
python3 scripts/xapi_tweets.py
```

You can modify the script `xapi_tweets.py` with your own keywords. Update the following section in the script to do so:

```python
query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
                'tweet.fields': 'lang'}
```
Lines changed: 27 additions & 0 deletions
---
title: What is Twitter Sentiment Analysis
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## What is Sentiment Analysis

Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. Social media platforms such as Twitter provide a wealth of information about public opinion, trends, and events. Sentiment analysis is important because it provides insight into how people feel about a particular topic or issue, helping to identify emerging trends and patterns and to improve decision-making.
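As a toy illustration of the idea only, a minimal lexicon-based classifier might look like the sketch below. This is not the analyzer used in this learning path, which runs a model inside the Spark application, and the word lists are invented for the example:

```python
# Toy lexicon-based sentiment classifier: an illustrative sketch only.
# The learning path uses a trained analyzer inside the Spark job;
# these word lists are made up for demonstration.
POSITIVE = {"love", "great", "excellent", "fast"}
NEGATIVE = {"hate", "slow", "broken", "terrible"}

def classify(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love how fast this runs"))    # positive
print(classify("the build is broken again"))    # negative
```

Real systems replace the word lists with a trained model, but the input/output shape is the same: text in, sentiment label out.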

## Real-time sentiment analysis with Arm-based Amazon EKS clusters

Real-time sentiment analysis is a compute-intensive task that can quickly drive up resource usage and increase costs if not managed effectively. Tracking changes in real time enables organizations to understand sentiment patterns and make informed decisions promptly, allowing for timely and appropriate actions.

![sentiment analysis #center](_images/Sentiment-Analysis.png)

The high-level technology stack for the solution is as follows:

- Twitter (X) Developer API to fetch tweets based on certain keywords
- Amazon Kinesis to process the captured data
- A sentiment analyzer model to classify the text and tone of tweets
- Apache Spark streaming API to process the sentiment of tweets
- Elasticsearch and Kibana to store the processed tweets and showcase them on a dashboard
- Prometheus and Grafana to monitor the CPU and RAM resources of the Amazon EKS cluster