Commit 8140023

first review of sentiment analysis Learning Path
1 parent 3e691fb commit 8140023

5 files changed

+117
-60
lines changed

Lines changed: 33 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -1,33 +1,48 @@
11
---
2-
title: Cluster monitoring with Prometheus and Grafana in Amazon EKS
2+
title: Monitor the cluster with Prometheus and Grafana
33
weight: 5
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## CPU and RAM usage statistics with Prometheus and Grafana
9+
## Monitor CPU and RAM usage with Prometheus and Grafana
1010

11-
Prometheus is a monitoring and alerting tool. It is used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. Prometheus collects essential metrics (e.g., CPU, memory usage, pod counts, request latency) that help in monitoring the health and performance of Kubernetes clusters. Grafana is a visualization and analytics tool that integrates with data sources from Prometheus, to create interactive dashboards to monitor and analyze Kubernetes metrics over time.
11+
Prometheus is a monitoring and alerting tool. It is used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. Prometheus collects essential metrics about CPU usage, memory usage, pod counts, and request latency. This helps you monitor the health and performance of your Kubernetes clusters.
1212

13+
Grafana is a visualization and analytics tool that integrates with data sources from Prometheus to create interactive dashboards to monitor and analyze Kubernetes metrics over time.
1314

14-
## Install Prometheus on Arm-based EKS cluster
15+
## Install Prometheus on your EKS cluster
1516

16-
This learning path uses `helm` to install prometheus on the Kubernetes cluster. Follow the [helm documentation](https://helm.sh/docs/intro/install/) to install it on your laptop.
17+
You can use Helm to install Prometheus on the Kubernetes cluster.
1718

18-
Create a namespace in your EKS cluster to host `prometheus` pods
19+
Follow the [Helm documentation](https://helm.sh/docs/intro/install/) to install it on your computer.
20+
21+
Confirm Helm is installed by running the version command:
22+
23+
```console
24+
helm version
25+
```
26+
27+
The output is similar to:
28+
29+
```output
30+
version.BuildInfo{Version:"v3.16.3", GitCommit:"cfd07493f46efc9debd9cc1b02a0961186df7fdf", GitTreeState:"clean", GoVersion:"go1.22.7"}
31+
```
32+
33+
Create a namespace in your EKS cluster to host `prometheus` pods:
1934

2035
```console
2136
kubectl create namespace prometheus
2237
```
2338

24-
Add the following helm repo for prometheus
39+
Add the following Helm repo for Prometheus:
2540

2641
```console
2742
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
2843
```
2944

30-
Install `prometheus` on the cluster with the following command
45+
Install Prometheus on the cluster with the following command:
3146

3247
```console
3348
helm install prometheus prometheus-community/prometheus \
@@ -36,22 +51,21 @@ helm install prometheus prometheus-community/prometheus \
3651
--set server.persistentVolume.storageClass="gp2"
3752
```
3853

39-
Check all pods are up and running
54+
Check all pods are up and running:
4055

4156
```console
4257
kubectl get pods -n prometheus
4358
```
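If you prefer to check pod status programmatically rather than by eye, you can parse the JSON form of the same command. The helper below is an illustrative sketch, not part of the Learning Path; its name and structure are assumptions. Feed it the parsed output of `kubectl get pods -n prometheus -o json`:

```python
def all_pods_running(pods_json):
    """True when every pod in parsed `kubectl get pods -o json` output is Running."""
    items = pods_json.get("items", [])
    return bool(items) and all(
        item.get("status", {}).get("phase") == "Running" for item in items
    )

# Minimal hand-written example in the same shape kubectl produces:
sample = {"items": [{"status": {"phase": "Running"}},
                    {"status": {"phase": "Pending"}}]}
print(all_pods_running(sample))  # False: one pod is still Pending
```

To use it against the real cluster, load the command output with `json.load` and pass the resulting dictionary to the function.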
4459

60+
## Install Grafana on your EKS cluster
4561

46-
## Install Grafana on Arm-based EKS cluster
47-
48-
Add the following helm repo for grafana
62+
Add the following Helm repo for Grafana:
4963

5064
```console
5165
helm repo add grafana https://grafana.github.io/helm-charts
5266
```
5367

54-
Create `grafana.yaml` file with the following contents
68+
Use a text editor to create a `grafana.yaml` file with the following contents:
5569

5670
```yml
5771
datasources:
@@ -65,13 +79,13 @@ datasources:
6579
isDefault: true
6680
```
6781

68-
Create another namespace for `grafana` pods
82+
Create another namespace for Grafana pods:
6983

7084
```console
7185
kubectl create namespace grafana
7286
```
7387

74-
Install `grafana` on the cluster with the following command
88+
Install Grafana on the cluster with the following command:
7589

7690
```console
7791
helm install grafana grafana/grafana \
@@ -82,12 +96,15 @@ helm install grafana grafana/grafana \
8296
--values grafana.yaml \
8397
--set service.type=LoadBalancer
8498
```
99+
85100
Check all pods are up and running:
86101

87102
```console
88103
kubectl get pods -n grafana
89104
```
90105

91-
Login to the grafana dashboard using the LoadBalancer IP and click on `Dashboards` in the left navigation page. Locate a `Kubernetes / Compute Resources / Node` dashboard and click on it. You should see a dashboard like below for your Kubernetes cluster
106+
Log in to the Grafana dashboard using the LoadBalancer IP and select `Dashboards` in the left navigation pane. Locate the `Kubernetes / Compute Resources / Node` dashboard and click it.
107+
108+
You see a dashboard similar to the one below for your Kubernetes cluster:
92109

93110
![grafana #center](_images/grafana.png)

content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md

Lines changed: 11 additions & 6 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Monitoring the sentiments with Elasticsearch and Kibana
2+
title: Monitoring sentiment with Elasticsearch and Kibana
33
weight: 4
44

55
### FIXED, DO NOT MODIFY
@@ -8,11 +8,13 @@ layout: learningpathall
88

99
## Deploy Elasticsearch and Kibana on Arm-based EC2 instance
1010

11-
Elasticsearch is a NoSQL database and search & analytics engine. It's designed to store, search and analyze large amounts of data. It has real-time indexing capability which is crucial for handling high-velocity data streams like tweets. Kibana is a dashboard and visualization tool that integrates seamlessly with Elasticsearch. It provides an interface to interact with twitter data, apply filters and receive alerts. There are multiple ways to install Elasticsearch and Kibana, one of the methods is shown below.
11+
Elasticsearch is a NoSQL database, search, and analytics engine. It's designed to store, search and analyze large amounts of data. It has real-time indexing capability which is crucial for handling high-velocity data streams like Tweets.
1212

13-
Before you begin, ensure that docker and docker compose have been installed on your laptop.
13+
Kibana is a dashboard and visualization tool that integrates seamlessly with Elasticsearch. It provides an interface to interact with Twitter data, apply filters, and receive alerts. There are multiple ways to install Elasticsearch and Kibana; one method is shown below.
1414

15-
Create the following docker-compose.yml file
15+
Before you begin, ensure that Docker and Docker Compose have been installed on your computer.
16+
17+
Use a text editor to create a `docker-compose.yml` file with the contents below:
1618

1719
```yml
1820
version: '3.8'
@@ -47,15 +49,18 @@ networks:
4749
elk:
4850
driver: bridge
4951
```
52+
5053
Use the following command to deploy Elasticsearch and the Kibana dashboard:
5154
55+
```console
5256
docker-compose up
57+
```
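Once the containers are up, Elasticsearch reports its state through the `_cluster/health` endpoint on port 9200. The snippet below is an illustrative sketch (the helper name is an assumption, not part of the Learning Path) showing how that response can be checked:

```python
import json

def cluster_is_healthy(health_body):
    """True when an Elasticsearch `_cluster/health` response reports green or yellow."""
    return json.loads(health_body).get("status") in ("green", "yellow")

# Truncated example of a response from: curl http://<server-ip>:9200/_cluster/health
sample = '{"cluster_name": "docker-cluster", "status": "green"}'
print(cluster_is_healthy(sample))  # True
```

A `yellow` status is common for a single-node deployment like this one, because replica shards cannot be allocated.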
5358

5459
After the dashboard is up, use the public IP of your server on port 5601 to access the Kibana dashboard.
5560

5661
![kibana #center](_images/kibana.png)
5762

58-
Now switch to the stack management using the menu on the left side as shown in below image.
63+
Switch to Stack Management using the menu on the left side, as shown in the image below.
5964

6065
![kibana-data #center](_images/Kibana-data.png)
6166

@@ -71,7 +76,7 @@ One of the sample dashboard structures looks as below, showing the records of di
7176

7277
![kibana-dashboard2 #center](_images/Kibana-dashboard2.png)
7378

74-
Similarly, you can desgin and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this learning path
79+
Similarly, you can design and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this Learning Path.
7580

7681
![kibana-dashboard3 #center](_images/Kibana-dashboard3.png)
7782

content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md

Lines changed: 47 additions & 18 deletions
@@ -8,20 +8,38 @@ layout: learningpathall
88

99
## Before you begin
1010

11-
You will need an [AWS account](https://aws.amazon.com/). Create an account if needed.
11+
You will need an [AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html). Create an account if needed.
1212

13-
Three tools are required on your local machine. Follow the links to install the required tools.
13+
Four tools are required on your local machine. Follow the links to install each tool.
1414

1515
* [Kubectl](/install-guides/kubectl/)
16-
* [AWS CLI](/install-guides/aws-cli)
17-
* [Docker](/install-guides/docker)
18-
* [Terraform](/install-guides/terraform)
16+
* [AWS CLI](/install-guides/aws-cli/)
17+
* [Docker](/install-guides/docker/)
18+
* [Terraform](/install-guides/terraform/)
19+
20+
To use the AWS CLI, you will need to generate AWS access keys and configure the CLI. Follow the [AWS Credentials](/install-guides/aws_access_keys/) install guide for instructions.
1921

2022
## Setup sentiment analysis
2123

22-
Clone this github [repository](https://github.com/koleini/spark-sentiment-analysis) on your local workstation. Navigate to `eks` directory and update the `variables.tf` file with your AWS region.
24+
Take a look at the [GitHub repository](https://github.com/koleini/spark-sentiment-analysis), then clone it on your local computer:
25+
26+
```console
27+
git clone https://github.com/koleini/spark-sentiment-analysis.git
28+
cd spark-sentiment-analysis
29+
```
30+
31+
Edit the file `eks/variables.tf` if you want to change the default AWS region.
32+
33+
The default value is at the top of the file and is set to `us-east-1`.
34+
35+
```output
36+
variable "AWS_region" {
37+
default = "us-east-1"
38+
description = "AWS region"
39+
}
40+
```
2341

24-
Execute the following commands to create the Amazon EKS cluster with pre-configured labels.
42+
Execute the following commands to create the Amazon EKS cluster:
2543

2644
```console
2745
terraform init
@@ -30,8 +48,10 @@ terraform apply --auto-approve
3048

3149
Update the `kubeconfig` file to access the deployed EKS cluster with the following command:
3250

51+
If you want to use an AWS CLI profile not named `default`, change the profile name before running the command.
52+
3353
```console
34-
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile <AWS_PROFILE_NAME>
54+
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile default
3555
```
3656

3757
Create a service account for Apache Spark:
@@ -43,24 +63,26 @@ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount
4363

4464
## Build the sentiment analysis JAR file
4565

46-
Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer
66+
Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer:
4767

4868
```console
4969
cd sentiment_analysis
5070
sbt assembly
5171
```
5272

53-
You should see a JAR file created at the following location
73+
A JAR file is created at the following location:
5474

5575
```console
5676
sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar
5777
```
5878

59-
## Create Spark docker container image
79+
## Create a Spark container image
6080

6181
Create a repository in Amazon ECR to store the Docker images. You can also use Docker Hub.
6282

63-
The Spark repository contains a script to build the Docker image needed for running inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image.
83+
The Spark repository contains a script to build the container image you need to run inside the Kubernetes cluster.
84+
85+
Execute this script on your Arm-based computer to build the arm64 image.
6486

6587
In the current working directory, clone the Apache Spark GitHub repository prior to building the image:
6688

@@ -69,26 +91,29 @@ git clone https://github.com/apache/spark.git
6991
cd spark
7092
git checkout v3.4.3
7193
```
72-
Build the docker container using the following commands:
94+
95+
Build the Docker container using the following commands. Substitute the name of your container repository before running the commands.
7396

7497
```console
7598
cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/
7699
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis build
77100
bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis push
78101
```
102+
79103
## Run Spark computation on the cluster
80104

81105
Execute the `spark-submit` command within the Spark folder to deploy the application. The following commands will run the application with two executors, each with 12 cores, and allocate 24GB of memory for both the executors and driver pods.
82106

83-
Set the following variables before executing the `spark-submit` command
107+
Set the following variables before executing the `spark-submit` command:
84108

85109
```console
86110
export MASTER_ADDRESS=<K8S_MASTER_ADDRESS>
87111
export ES_ADDRESS=<IP_ADDRESS_OF_ELASTICSEARCH>
88112
export CHECKPOINT_BUCKET=<BUCKET_NAME>
89113
export EKS_ADDRESS=<EKS_REGISTRY_ADDRESS>
90114
```
91-
Execute the following command
115+
116+
Execute the `spark-submit` command:
92117

93118
```console
94119
bin/spark-submit \
@@ -122,16 +147,20 @@ spark-twitter 1/1 Running 0 12m
122147

123148
## Twitter sentiment analysis
124149

125-
Create a twitter(X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and create a `bearer token`. Using the following script to fetch the tweets
150+
Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and create a `bearer token`.
151+
152+
Use the following commands to set the token and fetch the Tweets:
126153

127154
```console
128155
export BEARER_TOKEN=<BEARER_TOKEN_FROM_X>
129156
python3 scripts/xapi_tweets.py
130157
```
131158

132-
You can modify the script `xapi_tweets.py` with your own keywords. Update the following section in the script to do so
159+
You can modify the script `xapi_tweets.py` with your own keywords.
133160

134-
```console
161+
Here is the code that includes the keywords:
162+
163+
```output
135164
query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
136165
'tweet.fields': 'lang'}
137166
```
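To make the shape of the request concrete, here is a sketch of how a recent-search call could be assembled around that query. The function name and the endpoint constant are assumptions for illustration; the actual request logic lives in `scripts/xapi_tweets.py`:

```python
import urllib.parse

# X API v2 recent-search endpoint (an assumption here, taken from X's public docs)
SEARCH_URL = "https://api.x.com/2/tweets/search/recent"

def build_search_request(bearer_token, query):
    """Assemble the URL and headers for a recent-search call with the given query."""
    params = {"query": query, "tweet.fields": "lang"}
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": "Bearer " + bearer_token}
    return url, headers

# The same query the Learning Path script uses:
url, headers = build_search_request(
    "<BEARER_TOKEN_FROM_X>",
    "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en",
)
print(url)
```

Changing the `query` argument here corresponds to editing the `query_params` section shown above.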
Lines changed: 15 additions & 12 deletions
@@ -1,27 +1,30 @@
11
---
2-
title: What is Twitter Sentiment Analysis
2+
title: Understand sentiment analysis
33
weight: 2
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## What is Sentiment Analysis
9+
## What is sentiment analysis?
1010

11-
Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. It can help to gauge public opinion, identify trends and patterns, and improve decision-making. Social media platforms, such as Twitter, provide a wealth of information about public opinion, trends, and events. Sentiment analysis is important because it provides insights into how people feel about a particular topic or issue, and can help to identify emerging trends and patterns.
11+
Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. It can help gauge public opinion, identify trends and patterns, and improve decision-making. Social media platforms, such as Twitter (X), provide a wealth of information about public opinion, trends, and events. Sentiment analysis is important because it provides insights into how people feel about a particular topic or issue, and can help to identify emerging trends and patterns.
1212

13+
## Can I perform real-time sentiment analysis using an Arm-based Amazon EKS cluster?
1314

14-
## Real-time sentiment analysis with Arm-based Amazon EKS clusters
15+
Yes, you can use EKS for sentiment analysis.
1516

16-
Real-time sentiment analysis is a compute-intensive task and can quickly drive up resources and increase costs if not managed effectively. Tracking real-time changes enables organizations to understand sentiment patterns and make informed decisions promptly, allowing for timely and appropriate actions.
17+
Real-time sentiment analysis is a compute-intensive task and can quickly drive up resources and increase costs if not managed effectively. Tracking real-time changes enables you to understand sentiment patterns and make informed decisions promptly, allowing for timely and appropriate actions.
18+
19+
The architecture used for the solution is shown below:
1720

1821
![sentiment analysis #center](_images/Sentiment-Analysis.png)
1922

20-
The high-level technology stack for the solutions is as follows:
23+
The technology stack for the solution includes the following steps:
2124

22-
- Twitter(X) Developer API to fetch tweets based on certain keywords
23-
- Captured data is processed using Amazon Kinesis
24-
- Sentiment Analyzer model to classify the text and tone of tweets
25-
- Process the sentiment of tweets using Apache Spark streaming API
26-
- Elasticsearch and Kibana to store the processed tweets and showcase on dashboard
27-
- Prometheus and Grafana to monitor the CPU and RAM resources of the Amazon EKS cluster
25+
- Use the Twitter (X) developer API to fetch Tweets based on certain keywords
26+
- Process the captured data using Amazon Kinesis
27+
- Run a sentiment analysis model to classify the text and tone of the text
28+
- Process the sentiment of Tweets using Apache Spark streaming API
29+
- Use Elasticsearch and Kibana to store the processed Tweets and showcase the activity on a dashboard
30+
- Monitor the CPU and RAM resources of the Amazon EKS cluster with Prometheus and Grafana
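The classification step in the pipeline above can be illustrated with a toy lexicon-based scorer. This is only a sketch of the idea; the Learning Path itself uses a trained model running under Apache Spark, and the word lists below are invented for illustration:

```python
# Toy lexicon-based sentiment scorer (illustrative only; the actual
# solution runs a trained classifier on Apache Spark).
POSITIVE = {"great", "fast", "love", "efficient"}
NEGATIVE = {"slow", "bug", "hate", "broken"}

def classify(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("Love how fast this runs on Arm!"))  # positive
print(classify("Hit a broken build again"))         # negative
```

A real model replaces the fixed word sets with learned weights, but the input and output of the step are the same: raw Tweet text in, a sentiment label out.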

content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md

Lines changed: 11 additions & 8 deletions
@@ -1,19 +1,22 @@
11
---
2-
title: Learn how to perform Twitter(X) Sentiment Analysis on Arm-based EKS clusters
2+
title: Learn how to perform Twitter (X) sentiment analysis on Arm-based EKS clusters
3+
4+
draft: true
5+
cascade:
6+
draft: true
37

48
minutes_to_complete: 60
59

6-
who_is_this_for: This is an advanced topic for software developers who like to build an end-to-end solution ML solution to analyze the sentiments of live tweets with Arm-based Amazon EKS cluster
10+
who_is_this_for: This is an advanced topic for software developers who want to build an end-to-end ML sentiment analysis solution to analyze live Tweets on an Arm-based Amazon EKS cluster.
711

812
learning_objectives:
9-
- Deploy text classification model on Amazon EKS with Apache Spark
10-
- Learn how to deploy Elasticsearch and Kibana dashboard to analyze the tweets
11-
- Deploy Prometheus and Grafana dashboard to keep track of CPU and RAM usage of Kubernetes nodes
13+
- Deploy a text classification model on Amazon EKS with Apache Spark.
14+
- Use Elasticsearch and a Kibana dashboard to analyze the Tweets.
15+
- Deploy Prometheus and Grafana dashboards to keep track of CPU and RAM usage of Kubernetes nodes.
1216

1317
prerequisites:
14-
- An [AWS account](https://aws.amazon.com/). Create an account if needed.
15-
- A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/)installed.
16-
- Docker installed on local computer [Docker](/install-guides/docker)
18+
- An AWS account.
19+
- A computer with Docker, Terraform, the Amazon eksctl CLI, and kubectl installed.
1720

1821
author_primary: Pranay Bakre, Masoud Koleini, Nobel Chowdary Mandepudi, Na Li
1922
