|
| 1 | +--- |
| 2 | +title: Cluster monitoring with Prometheus and Grafana in Amazon EKS |
| 3 | +weight: 3 |
| 4 | + |
| 5 | +### FIXED, DO NOT MODIFY |
| 6 | +layout: learningpathall |
| 7 | +--- |
| 8 | + |
| 9 | +## Before you begin |
| 10 | + |
| 11 | +You will need an [AWS account](https://aws.amazon.com/). Create an account if needed. |
| 12 | + |
| 13 | +Four tools are required on your local machine. Follow the links to install them. |
| 14 | + |
| 15 | +* [Kubectl](/install-guides/kubectl/) |
| 16 | +* [AWS CLI](/install-guides/aws-cli) |
| 17 | +* [Docker](/install-guides/docker) |
| 18 | +* [Terraform](/install-guides/terraform) |
| 19 | + |
| 20 | +## Set up sentiment analysis |
| 21 | + |
| 22 | +Clone this GitHub [repository](https://github.com/koleini/spark-sentiment-analysis) to your local workstation. Navigate to the `eks` directory and update the `variables.tf` file with your AWS region. |
| 23 | + |
| 24 | +Execute the following commands to create the Amazon EKS cluster with pre-configured labels. |
| 25 | + |
| 26 | +```console |
| 27 | +terraform init |
| 28 | +terraform apply --auto-approve |
| 29 | +``` |
| 30 | + |
| 31 | +Update the `kubeconfig` file to access the deployed EKS cluster with the following command: |
| 32 | + |
| 33 | +```console |
| 34 | +aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile <AWS_PROFILE_NAME> |
| 35 | +``` |
| 36 | + |
| 37 | +Create a service account for Apache Spark: |
| 38 | + |
| 39 | +```console |
| 40 | +kubectl create serviceaccount spark |
| 41 | +kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default |
| 42 | +``` |
| 43 | + |
| 44 | +## Build the sentiment analysis JAR file |
| 45 | + |
| 46 | +Navigate to the `sentiment_analysis` folder and build a JAR file for the sentiment analyzer: |
| 47 | + |
| 48 | +```console |
| 49 | +cd sentiment_analysis |
| 50 | +sbt assembly |
| 51 | +``` |
| 52 | + |
| 53 | +You should see a JAR file created at the following location: |
| 54 | + |
| 55 | +```console |
| 56 | +sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar |
| 57 | +``` |
| 58 | + |
| 59 | +## Create the Spark Docker container image |
| 60 | + |
| 61 | +Create a repository in Amazon ECR to store the Docker images. You can also use Docker Hub. |
| 62 | + |
| 63 | +The Spark repository contains a script that builds the Docker image needed to run Spark inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image. |
| 64 | + |
| 65 | +In the current working directory, clone the Apache Spark GitHub repository before building the image: |
| 66 | + |
| 67 | +```console |
| 68 | +git clone https://github.com/apache/spark.git |
| 69 | +cd spark |
| 70 | +git checkout v3.4.3 |
| 71 | +``` |
| 72 | +Build and push the Docker container image using the following commands: |
| 73 | + |
| 74 | +```console |
| 75 | +cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/ |
| 76 | +bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis build |
| 77 | +bin/docker-image-tool.sh -r <your-docker-repository> -t sentiment-analysis push |
| 78 | +``` |
| 79 | +## Run Spark computation on the cluster |
| 80 | + |
| 81 | +Execute the `spark-submit` command from within the Spark folder to deploy the application. The command below runs the application with two executors, each with 12 cores, and allocates 24 GB of memory to each executor pod and to the driver pod. |
| 82 | + |
| 83 | +Set the following variables before executing the `spark-submit` command: |
| 84 | + |
| 85 | +```console |
| 86 | +export MASTER_ADDRESS=<K8S_MASTER_ADDRESS> |
| 87 | +export ES_ADDRESS=<IP_ADDRESS_OF_ELASTICSEARCH> |
| 88 | +export CHECKPOINT_BUCKET=<BUCKET_NAME> |
| 89 | +export EKS_ADDRESS=<EKS_REGISTRY_ADDRESS> |
| 90 | +``` |
| 91 | +Run the application with the following command: |
| 92 | + |
| 93 | +```console |
| 94 | +bin/spark-submit \ |
| 95 | + --class bigdata.SentimentAnalysis \ |
| 96 | + --master k8s://$MASTER_ADDRESS:443 \ |
| 97 | + --deploy-mode cluster \ |
| 98 | + --conf spark.executor.instances=2 \ |
| 99 | + --conf spark.kubernetes.container.image=$EKS_ADDRESS/spark:sentiment-analysis \ |
| 100 | + --conf spark.kubernetes.driver.pod.name="spark-twitter" \ |
| 101 | + --conf spark.kubernetes.namespace=default \ |
| 102 | + --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ |
| 103 | + --conf spark.driver.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \ |
| 104 | + --conf spark.executor.extraJavaOptions="-DES_NODES=$ES_ADDRESS -DCHECKPOINT_LOCATION=s3a://$CHECKPOINT_BUCKET/checkpoints/" \ |
| 105 | + --conf spark.executor.cores=12 \ |
| 106 | + --conf spark.driver.cores=12 \ |
| 107 | + --conf spark.driver.memory=24g \ |
| 108 | + --conf spark.executor.memory=24g \ |
| 109 | + --conf spark.memory.fraction=0.8 \ |
| 110 | + --name sparkTwitter \ |
| 111 | + local:///opt/spark/jars/bigdata-assembly-0.1.jar |
| 112 | +``` |
| 113 | + |
| 114 | +Use `kubectl get pods` to check the status of the pods in the cluster. |
| 115 | + |
| 116 | +```output |
| 117 | +NAME READY STATUS RESTARTS AGE |
| 118 | +sentimentanalysis-346f22932b484903-exec-1 1/1 Running 0 10m |
| 119 | +sentimentanalysis-346f22932b484903-exec-2 1/1 Running 0 10m |
| 120 | +spark-twitter 1/1 Running 0 12m |
| 121 | +``` |
| 122 | + |
| 123 | +## Twitter sentiment analysis |
| 124 | + |
| 125 | +Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and generate a bearer token. Use the following script to fetch the tweets: |
| 126 | + |
| 127 | +```console |
| 128 | +export BEARER_TOKEN=<BEARER_TOKEN_FROM_X> |
| 129 | +python3 scripts/xapi_tweets.py |
| 130 | +``` |
| 131 | + |
| 132 | +To search for your own keywords, update the following section of the `xapi_tweets.py` script: |
| 133 | + |
| 134 | +```python |
| 135 | +query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en", |
| 136 | + 'tweet.fields': 'lang'} |
| 137 | +``` |
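
To illustrate how the `query_params` section and the `BEARER_TOKEN` variable fit together, here is a minimal sketch that builds the query string from a keyword list and prepares an authenticated request without sending it. It is not the code from `xapi_tweets.py`: the helper names `build_query` and `build_request` are hypothetical, and the endpoint is assumed to be the X API v2 recent-search URL, which may differ from the one the repository script uses.

```python
import os
import urllib.parse
import urllib.request

# Assumption: the X API v2 recent-search endpoint. The repository script
# may target a different endpoint.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_query(keywords, lang="en", include_retweets=False):
    """Join keywords with OR, then append the retweet and language filters."""
    query = "(" + " OR ".join(keywords) + ")"
    if not include_retweets:
        query += " -is:retweet"
    return f"{query} lang:{lang}"


def build_request(bearer_token, keywords):
    """Prepare (but do not send) an authenticated search request."""
    query_params = {"query": build_query(keywords), "tweet.fields": "lang"}
    url = SEARCH_URL + "?" + urllib.parse.urlencode(query_params)
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Falls back to the placeholder when BEARER_TOKEN is not exported.
    token = os.environ.get("BEARER_TOKEN", "<BEARER_TOKEN_FROM_X>")
    request = build_request(token, ["#onArm", "@Arm", "#Arm", "#GenAI"])
    print(request.full_url)
```

Keeping the keywords in a list makes it easier to change the search terms without hand-editing the parentheses and `OR` operators in the query string.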