Architecture and deployment
We deployed the code of this benchmark on AWS in the setup shown below.
(Architecture diagram adapted from [1].)
The containers can run on any environment which allows Docker containers to be deployed. We used DC/OS.
Local deployment can be used for development purposes, but not for generating reliable results, since parallelism is limited and resource contention will skew measurements.
To run locally, we need the following components:
- Zookeeper
- Kafka
- Data stream generator
- Benchmark job
We run Zookeeper and Kafka in Docker containers. These can be started using the kafka-cluster-tools scripts:
cd kafka-cluster-tools
chmod +x setup-kafka.sh
./setup-kafka.sh
To read the output topic:
cd kafka-cluster-tools
chmod +x read-from-topic.sh
./read-from-topic.sh
By default this will log the messages on the metrics topic. To read from another topic, provide this as an argument:
./read-from-topic.sh topic-you-want-to-read
We run the benchmark job with SBT. The benchmark job itself requires some environment variables to be set:
MODE=constant-rate
DEPLOYMENT_TYPE=local
KAFKA_BOOTSTRAP_SERVERS=$(hostname -I | head -n1 | awk '{print $1}'):9092
ZOOKEEPER_SERVER=$(hostname -I | head -n1 | awk '{print $1}'):2181
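For a local run, these variables can be exported in the shell before starting SBT. A minimal sketch (the final `sbt run` step assumes you are in the benchmark job's project directory, so it is left commented out):

```shell
# Derive the machine's first IP address (Linux: `hostname -I`).
LOCAL_IP=$(hostname -I 2>/dev/null | head -n1 | awk '{print $1}')

# Environment expected by the benchmark job.
export MODE=constant-rate
export DEPLOYMENT_TYPE=local
export KAFKA_BOOTSTRAP_SERVERS="${LOCAL_IP}:9092"
export ZOOKEEPER_SERVER="${LOCAL_IP}:2181"

# Then, from the benchmark job's project directory:
# sbt run
```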
We run the data stream generator with SBT to be able to start and stop it more easily. It can be packaged in a Docker container as well, if this is preferred.
Copied from deployment/README.md on 9/7/21.
This article will run you through the steps to deploy a benchmark run.
A few things need to be arranged in order to be able to deploy the infrastructure.
You will need an AWS account and have an installation of the AWS CLI.
First of all, you need to have access keys to store and retrieve data from S3. This will be used for inputs and outputs throughout the runs.
You will need to have S3 keys that can access the bucket where you want to store your data.
Afterwards, you need to fill in the files deployment/automation_scripts/AWS_ACCESS_KEY.template and deployment/automation_scripts/AWS_SECRET_KEY.template. After filling them in, you need to remove the .template extension and just keep them as two files named AWS_ACCESS_KEY and AWS_SECRET_KEY.
You also need to create a bucket for the benchmark jars which you will be running.
Once you have done this, put the name of the bucket in a file named automation_scripts/benchmark-jars-bucket.
Then put the URL to the bucket in the file named automation_scripts/benchmark-jars-path, e.g. https://s3.eu-central-1.amazonaws.com/bucketname.
Put the path where the input data of the data stream generator is stored in automation_scripts/benchmark-input-data-path in the format s3a://bucket/rest/of/path.
The data can also be found under data-stream-generator/src/main/resources/data. All the files in this folder need to be available at the path specified in automation_scripts/benchmark-input-data-path.
Put the path where you want to save the metrics that are saved by the output consumer in automation_scripts/benchmark-results-path in the format s3a://bucket/rest/of/path.
Put the path where you want to save the results of the evaluation in automation_scripts/benchmark-results-path in the format s3a://bucket/rest/of/path.
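Put together, the configuration files above could be created as follows. The bucket names and paths below are placeholders; substitute your own:

```shell
D=deployment/automation_scripts
mkdir -p "$D"   # already present in a repo checkout

# Bucket holding the benchmark jars (placeholder name) and its URL.
echo "my-benchmark-jars" > "$D/benchmark-jars-bucket"
echo "https://s3.eu-central-1.amazonaws.com/my-benchmark-jars" > "$D/benchmark-jars-path"

# Input data for the data stream generator (placeholder path).
echo "s3a://my-benchmark-bucket/input-data" > "$D/benchmark-input-data-path"

# Where metrics and evaluation results are saved (placeholder path).
echo "s3a://my-benchmark-bucket/results" > "$D/benchmark-results-path"
```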
You will need SSH keys to access the underlying EC2 instances of the platform.
You can do this via the EC2 service page by clicking on "Key Pairs" in the menu -> create key pair.
It will download automatically; save the private key .pem file in your ~/.ssh folder.
We named it ~/.ssh/id_rsa_benchmark.
You will need to set your AWS profile to run some of the commands. We created a profile called benchmark. On Linux, you do this as follows:
- create a file named ~/.aws/config and add:
  [profile benchmark]
  output = json
  region = eu-west-1
- create a file named ~/.aws/credentials and add:
  [benchmark]
  aws_secret_access_key = put-secret-access-key-here
  aws_access_key_id = put-access-key-id-here
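The two files can also be written from the shell. A sketch, with the same placeholder keys as above (substitute your real keys):

```shell
mkdir -p ~/.aws

# Profile configuration (region as used throughout this guide).
cat >> ~/.aws/config <<'EOF'
[profile benchmark]
output = json
region = eu-west-1
EOF

# Credentials: the values below are placeholders.
cat >> ~/.aws/credentials <<'EOF'
[benchmark]
aws_secret_access_key = put-secret-access-key-here
aws_access_key_id = put-access-key-id-here
EOF
```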
To avoid passing the --profile flag to every command, set the profile as an environment variable:
export AWS_PROFILE=benchmark
To check which profile is currently used:
aws configure list
To spin up the DC/OS cluster, we use AWS CloudFormation.
First of all, make sure you are in the region eu-west-1 (Ireland). Some of the provided scripts depend on this.
Our CloudFormation templates are based on templates provided by DC/OS.
We changed some things such as the instance types, volumes, instance counts, etc.
The reworked templates can be found in the deployment/automation_scripts folder. There are two templates there:
- new_cloudformation_template.json: uses regular on-demand m5n.4xlarge instances.
- new_cloudformation_template_SPOT.json: uses spot m5n.4xlarge instances to reduce cost.
In the template, you can change the number of instances by changing the count in this snippet in the template (almost at the top of the template):
"SlaveInstanceCount": {
"Description": "Required: # of private agents",
"Type": "Number",
"Default": "6"
},
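If you prefer to script this change, a JSON-aware edit is safer than a plain find-and-replace, since "Default" occurs in many parameters. A sketch that operates on a trimmed stand-in for the template (in the real file, SlaveInstanceCount sits under the top-level "Parameters" section; python3 is assumed to be available):

```shell
# Stand-in excerpt of the template, containing only the parameter we edit.
cat > /tmp/template-excerpt.json <<'EOF'
{"Parameters": {"SlaveInstanceCount": {"Description": "Required: # of private agents", "Type": "Number", "Default": "6"}}}
EOF

# Bump the default private-agent count from 6 to 8.
python3 - <<'EOF'
import json
path = "/tmp/template-excerpt.json"
tpl = json.load(open(path))
tpl["Parameters"]["SlaveInstanceCount"]["Default"] = "8"
json.dump(tpl, open(path, "w"), indent=2)
EOF
```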
In the new_cloudformation_template_SPOT.json template, you can change the spot price by searching the term SpotPrice in the template and adapting the price. The SpotPrice is set in two locations in the file.
If you replace other things such as the instance type, always do a find-and-replace across the whole project, since instance-type specifics are used throughout the code.
Then go to CloudFormation in the AWS console and follow these steps:
- Click on launch stack for 1 master for EU central (Frankfurt).
- Choose "Upload a template file" and click next.
- Enter a name for the benchmark stack. We call it streaming-benchmark. Many scripts use this name, so it is best to keep it.
- As KeyName, select the benchmark key pair that was created earlier and click next.
- We left the other settings at their defaults. Then click next and launch.
It will take around 10-15 minutes for the stack to fully launch.
You can find the link to the DC/OS UI by clicking on the streaming-benchmark stack in CloudFormation. Then, in the 'Outputs' tab, click on the DnsAddress.
In this section, we explain how to set up the other benchmark services.
All images required to run OSPBench have been published on DockerHub.
You can find the following images for the benchmark services:
- gisellevd/ospbench-kafka-broker: for the Kafka brokers
- gisellevd/ospbench-cadvisor: for the cAdvisor instances
- gisellevd/ospbench-metrics-exporter: for the metrics exporter
- gisellevd/ospbench-data-stream-generator: for the data stream generator
For the Spark cluster:
For the Flink cluster:
For Kafka Streams jobs:
- gisellevd/ospbench-kafka-benchmark: for the Kafka Streams job images
The images are publicly available. We recommend pulling them once and storing them in your own repository manager, e.g. ECR. You will then need to change the docker image references in the Marathon files in automation_scripts/aws_marathon_files.
When developing services on DC/OS, you may find it helpful to access your cluster from your local machine via VPN. For instance, you can work from your own development environment and immediately test against your DC/OS cluster.
DC/OS documentation about setting up a tunnel with VPN: https://docs.mesosphere.com/1.8/administration/access-node/tunnel/
In short, you need OpenVPN, the DC/OS CLI, a running DC/OS cluster and your SSH access key.
To log in to the cluster with the DC/OS CLI, do the following:
- Install the DC/OS CLI following these instructions.
- Install the tunnel CLI:
dcos package install tunnel-cli --cli
- Run the script:
cd automation_scripts
./connect.sh
Then you run the following script:
cd automation_scripts
./initiate-dcos-tunnel.sh
This prompts an external terminal to continue the login process.
In this section, we explain how to deploy the services (e.g. Kafka, cAdvisor, etc.). All services are automatically spun up by:
cd automation_scripts
./bring-up-side-components.sh -p <NUM_PARTITIONS> -s <AMT_SLAVES> -k <KAFKA_BROKER_COUNT>
Parameters:
-p NUM_PARTITIONS: the number of partitions for the topics on Kafka. This also influences the number of brokers. (REQUIRED)
-s AMT_SLAVES: the number of slaves in the DC/OS cluster. Based on this, it spins up enough ECR login and cAdvisor services. (REQUIRED)
-k KAFKA_BROKER_COUNT: the number of Kafka brokers. (REQUIRED)
-h: Print this help.
Most of the scripts in this folder print a help section when run with the -h flag.
When you run this script, it will open a second terminal in which you need to log in to DC/OS; there it sets up the DC/OS tunnel. If you already had an open tunnel, close that one first.
It gives you 2 minutes to log in and then continues with the script.
When this script finishes, after approximately 15 minutes, all components should be up and running.
You can now login to the UIs via the following links. You will need to have an open DC/OS tunnel to see these UIs.
- Marathon UI: /marathon
- Zookeeper: /exhibitor
- HDFS UI: port 9002 for the UI and port 9001 for the RPC endpoint.
- Kafka manager UI: port 9000
- InfluxDB UI: port 8083 for the UI. You can find the API endpoint under port 8086.
- cAdvisor UI: If you click in the DC/OS UI on nodes and then on one of the nodes that had cAdvisor on them, you can click further and see on which ip address and port they are running. If you go to this address you will find the cAdvisor of that node.
- Spark cluster:
- Flink cluster:
You are now all set up!
Now you can start running the benchmark itself.
All the scripts to do this can be found in the automation_scripts folder under the subdirectories of the frameworks.
In each of these directories, you find one script per workload. Run the script with the flag -h to get the help information.
It is possible that you first need to make the script executable before you can run it. You can do this with:
chmod +x name_of_script.sh
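Instead of making the scripts executable one by one, you can do it in a single pass, e.g. from the repository root:

```shell
# Make every shell script in the checkout executable at once.
find . -name '*.sh' -exec chmod +x {} +
```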
cd automation_scripts/flink
ls
If a Flink cluster is running, you can find the UI under <host-ip>:8089.
cd automation_scripts/spark
ls
If a Spark cluster is running, you can find the Spark master UI under <host-ip-master>:7777.
When a job is running, you can find its UI under <host-ip-spark-submit>:4040.
The code of the Spark cluster itself has been included in spark_cluster_3.0.0 and is based on https://github.com/actionml/docker-spark.
cd automation_scripts/kafka-streams
ls
cd automation_scripts/structured-streaming
ls
If a Spark cluster is running, you can find the Spark master UI under <host-ip-master>:7777.
When a job is running, you can find its UI under <host-ip-spark-submit>:4040.
The code of the Flink cluster itself has been included in flink_cluster_1.11.1 and is based on what is provided by Ververica.
There is a separate folder per taskmanager size, because many settings in flink-conf.yaml differ between them. These images are also available on DockerHub as such.
[1] van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.
This work has been made possible by Klarrio