# Architecture and deployment
We deployed the code of this benchmark on AWS in the setup shown below.

*(Architecture diagram, adapted from [1])*
The containers can run in any environment that supports Docker. We used DC/OS.
The AWS services we use are:
- CloudFormation
- EC2
- EBS
- S3
- ECR
- CloudWatch
COMING SOON: Deployment scripts for AWS
Local deployment can be used for development, but not for generating reliable results: parallelism is limited and local resource constraints will skew the measurements.
To run locally, we need the following components:
- Zookeeper
- Kafka
- Data stream generator
- Benchmark job
We run Zookeeper and Kafka in Docker containers. These can be started using the Kafka cluster tools:

```shell
cd kafka-cluster-tools
chmod +x setup-kafka.sh
./setup-kafka.sh
```
To read the output topic:

```shell
cd kafka-cluster-tools
chmod +x read-from-topic.sh
./read-from-topic.sh
```

By default this logs the messages on the metrics topic. To read from another topic, pass it as an argument:

```shell
./read-from-topic.sh topic-you-want-to-read
```
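The defaulting behaviour described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual contents of `read-from-topic.sh`:

```shell
# Hypothetical sketch: fall back to the metrics topic when no argument is given
TOPIC="${1:-metrics}"
echo "Reading from topic: ${TOPIC}"
```

Running it without arguments prints `Reading from topic: metrics`; any first argument overrides the default.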
We run the benchmark job with SBT. The benchmark job itself requires some environment variables to be set:

```shell
MODE=constant-rate
DEPLOYMENT_TYPE=local
KAFKA_BOOTSTRAP_SERVERS=$(hostname -I | head -n1 | awk '{print $1}'):9092
ZOOKEEPER_SERVER=$(hostname -I | head -n1 | awk '{print $1}'):2181
```
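When launching from a plain shell rather than an IDE run configuration, the same variables can be exported first; a minimal sketch (the exact way SBT is invoked afterwards is an assumption):

```shell
# Derive the machine's first IP address, as the variable definitions above do
HOST_IP=$(hostname -I | head -n1 | awk '{print $1}')

export MODE=constant-rate
export DEPLOYMENT_TYPE=local
export KAFKA_BOOTSTRAP_SERVERS="${HOST_IP}:9092"
export ZOOKEEPER_SERVER="${HOST_IP}:2181"
```

With these exported, the benchmark job started via SBT inherits them from the shell environment.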
We run the data stream generator with SBT so that it can be started and stopped more easily. It can also be packaged in a Docker container, if preferred.
[1] van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.
This work has been made possible by Klarrio