
Architecture and deployment

gvdongen edited this page Dec 30, 2020 · 5 revisions

Architecture and deployment on AWS

We deployed this benchmark on AWS in the following setup:

(Figure: architecture of the deployment on AWS, adapted from [1])

The containers can run in any environment that supports Docker. We used DC/OS.

The AWS services we use are:

  • CloudFormation
  • EC2
  • EBS
  • S3
  • ECR
  • CloudWatch

COMING SOON: Deployment scripts for AWS

Local deployment for development

Local deployment can be used for development, but not for generating reliable results: parallelism is limited and the machine's resources quickly become the bottleneck.

To run locally, we need the following components:

  • Zookeeper
  • Kafka
  • Data stream generator
  • Benchmark job

Zookeeper and Kafka

We run Zookeeper and Kafka in Docker containers. They can be started with the kafka-cluster-tools scripts:

cd kafka-cluster-tools
chmod +x setup-kafka.sh
./setup-kafka.sh
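The script returns once the containers are launched, but Kafka may need a few more seconds before it accepts connections. A minimal readiness check you can drop into your own shell session (our own addition, not part of kafka-cluster-tools; requires bash for the `/dev/tcp` redirection):

```shell
#!/usr/bin/env bash
# wait_for_port: block until host:port accepts a TCP connection,
# retrying once per second, up to a retry limit (default 30).
wait_for_port() {
  local host=$1 port=$2 retries=${3:-30}
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    retries=$((retries - 1))
    if [ "$retries" -le 0 ]; then
      return 1   # gave up: the port never came up
    fi
    sleep 1
  done
  return 0       # port is accepting connections
}

# e.g. after ./setup-kafka.sh:
# wait_for_port localhost 9092 && echo "Kafka is up"
```

This avoids racing the data stream generator or the benchmark job against containers that are still starting.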

To read the output topic:

cd kafka-cluster-tools
chmod +x read-from-topic.sh
./read-from-topic.sh

By default this prints the messages from the metrics topic. To read from another topic, pass its name as an argument:

./read-from-topic.sh topic-you-want-to-read
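The default-versus-argument behaviour comes down to shell parameter expansion; a sketch of how such a script typically handles its argument (the real script's internals may differ):

```shell
#!/usr/bin/env bash
# Fall back to the metrics topic when no argument is given;
# otherwise read from the topic named on the command line.
TOPIC="${1:-metrics}"
echo "Reading from topic: ${TOPIC}"
# The actual consumption would then be delegated to Kafka's console
# consumer, e.g. kafka-console-consumer.sh --topic "${TOPIC}" ...
```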

Benchmark job

We run the benchmark job with SBT. The job requires the following environment variables to be set:

export MODE=constant-rate
export DEPLOYMENT_TYPE=local
export KAFKA_BOOTSTRAP_SERVERS=$(hostname -I | head -n1 | awk '{print $1}'):9092
export ZOOKEEPER_SERVER=$(hostname -I | head -n1 | awk '{print $1}'):2181
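Both Kafka variables point at the host's primary IP address on the standard Kafka and Zookeeper ports. The address lookup can be factored out so it runs once; the following equivalent sketch sets everything for the current shell (the `127.0.0.1` fallback is our addition for systems where `hostname -I` is unavailable):

```shell
# Pick the host's first reported IP address; fall back to loopback
# when `hostname -I` is not available (e.g. on macOS).
HOST_IP=$(hostname -I 2>/dev/null | head -n1 | awk '{print $1}')
HOST_IP=${HOST_IP:-127.0.0.1}

export MODE=constant-rate
export DEPLOYMENT_TYPE=local
export KAFKA_BOOTSTRAP_SERVERS="${HOST_IP}:9092"
export ZOOKEEPER_SERVER="${HOST_IP}:2181"

echo "Kafka at ${KAFKA_BOOTSTRAP_SERVERS}, Zookeeper at ${ZOOKEEPER_SERVER}"
```

Run this in the same shell (or `source` it) before starting SBT, so the benchmark job inherits the variables.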

Data stream generator

We run the data stream generator with SBT so it can be started and stopped easily. It can also be packaged in a Docker container, if preferred.

References

[1] van Dongen, G., & Van den Poel, D. (2020). Evaluation of Stream Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1845-1858.
