A simple app to test out Spark Streaming from Kafka.
It's assumed that both Docker and docker-compose are already installed on your machine to run this PoC. Java, Python 3, Spark, and kafkacat (optional but recommended) will also be used. Anything else that needs to be installed is usually easiest to get with Homebrew (kafkacat, for example).
Jake Mason: Creating the model code.
wurstmeister: For his Kafka Docker setup in the kafka-docker repo.
Kafka docker image
Run Kafka using docker
Kafka 0.10.0 example producer
Kafkacat git repo
Kafkacat Confluent documentation
Spark streaming + Kafka integration guide
Kafka-python
After cloning this repo, clone the repo below to get the Kafka docker-compose files:
```
cd simple-pyspark-streaming-example
git clone https://github.com/wurstmeister/kafka-docker.git
```

In the file kafka-docker/docker-compose-single-broker.yml, change the KAFKA_ADVERTISED_HOST_NAME environment variable to use localhost (i.e. `KAFKA_ADVERTISED_HOST_NAME: localhost`).
Start a single-node cluster with the broker at localhost:9092.
```
docker-compose -f kafka-docker/docker-compose-single-broker.yml up -d
```

To verify the cluster was created successfully, you can use a program like kafkacat to consume from and produce to a topic.
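Before (or instead of) the kafkacat walkthrough below, you can also do a quick reachability check from Python using kafka-python (linked above). This is just a sketch, assuming kafka-python is installed (`pip install kafka-python`) and the broker address from this walkthrough:

```python
from kafka import KafkaConsumer

# Connect to the broker started by docker-compose above; this only
# succeeds if the broker is reachable at localhost:9092.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

# Print the set of topics currently known to the cluster.
print(consumer.topics())
consumer.close()
```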
In a new terminal, use kafkacat to connect a consumer to the broker with the topic test.

```
kafkacat -b localhost:9092 -C -t test
```

Add `-d broker` for debugging:

```
kafkacat -d broker -b localhost:9092 -C -t test
```

In another new terminal, use kafkacat to connect a producer to the broker with the topic test.
```
kafkacat -b localhost:9092 -P -t test
```

Type a message into the terminal and press enter to see the message consumed by the kafkacat consumer client.
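Once messages are flowing through kafkacat, the same topic can be read from Spark. The snippet below is only a minimal sketch of a Structured Streaming consumer against the broker and topic used above; it is not the repo's model code, and it assumes the matching spark-sql-kafka-0-10 package is supplied to spark-submit via `--packages`.

```python
from pyspark.sql import SparkSession

# Build a Spark session for the streaming read.
spark = (
    SparkSession.builder
    .appName("simple-pyspark-streaming-example")
    .getOrCreate()
)

# Subscribe to the "test" topic on the single-broker cluster started above.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "test")
    .load()
)

# Kafka delivers keys/values as bytes; cast the value to a string and
# echo each micro-batch to the console.
query = (
    stream.selectExpr("CAST(value AS STRING) AS value")
    .writeStream
    .format("console")
    .start()
)

query.awaitTermination()
```

With this running, anything typed into the kafkacat producer terminal should also show up in Spark's console output.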
TODO