Running PySpark Streaming with Redpanda

1. Prerequisites

The Docker network and volume must be created as described in this document before anything else will work. First verify that both already exist:

docker volume ls # should list hadoop-distributed-file-system
docker network ls # should list kafka-spark-network 

2. Create Docker Network & Volume

If you have not followed any of the other examples and the ls commands above show no output, create the network and volume now.

# Create Network
docker network create kafka-spark-network

# Create Volume
docker volume create --name=hadoop-distributed-file-system
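With the network in place, a Redpanda broker can be attached to it so the producer, consumer, and Spark containers can reach it by name. The following is only a sketch: the image tag, container name, port mapping, and resource flags are assumptions, not taken from this repository.

```shell
# Hypothetical: run a single-node Redpanda broker on the shared network.
# Container name, tag, and tuning flags below are illustrative assumptions.
docker run -d --name redpanda \
  --network kafka-spark-network \
  -p 9092:9092 \
  docker.redpanda.com/redpandadata/redpanda:latest \
  redpanda start --overprovisioned --smp 1 --memory 1G
```

Joining the container to kafka-spark-network is what lets the other services in these examples resolve the broker without exposing it beyond the mapped port.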

Running Producer and Consumer

# Run producer
python producer.py

# Run consumer with default settings
python consumer.py
# Run consumer for specific topic
python consumer.py --topic <topic-name>
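The bodies of producer.py and consumer.py are not reproduced here. As a rough sketch of the pattern they follow, the snippet below shows a JSON encode/decode pair plus (commented out) the kafka-python calls that would move the bytes through the broker. The field names, topic name "rides", and the localhost:9092 bootstrap address are all assumptions for illustration.

```python
import json

# Hypothetical payload encoder shared by producer and consumer;
# the field names are illustrative, not taken from this repository.
def encode_ride(ride_id, distance_km):
    """Serialize one ride event to the bytes Kafka/Redpanda expects."""
    return json.dumps({"ride_id": ride_id, "distance_km": distance_km}).encode("utf-8")

def decode_ride(raw):
    """Inverse of encode_ride, as a consumer would apply per message."""
    return json.loads(raw.decode("utf-8"))

# Actually sending/receiving requires a broker on kafka-spark-network
# (assumption: kafka-python installed, broker listening on localhost:9092):
#
# from kafka import KafkaProducer, KafkaConsumer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("rides", encode_ride(1, 3.2))
# producer.flush()
#
# consumer = KafkaConsumer("rides", bootstrap_servers="localhost:9092")
# for msg in consumer:
#     print(decode_ride(msg.value))
```

Keeping serialization in one shared helper means the producer and consumer cannot drift apart in message format.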

Running Streaming Script

The spark-submit.sh wrapper script ensures the necessary JARs are installed before running streaming.py:

./spark-submit.sh streaming.py 
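The contents of spark-submit.sh are not shown above. A wrapper of roughly this shape would achieve the stated effect, since `--packages` makes spark-submit resolve and download the Kafka connector JARs before launching the job. The Spark and Scala versions in the package coordinate are assumptions and must match your installation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: --packages pulls the Spark-Kafka connector JARs,
# then submits whichever script was passed as the first argument.
# The 2.12/3.3.1 versions below are illustrative assumptions.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1 \
  "$1"
```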

Additional Resources