Running PySpark Streaming with Redpanda

1. Prerequisites

The Docker network and volume must be created as described in this document before anything else will work. First verify that both already exist:

docker volume ls # should list hadoop-distributed-file-system
docker network ls # should list kafka-spark-network 

2. Create Docker Network & Volume

If you have not followed any of the other examples and the ls commands above show no output, create the network and volume now.

# Create Network
docker network create kafka-spark-network

# Create Volume
docker volume create --name=hadoop-distributed-file-system
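With the network in place, a Redpanda broker can be attached to it so the producer, consumer, and Spark containers can reach it by name. The following is only a sketch: the image tag, container name, port mapping, and resource flags are assumptions, not taken from this repository.

```shell
# Hypothetical: run a single-node Redpanda broker on the shared network.
# Container name, tag, and tuning flags below are illustrative assumptions.
docker run -d --name redpanda \
  --network kafka-spark-network \
  -p 9092:9092 \
  docker.redpanda.com/redpandadata/redpanda:latest \
  redpanda start --overprovisioned --smp 1 --memory 1G
```

Joining the container to kafka-spark-network is what lets the other services in these examples resolve the broker without exposing it beyond the mapped port.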

Running Producer and Consumer

# Run producer
python producer.py

# Run consumer with default settings
python consumer.py
# Run consumer for specific topic
python consumer.py --topic <topic-name>
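The bodies of producer.py and consumer.py are not reproduced here. As a rough sketch of the pattern they follow, the snippet below shows a JSON encode/decode pair plus (commented out) the kafka-python calls that would move the bytes through the broker. The field names, topic name "rides", and the localhost:9092 bootstrap address are all assumptions for illustration.

```python
import json

# Hypothetical payload encoder shared by producer and consumer;
# the field names are illustrative, not taken from this repository.
def encode_ride(ride_id, distance_km):
    """Serialize one ride event to the bytes Kafka/Redpanda expects."""
    return json.dumps({"ride_id": ride_id, "distance_km": distance_km}).encode("utf-8")

def decode_ride(raw):
    """Inverse of encode_ride, as a consumer would apply per message."""
    return json.loads(raw.decode("utf-8"))

# Actually sending/receiving requires a broker on kafka-spark-network
# (assumption: kafka-python installed, broker listening on localhost:9092):
#
# from kafka import KafkaProducer, KafkaConsumer
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# producer.send("rides", encode_ride(1, 3.2))
# producer.flush()
#
# consumer = KafkaConsumer("rides", bootstrap_servers="localhost:9092")
# for msg in consumer:
#     print(decode_ride(msg.value))
```

Keeping serialization in one shared helper means the producer and consumer cannot drift apart in message format.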

Running Streaming Script

The spark-submit.sh wrapper script ensures the necessary JARs are installed before running streaming.py:

./spark-submit.sh streaming.py 
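The contents of spark-submit.sh are not shown above. A wrapper of roughly this shape would achieve the stated effect, since `--packages` makes spark-submit resolve and download the Kafka connector JARs before launching the job. The Spark and Scala versions in the package coordinate are assumptions and must match your installation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: --packages pulls the Spark-Kafka connector JARs,
# then submits whichever script was passed as the first argument.
# The 2.12/3.3.1 versions below are illustrative assumptions.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1 \
  "$1"
```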

Additional Resources