This project demonstrates how to use DataSQRL to build a streaming pipeline that:
- Reads data from a Kafka topic and writes it to a local Iceberg table
- Uses a Kafka broker running on the host machine, outside the DataSQRL Docker container
Run the following command from the project root where your package.json and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 8081:8081 -v $PWD:/build -v $PWD/data:/data datasqrl/cmd:0.7.0 run -c package.json
```

Note: We removed `-p 9092:9092` because we are now using our own Kafka broker running locally on the host machine.
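Since the pipeline depends on a broker that is not managed by the container, it can help to confirm Kafka is actually listening before starting DataSQRL. The sketch below is a minimal stdlib TCP check; the default `localhost:9092` address is an assumption about your setup and `broker_reachable` is a hypothetical helper, not part of this project.

```python
import socket

def broker_reachable(host="localhost", port=9092, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only verifies the port is open (e.g. a Kafka broker listening on
    the host machine); it does not speak the Kafka protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns `False`, check that your host Kafka broker is running and advertising a listener the container can reach.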
- Go to the `data-generator` folder and run:

```bash
python3 load_data.py <jsonl_file> <kafka_broker_address> <topic_name>
```

- To send Contact data:

```bash
python3 load_data.py contacts.jsonl contact
```

- Updated records should be generated in the `enrichedcontact` topic.
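The loader script presumably walks the JSONL file line by line and publishes each record to the given topic. A minimal sketch of that flow, assuming one JSON object per line; `send` stands in for a hypothetical producer callback (in the real script this would be a Kafka produce call, e.g. via kafka-python), and is not the project's actual API:

```python
import json
from pathlib import Path

def stream_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)

def load_data(jsonl_file, send):
    """Feed every record to `send`, a hypothetical producer callback.

    In the real load_data.py this step would publish each record to the
    configured Kafka topic; here it is abstracted so the sketch stays
    self-contained. Returns the number of records sent.
    """
    count = 0
    for record in stream_jsonl(jsonl_file):
        send(record)
        count += 1
    return count
```

Blank lines are skipped so trailing newlines in the generated files do not raise JSON errors.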