
# Kafka-to-Kafka with Avro using DataSQRL

This project demonstrates how to use DataSQRL to build a streaming pipeline that reads data from a Kafka topic and writes to an Iceberg table locally.

- This example uses a Kafka broker running on the host machine, outside the DataSQRL Docker container.

## 🐳 Running DataSQRL

Run the following command from the project root, where your `package.json` and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 8081:8081 -v $PWD:/build -v $PWD/data:/data datasqrl/cmd:0.7.0 run -c package.json
```

> [!NOTE]
> We removed `-p 9092:9092` because we are now using our own Kafka broker running locally on the host machine.

## Generate Data

- Go to the `data-generator` folder.
- Usage: `python3 load_data.py <jsonl_file> <kafka_broker_address> <topic_name>`
- To send Contact data:

```bash
python3 load_data.py contacts.jsonl <kafka_broker_address> contact
```
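A producer along the lines of `load_data.py` can be sketched as follows. This is a sketch only, not the repository's actual script: the `kafka-python` dependency and the JSON serialization are assumptions.

```python
import json
import sys


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def main():
    # Usage mirrors the README: load_data.py <jsonl_file> <broker> <topic>
    jsonl_file, broker, topic = sys.argv[1:4]

    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers=broker,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for record in read_jsonl(jsonl_file):
        producer.send(topic, record)
    producer.flush()  # block until all buffered records are delivered


if __name__ == "__main__":
    main()
```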

## Output

- Updated records should be generated in the `enrichedcontact` topic.
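To spot-check the output, a small consumer can be sketched as below. This is an assumption-laden sketch: it uses `kafka-python`, assumes the host broker is reachable at `localhost:9092`, and decodes records as JSON; if the pipeline emits Avro-encoded records, a schema-registry-aware deserializer would be needed instead.

```python
import json


def decode_messages(raw_values):
    """Decode raw message bytes into dicts, skipping anything that isn't JSON."""
    out = []
    for raw in raw_values:
        try:
            out.append(json.loads(raw.decode("utf-8")))
        except (ValueError, UnicodeDecodeError):
            continue  # ignore non-JSON payloads rather than crashing
    return out


def main():
    from kafka import KafkaConsumer  # third-party: pip install kafka-python

    consumer = KafkaConsumer(
        "enrichedcontact",                    # output topic from the README
        bootstrap_servers="localhost:9092",   # assumption: host broker, default port
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,             # stop iterating after 5s of silence
    )
    for record in decode_messages(msg.value for msg in consumer):
        print(record)


if __name__ == "__main__":
    main()
```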