Merged (changes from 1 commit)
@@ -13,15 +13,15 @@
"snowflake" : {
"catalog-name": "MyCatalog",
"external-volume": "MyNewVolume",
"url": "jdbc:snowflake://${SNOWFLAKE_ID}.snowflakecomputing.com/?user=${SNOWFLAKE_USER}&password=${SNOWFLAKE_PASSWORD}&warehouse=COMPUTE_WH&db=MYSNOWFLAKEDB&schema=public&disableSslHostnameVerification=true"
"url": "${SNOWFLAKE_JDBC_URL}"
}
},
"connectors" : {
"iceberg" : {
"warehouse":"s3://my-iceberg-table-test",
"catalog-impl":"org.apache.iceberg.aws.glue.GlueCatalog",
"io-impl":"org.apache.iceberg.aws.s3.S3FileIO",
"catalog-name": "mydatabase",
"catalog-name": "mycatalog",
"catalog-database": "mydatabase"
}
},
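Since the hard-coded JDBC URL above is replaced by the `${SNOWFLAKE_JDBC_URL}` placeholder, the full URL now has to be assembled outside the config. A minimal sketch of building such a URL; the account and credential values are placeholders, and `snowflake_jdbc_url` is a hypothetical helper, not part of DataSQRL:

```python
from urllib.parse import urlencode

def snowflake_jdbc_url(account: str, **params: str) -> str:
    """Assemble a Snowflake JDBC URL of the shape used in the original config."""
    return f"jdbc:snowflake://{account}.snowflakecomputing.com/?{urlencode(params)}"

# Hypothetical values; export the result as SNOWFLAKE_JDBC_URL for the container.
url = snowflake_jdbc_url(
    "myaccount",
    user="myuser",
    warehouse="COMPUTE_WH",
    db="MYSNOWFLAKEDB",
    schema="public",
)
```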
6 changes: 3 additions & 3 deletions getting-started-examples/01_kafka_to_console/README.md
@@ -1,16 +1,16 @@
# Kafka-to-Kafka with Avro using DataSQRL
# Kafka to Console with Avro using DataSQRL

This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a streaming pipeline that:

- Reads data from a kafka topic and prints output to console
- Reads data from a Kafka topic and prints output to console
- Kafka is part of the DataSQRL package.

## 🐳 Running DataSQRL

Run the following command from the project root where your `package.json` and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:0.7.1 run -c package.json
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:latest run -c package.json
```

> **Collaborator comment:** I would say to lock the version; that way we know it needs to be bumped, and CI runs to validate the change.

## Generate Data
6 changes: 3 additions & 3 deletions getting-started-examples/02_kafka_to_kafka/README.md
@@ -1,16 +1,16 @@
# Kafka-to-Kafka with Avro using DataSQRL
# Kafka to Kafka with Avro using DataSQRL

This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a streaming pipeline that:

- Reads data from a kafka topic and writes to another kafka topic
- Reads data from a Kafka topic and writes to another Kafka topic
- Kafka is part of the DataSQRL package.

## 🐳 Running DataSQRL

Run the following command from the project root where your `package.json` and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:0.7.1 run -c package.json
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:latest run -c package.json
```

## Generate Data
@@ -1,8 +1,8 @@
# Kafka-to-Kafka with Avro using DataSQRL
# Kafka Join using DataSQRL

This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a streaming pipeline that:

- Reads data from two kafka topics and combines the data from two streams using temporal join
- Reads data from two Kafka topics and combines the data from two streams using temporal join
- Writes output to another Kafka topic
- Kafka is part of the DataSQRL package.
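A temporal join pairs each event from one stream with the version of the other stream that was current at the event's timestamp. The semantics can be sketched in plain Python; this is an illustration only, not how Flink or DataSQRL implements it:

```python
import bisect

def temporal_join(events, versions):
    """Join (ts, key, payload) events against versioned state.

    `versions` maps key -> list of (ts, value) sorted by ts; each event
    picks up the latest value whose timestamp is <= the event's timestamp.
    """
    joined = []
    for ts, key, payload in events:
        history = versions.get(key, [])
        # Rightmost version whose timestamp is <= the event timestamp.
        i = bisect.bisect_right(history, (ts, float("inf"))) - 1
        value = history[i][1] if i >= 0 else None
        joined.append((ts, key, payload, value))
    return joined
```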

@@ -13,7 +13,7 @@ This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a
Run the following command from the project root where your `package.json` and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:0.7.1 run -c package.json
docker run -it --rm -p 8888:8888 -p 9092:9092 -v $PWD:/build datasqrl/cmd:latest run -c package.json
```

## Generate Data
@@ -9,13 +9,13 @@ This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a
Run the following command from the project root where your `package.json` and SQRL scripts reside:

```bash
docker run -it --rm -p 8888:8888 -p 8081:8081 -v $PWD:/build -v $PWD/data:/data datasqrl/cmd:0.7.1 run -c package.json
docker run -it --rm -p 8888:8888 -p 8081:8081 -v $PWD:/build datasqrl/cmd:latest run -c package.json
```

> [!NOTE]
> Iceberg files will be stored in the `warehouse` directory set by `package.json`

## Output

* There should be iceberg files and folders generated in `$PWD/data/iceberg` directory
* There should be Iceberg files and folders generated in the `$PWD/warehouse` directory
* Data for the output table will reside in `ProcessedData` (as defined in the SQRL script)
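With the Hadoop catalog, the warehouse is just a local directory tree in which each table directory contains a `metadata` subdirectory. A quick way to confirm that output landed; the layout assumed here is the usual Hadoop-catalog convention, not verified against this example:

```python
import os

def find_iceberg_tables(warehouse: str) -> list:
    """Return directories that look like Iceberg tables (they contain `metadata/`)."""
    tables = []
    for root, dirs, _files in os.walk(warehouse):
        if "metadata" in dirs:
            tables.append(root)
    return sorted(tables)
```

After the pipeline has processed data, calling `find_iceberg_tables("warehouse")` should list the `ProcessedData` table directory.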
@@ -14,7 +14,7 @@
},
"connectors": {
"iceberg": {
"warehouse": "/data/iceberg",
"warehouse": "warehouse",
"catalog-type": "hadoop",
"catalog-name": "mycatalog"
}
7 files deleted.
27 changes: 27 additions & 0 deletions getting-started-examples/05_kafka_to_iceberg_local_test/README.md
@@ -0,0 +1,27 @@
# Kafka to Local Iceberg Warehouse Using DataSQRL

This project demonstrates how to use [DataSQRL](https://datasqrl.com) to build a streaming pipeline that:

- Reads data from a Kafka topic
- Writes data to an Iceberg table locally

## 🐳 Running DataSQRL

Run the following command from the project root where your `package.json` and SQRL scripts reside:
```bash
docker run -it --rm -p 8888:8888 -p 8081:8081 -p 9092:9092 -v $PWD:/build datasqrl/cmd:latest run -c package.json
```

## Generate Data

* Go to `data-generator` folder
* `python3 load_data.py <jsonl_file> <topic_name>`
* To send Contact data
```bash
python3 load_data.py contacts.jsonl contacts
```
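The `load_data.py` script itself isn't shown in this diff. The JSONL-reading half of such a loader might look like the sketch below; the Kafka-producing half is omitted, and `read_jsonl` is a hypothetical name, not the script's actual contents:

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Parse one JSON record per non-empty line, as a JSONL loader typically does."""
    records = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records
```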

## Output

* There should be Iceberg files and folders generated in the `$PWD/warehouse` directory
* Data for the output table will reside in `MyContacts` (as defined in the SQRL script)
Expand Up @@ -6,9 +6,9 @@ CREATE TABLE Contact (
WATERMARK FOR last_updated AS last_updated - INTERVAL '1' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'contact',
'properties.bootstrap.servers' = 'host.docker.internal:9092',
'properties.group.id' = 'group1_contacts',
'topic' = 'contacts',
'properties.bootstrap.servers' = '${KAFKA_BOOTSTRAP_SERVERS}',
'properties.group.id' = '${KAFKA_GROUP_ID}',
'scan.startup.mode' = 'earliest-offset',
'format' = 'flexible-json'
);
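The `${KAFKA_BOOTSTRAP_SERVERS}` and `${KAFKA_GROUP_ID}` placeholders are resolved from the environment at deploy time. Conceptually this is plain template substitution, sketched here with Python's stdlib as a simplification, not DataSQRL's actual mechanism:

```python
import os
from string import Template

ddl_fragment = "'properties.bootstrap.servers' = '${KAFKA_BOOTSTRAP_SERVERS}'"

# Assumed value for illustration; in these examples Kafka runs inside the container.
os.environ.setdefault("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
resolved = Template(ddl_fragment).substitute(os.environ)
```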
@@ -0,0 +1,3 @@
IMPORT kafka-source.Contact AS _Contacts;

MyContacts := SELECT id, firstname, lastname, last_updated FROM _Contacts;
@@ -1,6 +1,6 @@
{
"version": "1",
"enabled-engines": ["flink", "iceberg"],
"enabled-engines": ["flink", "kafka", "iceberg"],
"script": {
"main": "kafka-to-iceberg.sqrl"
},
@@ -14,9 +14,12 @@
},
"connectors": {
"iceberg": {
"warehouse": "/data/iceberg",
"warehouse": "warehouse",
"catalog-type": "hadoop",
"catalog-name": "mycatalog"
}
},
"test-runner": {
"create-topics": ["contacts"]
}
}
28 changes: 0 additions & 28 deletions getting-started-examples/06_external_kafka_iceberg_test/README.md

3 files deleted.