
Commit 0ffe82e

Adding new demos under doc site
1 parent 387abbb commit 0ffe82e

9 files changed: +277 -29 lines changed

docs/content/demo/Append_FLOW_CF.md

Lines changed: 16 additions & 4 deletions
@@ -21,15 +21,26 @@ This demo will perform following tasks:
 databricks auth login --host WORKSPACE_HOST
 ```

-3. ```commandline
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
 ```

-4. ```commandline
+5. Navigate to project directory:
+```commandline
 cd dlt-meta
 ```

-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
 ```commandline
 dlt_meta_home=$(pwd)
 ```
@@ -38,7 +49,8 @@ This demo will perform following tasks:
 export PYTHONPATH=$dlt_meta_home
 ```

-6. ```commandline
+7. Run the command:
+```commandline
 python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=dlt_meta_uc
 ```
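
The requirements above install into whatever Python environment is active; to keep the demo dependencies isolated you can run them inside a virtual environment. A minimal sketch (the `.venv` name is just an example, not part of the demo docs):

```commandline
# optional: create and activate an isolated environment before installing
python -m venv .venv
source .venv/bin/activate
pip install "PyYAML>=6.0" setuptools databricks-sdk
pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
```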

docs/content/demo/Append_FLOW_EH.md

Lines changed: 17 additions & 5 deletions
@@ -18,21 +18,32 @@ draft: false
 databricks auth login --host WORKSPACE_HOST
 ```

-3. ```commandline
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
 ```

-4. ```commandline
+5. Navigate to project directory:
+```commandline
 cd dlt-meta
 ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
 ```commandline
 dlt_meta_home=$(pwd)
 ```
 ```commandline
 export PYTHONPATH=$dlt_meta_home
 ```
-6. Eventhub
+7. Configure Eventhub
 - Needs eventhub instance running
 - Need two eventhub topics first for main feed (eventhub_name) and second for append flow feed (eventhub_name_append_flow)
 - Create databricks secrets scope for eventhub keys
@@ -61,7 +72,8 @@ draft: false
 - eventhub_secrets_scope_name: Databricks secret scope name e.g. eventhubs_dltmeta_creds
 - eventhub_port: Eventhub port

-7. ```commandline
+8. Run the command:
+```commandline
 python demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --uc_catalog_name=dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=dltmeta_eventhub_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
 ```
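
The Eventhub steps assume the secrets scope already holds the access keys; a sketch of creating it with the Databricks CLI, reusing the scope and key names from the launch command above (the access key value is a placeholder):

```commandline
databricks secrets create-scope dltmeta_eventhub_creds
databricks secrets put-secret --json '{
  "scope": "dltmeta_eventhub_creds",
  "key": "RootManageSharedAccessKey",
  "string_value": "<<eventhub access key value>>"
}'
```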

docs/content/demo/Apply_Changes_From_Snapshot.md

Lines changed: 16 additions & 4 deletions
@@ -26,21 +26,33 @@ draft: false
 databricks auth login --host WORKSPACE_HOST
 ```

-3. ```commandline
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
 ```

-4. ```commandline
+5. Navigate to project directory:
+```commandline
 cd dlt-meta
 ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
 ```commandline
 dlt_meta_home=$(pwd)
 ```
 ```commandline
 export PYTHONPATH=$dlt_meta_home

-6. ```commandline
+7. Run the command:
+```commandline
 python demo/launch_acfs_demo.py --uc_catalog_name=<<uc catalog name>>
 ```
 - uc_catalog_name : Unity catalog name
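
For illustration, a filled-in invocation using the `dlt_meta_uc` catalog name that the other demos in this commit use (substitute your own Unity Catalog name):

```commandline
python demo/launch_acfs_demo.py --uc_catalog_name=dlt_meta_uc
```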

docs/content/demo/DAB.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+---
+title: "DAB Demo"
+date: 2024-02-26T14:25:26-04:00
+weight: 28
+draft: false
+---
+
+### DAB Demo
+
+## Overview
+This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta:
+
+This demo will perform the following steps:
+- Create dlt-meta schemas for dataflowspec and bronze/silver layer
+- Upload necessary resources to unity catalog volume
+- Create DAB files with catalog, schema, file locations populated
+- Deploy DAB to databricks workspace
+- Run onboarding using DAB commands
+- Run Bronze/Silver Pipelines using DAB commands
+- Demo examples will showcase fan-out pattern in silver layer
+- Demo example will showcase custom transformations for bronze/silver layers
+- Adding custom columns and metadata to Bronze tables
+- Implementing SCD Type 1 to Silver tables
+- Applying expectations to filter data in Silver tables
+
+### Steps:
+1. Launch Command Prompt
+
+2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
+- Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
+
+```commandline
+databricks auth login --host WORKSPACE_HOST
+```
+
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
+git clone https://github.com/databrickslabs/dlt-meta.git
+```
+
+5. Navigate to project directory:
+```commandline
+cd dlt-meta
+```
+
+6. Set python environment variable into terminal:
+```commandline
+dlt_meta_home=$(pwd)
+export PYTHONPATH=$dlt_meta_home
+```
+
+7. Generate DAB resources and set up schemas:
+This command will:
+- Generate DAB configuration files
+- Create DLT-Meta schemas
+- Upload necessary files to volumes
+```commandline
+python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
+```
+> Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token.
+
+8. Deploy and run the DAB bundle:
+- Navigate to the DAB directory:
+```commandline
+cd demo/dabs
+```
+
+- Validate the bundle configuration:
+```commandline
+databricks bundle validate --profile=<your_profile>
+```
+
+- Deploy the bundle to dev environment:
+```commandline
+databricks bundle deploy --target dev --profile=<your_profile>
+```
+
+- Run the onboarding job:
+```commandline
+databricks bundle run onboard_people -t dev --profile=<your_profile>
+```
+
+- Execute the pipelines:
+```commandline
+databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
+```
+
+![dab_onboarding_job.png](/images/dab_onboarding_job.png)
+![dab_dlt_pipelines.png](/images/dab_dlt_pipelines.png)
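
The `<your_profile>` placeholder is a Databricks CLI authentication profile created by `databricks auth login`; assuming a recent CLI version, existing profiles can be listed with:

```commandline
databricks auth profiles
```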

docs/content/demo/DAIS.md

Lines changed: 16 additions & 4 deletions
@@ -23,23 +23,35 @@ This demo showcases DLT-META's capabilities of creating Bronze and Silver DLT pi
 databricks auth login --host WORKSPACE_HOST
 ```

-3. ```commandline
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
 ```

-4. ```commandline
+5. Navigate to project directory:
+```commandline
 cd dlt-meta
 ```

-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
 ```commandline
 dlt_meta_home=$(pwd)
 ```
 ```commandline
 export PYTHONPATH=$dlt_meta_home
 ```

-6. ```commandline
+7. Run the command:
+```commandline
 python demo/launch_dais_demo.py --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=<<>>
 ```
 - uc_catalog_name : unit catalog name
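
For illustration only, a filled-in invocation borrowing the `aws` provider and `dlt_meta_uc` catalog used by the other demo commands in this commit (substitute your own values):

```commandline
python demo/launch_dais_demo.py --uc_catalog_name=dlt_meta_uc --cloud_provider_name=aws
```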

docs/content/demo/DLT_Sink.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+---
+title: "Lakeflow Declarative Pipelines Sink Demo"
+date: 2024-02-26T14:25:26-04:00
+weight: 27
+draft: false
+---
+
+### Lakeflow Declarative Pipelines Sink Demo
+This demo will perform the following steps:
+- Showcase onboarding process for dlt writing to external sink pattern
+- Run onboarding for the bronze iot events
+- Publish test events to kafka topic
+- Run Bronze Lakeflow Declarative Pipelines which will read from kafka source topic and write to:
+- Events delta table into UC
+- Create quarantine table as per data quality expectations
+- Writes to external kafka topics
+- Writes to external dbfs location as external delta sink
+
+### Steps:
+1. Launch Command Prompt
+
+2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
+- Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
+
+```commandline
+databricks auth login --host WORKSPACE_HOST
+```
+
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
+git clone https://github.com/databrickslabs/dlt-meta.git
+```
+
+5. Navigate to project directory:
+```commandline
+cd dlt-meta
+```
+
+6. Set python environment variable into terminal:
+```commandline
+dlt_meta_home=$(pwd)
+export PYTHONPATH=$dlt_meta_home
+```
+
+7. Configure Kafka (Optional):
+If you are using secrets for kafka, create databricks secrets scope for source and sink kafka:
+```commandline
+databricks secrets create-scope <<n>>
+```
+```commandline
+databricks secrets put-secret --json '{
+"scope": "<<n>>",
+"key": "<<keyname>>",
+"string_value": "<<value>>"
+}'
+```
+
+8. Run the command:
+```commandline
+python demo/launch_dlt_sink_demo.py --uc_catalog_name=<<uc_catalog_name>> --source=kafka --kafka_source_topic=<<kafka source topic name>> --kafka_sink_topic=<<kafka sink topic name>> --kafka_source_servers_secrets_scope_name=<<kafka source servers secret name>> --kafka_source_servers_secrets_scope_key=<<kafka source server secret scope key name>> --kafka_sink_servers_secret_scope_name=<<kafka sink server secret scope key name>> --kafka_sink_servers_secret_scope_key=<<kafka sink servers secret scope key name>> --profile=<<DEFAULT>>
+```
+
+![dlt_demo_sink.png](/images/dlt_demo_sink.png)
+![dlt_delta_sink.png](/images/dlt_delta_sink.png)
+![dlt_kafka_sink.png](/images/dlt_kafka_sink.png)
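
A concrete sketch of the optional Kafka secrets setup above, with hypothetical scope, key, and broker values (substitute your own):

```commandline
databricks secrets create-scope dltmeta_kafka_creds
databricks secrets put-secret --json '{
  "scope": "dltmeta_kafka_creds",
  "key": "kafka_source_servers",
  "string_value": "broker1:9092,broker2:9092"
}'
```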

docs/content/demo/Silver_Fanout.md

Lines changed: 19 additions & 7 deletions
@@ -23,31 +23,43 @@ draft: false
 databricks auth login --host WORKSPACE_HOST
 ```

-3. ```commandline
+3. Install Python package requirements:
+```commandline
+# Core requirements
+pip install "PyYAML>=6.0" setuptools databricks-sdk
+
+# Development requirements
+pip install flake8==6.0 delta-spark==3.0.0 pytest>=7.0.0 coverage>=7.0.0 pyspark==3.5.5
+```
+
+4. Clone dlt-meta:
+```commandline
 git clone https://github.com/databrickslabs/dlt-meta.git
 ```

-4. ```commandline
+5. Navigate to project directory:
+```commandline
 cd dlt-meta
 ```
-5. Set python environment variable into terminal
+6. Set python environment variable into terminal
 ```commandline
 dlt_meta_home=$(pwd)
 ```
 ```commandline
 export PYTHONPATH=$dlt_meta_home

-6. ```commandline
+7. Run the command:
+```commandline
 python demo/launch_silver_fanout_demo.py --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws
 ```
 - uc_catalog_name : aws or azure
 - cloud_provider_name : aws or azure
 - you can provide `--profile=databricks_profile name` in case you already have databricks cli otherwise command prompt will ask host and token.

-- - 6a. Databricks Workspace URL:
-- - Enter your workspace URL, with the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
+a. Databricks Workspace URL:
+Enter your workspace URL, with the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.

-- - 6b. Token:
+b. Token:
 - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop down.

 - On the Access tokens tab, click Generate new token.
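
Since the doc notes that `--profile` skips the host and token prompt, a sketch of that variant with a hypothetical profile name:

```commandline
python demo/launch_silver_fanout_demo.py --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --profile=DEFAULT
```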
