
Commit a15c517

Merge pull request #83 from databrickslabs/82-create-demo-to-showcase-fanout-architecture-in-silver-layer-using-dlt-meta
Added demo to showcase the silver fanout architecture: a cars input dataset containing rows for different countries is used to create 5 silver tables from a single cars table based on filter conditions
2 parents 759623e + e085a0a commit a15c517

20 files changed: +30,470 −24 lines

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 - Added support for Bring your own custom transformation: [Issue](https://github.com/databrickslabs/dlt-meta/issues/68)
 - Added support to Unify PyPI releases with GitHub OIDC: [PR](https://github.com/databrickslabs/dlt-meta/pull/62)
 - Added demo for append_flow and file_metadata options: [PR](https://github.com/databrickslabs/dlt-meta/issues/74)
+- Added demo for silver fanout architecture: [PR](https://github.com/databrickslabs/dlt-meta/pull/83)
 - Added documentation in docs site for new features: [PR](https://github.com/databrickslabs/dlt-meta/pull/64)
 - Added unit tests to showcase silver layer fanout examples: [PR](https://github.com/databrickslabs/dlt-meta/pull/67)
 - Fixed issue for No such file or directory: '/demo': [PR](https://github.com/databrickslabs/dlt-meta/issues/59)

demo/README.md

Lines changed: 55 additions & 1 deletion
@@ -3,6 +3,7 @@
 2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): 100s of data sources ingestion in bronze and silver DLT pipelines automatically.
 3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
 4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
+5. [Silver Fanout Demo](#silver-fanout-demo): This demo showcases the implementation of fanout architecture in the silver layer.
@@ -35,7 +36,7 @@ This Demo launches Bronze and Silver DLT pipelines with following activities:
     export PYTHONPATH=$dlt_meta_home
     ```
-6. Run the command ```python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
+6. Run the command ```python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated```
    - cloud_provider_name : aws or azure or gcp
    - db_version : Databricks Runtime Version
    - dbfs_path : Path on your Databricks workspace where demo will be copied for launching DLT-META Pipelines
@@ -202,3 +203,56 @@ This demo will perform following tasks:
![af_eh_demo.png](docs/static/images/af_eh_demo.png)

# Silver Fanout Demo
- This demo showcases the onboarding process for the silver fanout pattern.
- Run the onboarding process for the bronze cars table, which contains data from various countries.
- Run the onboarding process for the silver tables, which have a `where_clause` based on the country condition specified in [silver_transformations_cars.json](https://github.com/databrickslabs/dlt-meta/blob/main/demo/conf/silver_transformations_cars.json).
- Run the Bronze DLT pipeline, which produces the cars table.
- Run the Silver DLT pipeline, fanning out from the bronze cars table into country-specific tables such as cars_usa, cars_uk, cars_germany, and cars_japan (see the sketch below).
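To make the pattern concrete, here is a minimal, illustrative DLT sketch of a silver fanout: one bronze `cars` source feeding several country-filtered silver tables. This is not the code DLT-META generates; the table and column names are taken from the demo configuration, and the catalog/schema handling is simplified.

```python
import dlt
from pyspark.sql.functions import col

# Illustrative fanout: one bronze source, several filtered silver tables.
# Table names mirror the demo configs; adjust catalog/schema for your workspace.
COUNTRY_TABLES = {
    "cars_usa": "United States",
    "cars_uk": "United Kingdom",
    "cars_germany": "Germany",
    "cars_japan": "Japan",
}

def make_silver_table(table_name: str, country: str):
    @dlt.table(name=table_name, comment=f"Cars records for {country}")
    def _silver():
        # dlt.read assumes the bronze `cars` table is defined in the same pipeline;
        # for a table in another pipeline/schema use spark.read.table(...) instead.
        return dlt.read("cars").where(col("country") == country)
    return _silver

for table_name, country in COUNTRY_TABLES.items():
    make_silver_table(table_name, country)
```

In the demo itself, the equivalent tables are driven purely by the onboarding and silver-transformation JSON files shown later in this commit.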
### Steps:
1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the python environment variable in the terminal
    ```commandline
    dlt_meta_home=$(pwd)
    ```
    ```commandline
    export PYTHONPATH=$dlt_meta_home
    ```

6. Run the command ```python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout```
    - cloud_provider_name : aws or azure
    - db_version : Databricks Runtime Version
    - dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
    - You can provide `--profile=<databricks profile name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for the host and token.

    - 6a. Databricks Workspace URL:
      - Enter your workspace URL, in the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.

    - 6b. Token:
      - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
      - On the Access tokens tab, click Generate new token.
      - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
      - Click Generate.
      - Copy the displayed token.
      - Paste it into the command prompt.

![silver_fanout_workflow.png](docs/static/images/silver_fanout_workflow.png)

![silver_fanout_dlt.png](docs/static/images/silver_fanout_dlt.png)

demo/conf/onboarding_cars.template

Lines changed: 21 additions & 0 deletions
```json
[
    {
        "data_flow_id": "100",
        "data_flow_group": "A1",
        "source_system": "mysql",
        "source_format": "cloudFiles",
        "source_details": {
            "source_path_demo": "{dbfs_path}/demo/resources/data/cars"
        },
        "bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
        "bronze_table": "cars",
        "bronze_reader_options": {
            "cloudFiles.format": "csv",
            "cloudFiles.rescuedDataColumn": "_rescued_data",
            "header": "true"
        },
        "silver_database_demo": "{uc_catalog_name}.{silver_schema}",
        "silver_table": "cars_usa",
        "silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
    }
]
```
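The curly-brace tokens ({dbfs_path}, {uc_catalog_name}, {bronze_schema}, {silver_schema}) are placeholders that the demo launcher fills in before onboarding. Below is a minimal sketch of that substitution, using hypothetical values; the actual launch_silver_fanout_demo.py may handle this differently.

```python
import json
from pathlib import Path

# Hypothetical substitution values; the demo launcher derives these from CLI args.
params = {
    "dbfs_path": "dbfs:/dais-dlt-meta-silver-fanout",
    "uc_catalog_name": "my_uc_catalog",
    "bronze_schema": "dltmeta_bronze",   # assumed schema name
    "silver_schema": "dltmeta_silver",   # assumed schema name
}

# Plain string replacement avoids tripping over the JSON's own braces.
template = Path("demo/conf/onboarding_cars.template").read_text()
for key, value in params.items():
    template = template.replace("{" + key + "}", value)

onboarding = json.loads(template)               # rendered template is plain onboarding JSON
print(onboarding[0]["bronze_database_demo"])    # e.g. my_uc_catalog.dltmeta_bronze
```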
Lines changed: 29 additions & 0 deletions
```json
[
    {
        "data_flow_id": "101",
        "data_flow_group": "A1",
        "bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
        "bronze_table": "cars",
        "silver_database_demo": "{uc_catalog_name}.{silver_schema}",
        "silver_table": "cars_germany",
        "silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
    },
    {
        "data_flow_id": "102",
        "data_flow_group": "A1",
        "bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
        "bronze_table": "cars",
        "silver_database_demo": "{uc_catalog_name}.{silver_schema}",
        "silver_table": "cars_uk",
        "silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
    },
    {
        "data_flow_id": "103",
        "data_flow_group": "A1",
        "bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
        "bronze_table": "cars",
        "silver_database_demo": "{uc_catalog_name}.{silver_schema}",
        "silver_table": "cars_japan",
        "silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
    }
]
```
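All three entries above read from the same bronze cars table; together with cars_usa from onboarding_cars.template, one bronze table fans out into four silver tables. The small sketch below just makes that mapping explicit (the entries are inlined here rather than read from the template files).

```python
from collections import defaultdict

# Minimal copies of the onboarding entries shown above (ids and table names only).
entries = [
    {"data_flow_id": "100", "bronze_table": "cars", "silver_table": "cars_usa"},
    {"data_flow_id": "101", "bronze_table": "cars", "silver_table": "cars_germany"},
    {"data_flow_id": "102", "bronze_table": "cars", "silver_table": "cars_uk"},
    {"data_flow_id": "103", "bronze_table": "cars", "silver_table": "cars_japan"},
]

# Group silver targets by their bronze source to expose the fanout.
fanout = defaultdict(list)
for entry in entries:
    fanout[entry["bronze_table"]].append(entry["silver_table"])

print(dict(fanout))
# {'cars': ['cars_usa', 'cars_germany', 'cars_uk', 'cars_japan']}
```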
Lines changed: 50 additions & 0 deletions
```json
[
    {
        "target_table": "cars_usa",
        "select_exp": [
            "concat(first_name,' ',last_name) as full_name",
            "country",
            "brand",
            "model",
            "color",
            "cc_type"
        ],
        "where_clause": ["country = 'United States'"]
    },
    {
        "target_table": "cars_germany",
        "select_exp": [
            "concat(first_name,' ',last_name) as full_name",
            "country",
            "brand",
            "model",
            "color",
            "cc_type"
        ],
        "where_clause": ["country = 'Germany'"]
    },
    {
        "target_table": "cars_uk",
        "select_exp": [
            "concat(first_name,' ',last_name) as full_name",
            "country",
            "brand",
            "model",
            "color",
            "cc_type"
        ],
        "where_clause": ["country = 'United Kingdom'"]
    },
    {
        "target_table": "cars_japan",
        "select_exp": [
            "concat(first_name,' ',last_name) as full_name",
            "country",
            "brand",
            "model",
            "color",
            "cc_type"
        ],
        "where_clause": ["country = 'Japan'"]
    }
]
```
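Each entry pairs a `select_exp` projection with a `where_clause` filter for one target table, which conceptually maps to a selectExpr followed by a where on the bronze cars table. The PySpark sketch below illustrates that mapping only; it is not DLT-META's implementation, and the bronze table location is an assumption.

```python
from pyspark.sql import DataFrame, SparkSession

def apply_transformation(bronze_df: DataFrame, entry: dict) -> DataFrame:
    """Apply one silver_transformations entry: projection first, then filters."""
    df = bronze_df.selectExpr(*entry["select_exp"])
    for condition in entry.get("where_clause", []):
        df = df.where(condition)  # each clause is a SQL boolean expression
    return df

spark = SparkSession.builder.getOrCreate()
cars = spark.read.table("dltmeta_bronze.cars")  # assumed bronze table location

entry = {
    "target_table": "cars_japan",
    "select_exp": [
        "concat(first_name,' ',last_name) as full_name",
        "country", "brand", "model", "color", "cc_type",
    ],
    "where_clause": ["country = 'Japan'"],
}

cars_japan = apply_transformation(cars, entry)
```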

demo/dbc/afam_eventhub_runners.dbc

196 Bytes
Binary file not shown.

demo/dbc/silver_fout_runners.dbc

1.49 KB
Binary file not shown.

demo/launch_af_cloudfiles_demo.py

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
     "--profile": "provide databricks cli profile name, if not provide databricks_host and token",
     "--uc_catalog_name": "provide databricks uc_catalog name, this is required to create volume, schema, table",
     "--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
-    "--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
+    "--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
     "--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
         e.g --dbfs_path=dbfs:/tmp/DLT-META/"
 }

demo/launch_af_eventhub_demo.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
     "--profile": "provide databricks cli profile name, if not provide databricks_host and token",
     "--uc_catalog_name": "provide databricks uc_catalog name, this is required to create volume, schema, table",
     "--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
-    "--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
+    "--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
     "--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
         e.g --dbfs_path=dbfs:/tmp/DLT-META/",
     "--eventhub_name": "Provide eventhub_name e.g --eventhub_name=iot",

demo/launch_dais_demo.py

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ def create_daisdemo_workflow(self, runner_conf: DLTMetaRunnerConf):
     "--uc_catalog_name": "provide databricks uc_catalog name, \
         this is required to create volume, schema, table",
     "--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
-    "--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
+    "--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
     "--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
         e.g --dbfs_path=dbfs:/tmp/DLT-META/"}
