---
# Project Overview
`DLT-META` is a metadata-driven framework designed to work with [Delta Live Tables](https://www.databricks.com/product/delta-live-tables). This framework enables the automation of bronze and silver data pipelines by leveraging metadata recorded in an onboarding JSON file. This file, known as the Dataflowspec, serves as the data flow specification, detailing the source and target metadata required for the pipelines.
In practice, a single generic DLT pipeline reads the Dataflowspec and uses it to orchestrate and run the necessary data processing workloads. This approach streamlines the development and management of data pipelines, allowing for a more efficient and scalable data processing workflow.
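For illustration, here is a minimal sketch of the shape a single Dataflowspec entry might take, written as a Python dict for readability; the field names below are assumptions chosen for this sketch, not the framework's authoritative schema, so check the onboarding templates in the repo for the real keys.

```python
# Illustrative sketch only: field names are assumptions, not the
# authoritative DLT-META onboarding schema.
onboarding_entry = {
    "data_flow_id": "100",
    "data_flow_group": "A1",
    "source_format": "cloudFiles",  # file source read via Autoloader
    "source_details": {
        "source_path_dev": "dbfs:/demo/resources/data/customers",  # placeholder path
    },
    "bronze_table": "customers",
    "bronze_reader_options": {
        "cloudFiles.format": "json",
    },
    "silver_table": "customers",
    "silver_transformation_json_dev": "dbfs:/demo/conf/silver_transformations.json",
}
```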
### Components:
The above commands will prompt you to provide onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which will launch the config from the demo folder.
1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental mode.
2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): Ingests 100s of data sources into bronze and silver DLT pipelines automatically.
3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Writes to the same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adds a [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html).
4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Writes to the same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adds a [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html).
5. [Silver Fanout Demo](#silver-fanout-demo): Showcases the implementation of a fanout architecture in the silver layer.
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token
This demo will launch auto-generated tables (100s) inside a single bronze and silver DLT pipeline.
- Copy the displayed token
- Paste it into the command prompt
# Append Flow Autoloader file metadata demo:
This demo will perform the following tasks (a code sketch follows the list):
- Read from different source paths using Autoloader and write to the same target using the `append_flow` API
- Read from different delta tables and write to the same silver table using the `append_flow` API
- Add `file_name` and `file_path` to the target bronze table for the Autoloader source using the [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
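A minimal sketch of the `append_flow` pattern this demo exercises, assuming two placeholder landing paths (`/landing/source_a` and `/landing/source_b` are illustrative); DLT-META generates the equivalent flows from the Dataflowspec, so this hand-written version is for orientation only.

```python
import dlt
from pyspark.sql.functions import col

# One streaming target that several independent flows append into.
dlt.create_streaming_table("bronze_events")

def _read_with_file_metadata(path):
    # Autoloader read; the hidden _metadata column exposes file_name/file_path.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(path)
        .select(
            "*",
            col("_metadata.file_name").alias("file_name"),
            col("_metadata.file_path").alias("file_path"),
        )
    )

@dlt.append_flow(target="bronze_events")
def events_from_source_a():
    return _read_with_file_metadata("/landing/source_a")  # placeholder path

@dlt.append_flow(target="bronze_events")
def events_from_source_b():
    return _read_with_file_metadata("/landing/source_b")  # placeholder path
```

Because each flow is independent, new sources can be added to the same target without rewriting or restarting the existing flows.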
- This demo showcases the onboarding process for the silver fanout pattern.
- Run the onboarding process for the bronze cars table, which contains data from various countries.
- Run the onboarding process for the silver tables, which have a `where_clause` based on the country condition specified in [silver_transformations_cars.json](https://github.com/databrickslabs/dlt-meta/blob/main/demo/conf/silver_transformations_cars.json) (see the sketch after this list).
- Run the Bronze DLT pipeline, which will produce the cars table.
- Run the Silver DLT pipeline, fanning out from the bronze cars table to country-specific tables such as cars_usa, cars_uk, cars_germany, and cars_japan.
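For illustration, a sketch of the shape such a fanout spec might take, written as a Python structure: each target table pairs a select expression with its own `where_clause` filter over the single bronze cars table. The key names here are assumptions for this sketch; treat the linked `silver_transformations_cars.json` in the repo as the authoritative format.

```python
# Illustrative sketch only: key names are assumptions; see
# demo/conf/silver_transformations_cars.json for the real schema.
silver_transformations = [
    {
        "target_table": "cars_usa",
        "select_exp": ["*"],
        "where_clause": ["country = 'USA'"],
    },
    {
        "target_table": "cars_uk",
        "select_exp": ["*"],
        "where_clause": ["country = 'UK'"],
    },
]
```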
6. Run the command ```python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout```
- cloud_provider_name: aws or azure
- dbr_version: Databricks Runtime version
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- You can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.
- 6a. Databricks Workspace URL:
  - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.
- 6b. Token:
  - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
  - On the Access tokens tab, click Generate new token.
  - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).