<!-- Top bar will be removed from PyPi packaged versions -->
<!-- Dont remove: exclude package -->

[Documentation](https://databrickslabs.github.io/dlt-meta/) |
[Release Notes](CHANGELOG.md) |
[Examples](https://github.com/databrickslabs/dlt-meta/tree/main/examples)

<!-- Dont remove: end exclude package -->

---

<p align="left">
  <a href="https://databrickslabs.github.io/dlt-meta/">
    <img src="https://img.shields.io/badge/DOCS-PASSING-green?style=for-the-badge" alt="Documentation Status"/>
  </a>
  <a href="https://pypi.org/project/dlt-meta/">
    <img src="https://img.shields.io/badge/PYPI-v%200.0.7-green?style=for-the-badge" alt="Latest Python Release"/>
  </a>
  <a href="https://github.com/databrickslabs/dlt-meta/actions/workflows/onpush.yml">
    <img src="https://img.shields.io/github/workflow/status/databrickslabs/dlt-meta/build/main?style=for-the-badge"
    <img src="https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge&token=2CxLj3YBam"
      alt="codecov"/>
  </a>
  <a href="https://pypistats.org/packages/dlt-meta">
    <img src="https://img.shields.io/pypi/dm/dlt-meta?style=for-the-badge" alt="downloads"/>
  </a>
  </a>
</p>

---

# Project Overview

`DLT-META` is a metadata-driven framework based on Databricks [Delta Live Tables](https://www.databricks.com/product/delta-live-tables) (DLT) that lets you automate your bronze and silver data pipelines.

With this framework, you record the source and target metadata in an onboarding JSON file, which acts as the data flow specification (Dataflowspec). A single generic `DLT` pipeline reads the `Dataflowspec` and runs your workloads.

### Components:

#### Metadata Interface

- Capture input/output metadata in [onboarding file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/onboarding.json)
- Capture [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json)
- Capture processing logic as SQL in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)

#### Generic DLT pipeline

- Apply appropriate readers based on input metadata
- Apply data quality rules with DLT expectations
- Apply CDC using DLT apply changes if specified in metadata
- Build the DLT graph based on input/output metadata
- Launch the DLT pipeline

## High-Level Process Flow:

## Steps

## Getting Started

Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started) guide.

### The Databricks Labs DLT-META CLI lets you run `onboard` and `deploy` in an interactive Python terminal

#### Prerequisites:

- Python 3.8.0 or later

- Databricks CLI v0.213 or later. See [instructions](https://docs.databricks.com/en/dev-tools/cli/tutorial.html).

- Install Databricks CLI on macOS, for example with Homebrew: `brew tap databricks/tap && brew install databricks`

- Install Databricks CLI on Windows, for example with winget: `winget install Databricks.DatabricksCLI`

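You can confirm the prerequisites above before installing dlt-meta with a small shell check like the one below. This is a sketch and not part of dlt-meta. It makes a few assumptions:

- The function names are hypothetical.
- `sort -V` (version sort) is available.
- `databricks --version` and `python3 --version` print a dotted version number such as `Databricks CLI v0.213.0` or `Python 3.10.12`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of dlt-meta) for checking the README prerequisites.

# version_ge A B: succeed if version A is greater than or equal to version B.
# Version-sorts both values and checks that B comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# extract_version TEXT: print the first dotted version number found in TEXT,
# e.g. "Databricks CLI v0.213.0" -> "0.213.0".
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_prereqs: compare installed tools against the minimums listed above
# (Python 3.8, Databricks CLI 0.213). Assumes both tools are on PATH.
check_prereqs() {
  local cli_version py_version
  cli_version="$(extract_version "$(databricks --version 2>/dev/null)")"
  py_version="$(extract_version "$(python3 --version 2>&1)")"

  if [ -z "$cli_version" ] || ! version_ge "$cli_version" "0.213"; then
    echo "Databricks CLI v0.213 or later is required (found: ${cli_version:-none})" >&2
    return 1
  fi
  if [ -z "$py_version" ] || ! version_ge "$py_version" "3.8"; then
    echo "Python 3.8 or later is required (found: ${py_version:-none})" >&2
    return 1
  fi
  echo "Prerequisites OK: Databricks CLI $cli_version, Python $py_version"
}
```

After sourcing these functions in your shell, run `check_prereqs`. Comparing with `sort -V` avoids the pitfall of plain string comparison, where `3.10` would sort before `3.8`.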
Once you have installed the Databricks CLI, authenticate your current machine to a Databricks workspace:

```commandline
databricks auth login --host WORKSPACE_HOST
```

To enable debug logs, add the `--debug` flag to any command.

### Installing dlt-meta:

- Install dlt-meta via the Databricks CLI:

```commandline
databricks labs install dlt-meta
```

### Onboard using dlt-meta CLI:

To run the existing demo files, clone the repository and set up a Python virtual environment before running the onboard command:

```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

```commandline
cd dlt-meta
```

```commandline
python -m venv .venv
```

```commandline
source .venv/bin/activate
```

```commandline
pip install databricks-sdk
```

```commandline
databricks labs dlt-meta onboard
```

The onboard command prompts you for onboarding details. If you have cloned the dlt-meta repo, you can accept the defaults, which use the configuration from the `demo` folder.

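If you answer the onboarding-file prompt with your own file instead of the demo default, you may want to confirm that the file is well-formed JSON before running `onboard`. The helper below is a hypothetical sketch, not part of dlt-meta. It only checks JSON syntax using Python's standard `json.tool` module, assuming `python3` is on your `PATH`. It does not validate any Dataflowspec fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dlt-meta): syntax-check an onboarding JSON file.

# check_onboarding_json FILE: succeed if FILE exists and parses as JSON.
check_onboarding_json() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "$file: file not found" >&2
    return 1
  fi
  # json.tool exits non-zero on a parse error; its pretty-printed output is discarded.
  if python3 -m json.tool "$file" > /dev/null 2>&1; then
    echo "$file: valid JSON"
  else
    echo "$file: invalid JSON" >&2
    return 1
  fi
}
```

For example, `check_onboarding_json path/to/my_onboarding.json` (a placeholder path) prints a one-line verdict and returns a non-zero status on failure, so it can guard the onboard command in a script.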
- Go to your Databricks workspace and locate the onboarding job under Workflows > Job runs.

### Deploy using dlt-meta CLI:

- Once the onboarding job has finished, deploy the `bronze` and `silver` DLT pipelines using the command below:

```commandline
databricks labs dlt-meta deploy
```

- The command prompts you for DLT details. Provide the schema details you entered during onboarding.
- Bronze DLT:

```
Deploy DLT-META with unity catalog enabled?
  [0] False
  [1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: ravi_dlt_meta_uc
Deploy DLT-META with serverless?
  [0] False
  [1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
  [0] bronze
  [1] silver
Enter a number between 0 and 1: 0
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
Provide bronze dataflowspec table name (default: bronze_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee3eb837f3439899eef61b76b80d53):
Provide dlt target schema name: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb
```

- Silver DLT: run the deploy command again and choose the `silver` layer:

```commandline
databricks labs dlt-meta deploy
```

- The command prompts you for DLT details. Provide the silver schema details you entered during onboarding.

```
Deploy DLT-META with unity catalog enabled?
  [0] False
  [1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: ravi_dlt_meta_uc
Deploy DLT-META with serverless?
  [0] False
  [1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
  [0] bronze
  [1] silver
Enter a number between 0 and 1: 1
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
Provide silver dataflowspec table name (default: silver_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_2147545f9b6b4a8a834f62e873fa1364):
Provide dlt target schema name: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f
```

## More questions

Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/).

# Project Support

Please note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)
are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements
(SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket
relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as issues on the GitHub repo.