Commit 423a776

Merge pull request #45 from databrickslabs/feature/issue_41

Feature/issue 41

2 parents 077c71d + 7b2d6b1 · commit 423a776

20 files changed: +135 −194 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -115,6 +115,7 @@ celerybeat.pid
 # Environments
 .env
 .venv
+.venvclear/
 env/
 venv/
 ENV/
```

CHANGELOG.md

Lines changed: 5 additions & 0 deletions

```diff
@@ -1,5 +1,10 @@
 # Changelog
 
+## [v.0.0.7]
+- Added dlt-meta cli documentation and readme with browser support: [PR](https://github.com/databrickslabs/dlt-meta/pull/45)
+
+## [v.0.0.6]
+- migrate to create streaming table api from create streaming live table: [PR](https://github.com/databrickslabs/dlt-meta/pull/39)
 
 ## [v.0.0.5]
 - Enabled Unity Catalog support: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)
```

README.md

Lines changed: 98 additions & 103 deletions

````diff
@@ -2,18 +2,21 @@
 
 <!-- Top bar will be removed from PyPi packaged versions -->
 <!-- Dont remove: exclude package -->
+
 [Documentation](https://databrickslabs.github.io/dlt-meta/) |
 [Release Notes](CHANGELOG.md) |
-[Examples](https://github.com/databrickslabs/dlt-meta/tree/main/examples)
+[Examples](https://github.com/databrickslabs/dlt-meta/tree/main/examples)
+
 <!-- Dont remove: end exclude package -->
 
 ---
+
 <p align="left">
   <a href="https://databrickslabs.github.io/dlt-meta/">
     <img src="https://img.shields.io/badge/DOCS-PASSING-green?style=for-the-badge" alt="Documentation Status"/>
   </a>
   <a href="https://pypi.org/project/dlt-meta/">
-    <img src="https://img.shields.io/badge/PYPI-v%200.0.1-green?style=for-the-badge" alt="Latest Python Release"/>
+    <img src="https://img.shields.io/badge/PYPI-v%200.0.7-green?style=for-the-badge" alt="Latest Python Release"/>
   </a>
   <a href="https://github.com/databrickslabs/dlt-meta/actions/workflows/onpush.yml">
     <img src="https://img.shields.io/github/workflow/status/databrickslabs/dlt-meta/build/main?style=for-the-badge"
@@ -23,13 +26,6 @@
     <img src="https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge&amp;token=2CxLj3YBam"
          alt="codecov"/>
   </a>
-  <a href="https://lgtm.com/projects/g/databrickslabs/dlt-meta/alerts">
-    <img src="https://img.shields.io/lgtm/alerts/github/databricks/dlt-meta?style=for-the-badge" alt="lgtm-alerts"/>
-  </a>
-  <a href="https://lgtm.com/projects/g/databrickslabs/dlt-meta/context:python">
-    <img src="https://img.shields.io/lgtm/grade/python/github/databrickslabs/dbx?style=for-the-badge"
-         alt="lgtm-code-quality"/>
-  </a>
   <a href="https://pypistats.org/packages/dl-meta">
     <img src="https://img.shields.io/pypi/dm/dlt-meta?style=for-the-badge" alt="downloads"/>
   </a>
@@ -39,142 +35,141 @@
   </a>
 </p>
 
-[![lines of code](https://tokei.rs/b1/github/databrickslabs/dlt-meta)]([https://codecov.io/github/databrickslabs/dlt-meta](https://github.com/databrickslabs/dlt-meta))
+[![lines of code](https://tokei.rs/b1/github/databrickslabs/dlt-meta)](<[https://codecov.io/github/databrickslabs/dlt-meta](https://github.com/databrickslabs/dlt-meta)>)
 
 ---
 
 # Project Overview
-```DLT-META``` is a metadata-driven framework based on Databricks [Delta Live Tables](https://www.databricks.com/product/delta-live-tables) (aka DLT) which lets you automate your bronze and silver data pipelines.
 
-With this framework you need to record the source and target metadata in an onboarding json file which acts as the data flow specification aka Dataflowspec. A single generic ```DLT``` pipeline takes the ```Dataflowspec``` and runs your workloads.
+`DLT-META` is a metadata-driven framework based on Databricks [Delta Live Tables](https://www.databricks.com/product/delta-live-tables) (aka DLT) which lets you automate your bronze and silver data pipelines.
+
+With this framework you need to record the source and target metadata in an onboarding json file which acts as the data flow specification aka Dataflowspec. A single generic `DLT` pipeline takes the `Dataflowspec` and runs your workloads.
 
 ### Components:
 
-#### Metadata Interface
+#### Metadata Interface
+
 - Capture input/output metadata in [onboarding file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/onboarding.json)
 - Capture [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json)
-- Capture processing logic as sql in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)
+- Capture processing logic as sql in [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)
 
 #### Generic DLT pipeline
+
 - Apply appropriate readers based on input metadata
-- Apply data quality rules with DLT expectations
+- Apply data quality rules with DLT expectations
 - Apply CDC apply changes if specified in metadata
 - Builds DLT graph based on input/output metadata
 - Launch DLT pipeline
 
 ## High-Level Process Flow:
+
 ![DLT-META High-Level Process Flow](./docs/static/images/solutions_overview.png)
 
 ## Steps
+
 ![DLT-META Stages](./docs/static/images/dlt-meta_stages.png)
 
 ## Getting Started
+
 Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
+
 ### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal
+
 #### pre-requisites:
-- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
+
 - Python 3.8.0 +
-#### Steps:
-- ``` git clone dlt-meta ```
-- ``` cd dlt-meta ```
-- ``` python -m venv .venv ```
-- ```source .venv/bin/activate ```
-- ``` pip install databricks-sdk ```
-- ```databricks labs dlt-meta onboard```
-- - Above command will prompt you to provide onboarding details. If you have cloned dlt-meta git repo then accept defaults which will launch config from demo folder.
-
-``` Provide onboarding file path (default: demo/conf/onboarding.template):
-Provide onboarding files local directory (default: demo/):
-Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
-Provide databricks runtime version (default: 14.2.x-scala2.12):
-Run onboarding with unity catalog enabled?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide unity catalog name: ravi_dlt_meta_uc
-Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead):
-Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb):
-Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f):
-Provide dlt meta layer
-[0] bronze
-[1] bronze_silver
-[2] silver
-Enter a number between 0 and 2: 1
-Provide bronze dataflow spec table name (default: bronze_dataflowspec):
-Provide silver dataflow spec table name (default: silver_dataflowspec):
-Overwrite dataflow spec?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide dataflow spec version (default: v1):
-Provide environment name (default: prod): prod
-Provide import author name (default: ravi.gawai):
-Provide cloud provider name
-[0] aws
-[1] azure
-[2] gcp
-Enter a number between 0 and 2: 0
-Do you want to update ws paths, catalog, schema details to your onboarding file?
-[0] False
-[1] True
+
+- Databricks CLI v0.213 or later. See [instructions](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
+
+- Install Databricks CLI on macOS:
+- ![macos_install_databricks](docs/static/images/macos_1_databrickslabsmac_installdatabricks.gif)
+
+- Install Databricks CLI on Windows:
+- ![windows_install_databricks.png](docs/static/images/windows_install_databricks.png)
+
+Once you install Databricks CLI, authenticate your current machine to a Databricks Workspace:
+
+```commandline
+databricks auth login --host WORKSPACE_HOST
+```
+
+To enable debug logs, simply add `--debug` flag to any command.
+
+### Installing dlt-meta:
+
+- Install dlt-meta via Databricks CLI:
+
+```commandline
+databricks labs install dlt-meta
+```
+
+### Onboard using dlt-meta CLI:
+
+If you want to run existing demo files please follow these steps before running onboard command:
+
+```commandline
+git clone https://github.com/databrickslabs/dlt-meta.git
 ```
+
+```commandline
+cd dlt-meta
+```
+
+```commandline
+python -m venv .venv
+```
+
+```commandline
+source .venv/bin/activate
+```
+
+```commandline
+pip install databricks-sdk
+```
+
+```commandline
+databricks labs dlt-meta onboard
+```
+
+![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
+
+Above commands will prompt you to provide onboarding details. If you have cloned dlt-meta git repo then accept defaults which will launch config from demo folder.
+![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)
+
+
 - Goto your databricks workspace and located onboarding job under: Workflow->Jobs runs
+
+### depoly using dlt-meta CLI:
+
 - Once onboarding jobs is finished deploy `bronze` and `silver` DLT using below command
-- ```databricks labs dlt-meta deploy```
+- ```commandline
+databricks labs dlt-meta deploy
+```
 - - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps
 - - Bronze DLT
-```
-Deploy DLT-META with unity catalog enabled?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide unity catalog name: ravi_dlt_meta_uc
-Deploy DLT-META with serverless?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide dlt meta layer
-[0] bronze
-[1] silver
-Enter a number between 0 and 1: 0
-Provide dlt meta onboard group: A1
-Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
-Provide bronze dataflowspec table name (default: bronze_dataflowspec):
-Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee3eb837f3439899eef61b76b80d53):
-Provide dlt target schema name: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb
-```
+
+![deployingDLTMeta_bronze.gif](docs/static/images/deployingDLTMeta_bronze.gif)
+
 
 - Silver DLT
-- - ```databricks labs dlt-meta deploy```
+- - ```commandline
+databricks labs dlt-meta deploy
+```
 - - Above command will prompt you to provide dlt details. Please provide respective details for schema which you provided in above steps
-```
-Deploy DLT-META with unity catalog enabled?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide unity catalog name: ravi_dlt_meta_uc
-Deploy DLT-META with serverless?
-[0] False
-[1] True
-Enter a number between 0 and 1: 1
-Provide dlt meta layer
-[0] bronze
-[1] silver
-Enter a number between 0 and 1: 1
-Provide dlt meta onboard group: A1
-Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
-Provide silver dataflowspec table name (default: silver_dataflowspec):
-Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_2147545f9b6b4a8a834f62e873fa1364):
-Provide dlt target schema name: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f
-```
+
+![deployingDLTMeta_silver.gif](docs/static/images/deployingDLTMeta_silver.gif)
+
+
 ## More questions
+
 Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
 and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)
 
 # Project Support
+
 Please note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)
-are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements
-(SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket
+are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements
+(SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket
 relating to any issues arising from the use of these projects.
 
 Any issues discovered through the use of this project should be filed as issues on the Github Repo.
````
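The rewritten Project Overview and onboarding flow above revolve around the onboarding JSON (the `Dataflowspec`). For orientation, here is a minimal Python sketch that inspects the demo template the CLI defaults point at (`demo/conf/onboarding.template`); the `data_flow_id` and `data_flow_group` field names are illustrative assumptions, and the repo's [examples/onboarding.json](https://github.com/databrickslabs/dlt-meta/blob/main/examples/onboarding.json) is the authoritative schema:

```python
import json

# Hedged sketch: peek at the onboarding spec before running
# `databricks labs dlt-meta onboard`. Assumes the template parses as a
# JSON array of dataflow entries; the field names below are illustrative.
with open("demo/conf/onboarding.template") as f:
    flows = json.load(f)

for flow in flows:
    # Each entry is one dataflow spec consumed by the generic DLT pipeline.
    print(flow.get("data_flow_id"), "->", flow.get("data_flow_group"))
```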

demo/README.md

Lines changed: 4 additions & 3 deletions

````diff
@@ -3,7 +3,8 @@
 2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): 100s of data sources ingestion in bronze and silver DLT pipelines automatically.
 
 
-# DAIS 2023 DEMO
+# DAIS 2023 DEMO
+## [DAIS 2023 Session Recording](https://www.youtube.com/watch?v=WYv5haxLlfA)
 This Demo launches Bronze and Silver DLT pipleines with following activities:
 - Customer and Transactions feeds for initial load
 - Adds new feeds Product and Stores to existing Bronze and Silver DLT pipelines with metadata changes.
@@ -23,7 +24,7 @@ This Demo launches Bronze and Silver DLT pipleines with following activities:
 export PYTHONPATH=<<local dlt-meta path>>
 ```
 
-6. Run the command ```python demo/launch_dais_demo.py --username=<<your databricks username>> --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
+6. Run the command ```python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
 - cloud_provider_name : aws or azure or gcp
 - db_version : Databricks Runtime Version
 - dbfs_path : Path on your Databricks workspace where demo will be copied for launching DLT-META Pipelines
@@ -61,7 +62,7 @@ This demo will launch auto generated tables(100s) inside single bronze and silve
 export PYTHONPATH=<<local dlt-meta path>>
 ```
 
-6. Run the command ```python demo/launch_techsummit_demo.py --[email protected] --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated ```
+6. Run the command ```python demo/launch_techsummit_demo.py --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated ```
 - cloud_provider_name : aws or azure or gcp
 - db_version : Databricks Runtime Version
 - dbfs_path : Path on your Databricks workspace where demo will be copied for launching DLT-META Pipelines
````

demo/launch_dais_demo.py

Lines changed: 5 additions & 2 deletions

```diff
@@ -1,4 +1,5 @@
 import uuid
+import webbrowser
 from databricks.sdk.service import jobs
 from src.install import WorkspaceInstaller
 from integration_tests.run_integration_tests import (
@@ -84,8 +85,10 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
         runner_conf.job_id = created_job.job_id
         print(f"Job created successfully. job_id={created_job.job_id}, started run...")
         print(f"Waiting for job to complete. run_id={created_job.job_id}")
-        run_by_id = self.ws.jobs.run_now(job_id=created_job.job_id).result()
-        print(f"Job run finished. run_id={run_by_id}")
+        run_by_id = self.ws.jobs.run_now(job_id=created_job.job_id)
+        url = f"{self.ws.config.host}/jobs/{runner_conf.job_id}/runs/{run_by_id}?o={self.ws.get_workspace_id()}/"
+        webbrowser.open(url)
+        print(f"Job launched with url={url}")
 
     def create_daisdemo_workflow(self, runner_conf: DLTMetaRunnerConf):
         """
```

demo/launch_techsummit_demo.py

Lines changed: 5 additions & 2 deletions

```diff
@@ -23,6 +23,7 @@
 """
 
 import uuid
+import webbrowser
 from databricks.sdk.service import jobs
 from databricks.sdk.service.catalog import VolumeType, SchemasAPI
 from databricks.sdk.service.workspace import ImportFormat
@@ -163,8 +164,10 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
         runner_conf.job_id = created_job.job_id
         print(f"Job created successfully. job_id={created_job.job_id}, started run...")
         print(f"Waiting for job to complete. run_id={created_job.job_id}")
-        run_by_id = self.ws.jobs.run_now(job_id=created_job.job_id).result()
-        print(f"Job run finished. run_id={run_by_id}")
+        run_by_id = self.ws.jobs.run_now(job_id=created_job.job_id)
+        url = f"{self.ws.config.host}/jobs/{runner_conf.job_id}/runs/{run_by_id}?o={self.ws.get_workspace_id()}/"
+        webbrowser.open(url)
+        print(f"Job launched with url={url}")
 
     def create_techsummit_demo_workflow(self, runner_conf: TechsummitRunnerConf):
         """
```

docs/config.toml

Lines changed: 0 additions & 1 deletion

```diff
@@ -2,7 +2,6 @@ baseURL = 'https://databrickslabs.github.io/dlt-meta/'
 languageCode = 'en-us'
 title = 'DLT-META'
 theme= "hugo-theme-learn"
-
 pluralizeListTitles = false
 canonifyURLs = true
 
```
