
Commit 2a93dd9

Merge pull request #28 from databrickslabs/feature/dlt-meta-uc
Unity Catalog and Databricks Labs CLI Support

2 parents f5f6c34 + b0e2e31, commit 2a93dd9

File tree: 122 files changed, +4377 −2251 lines


.coveragerc

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ include = src/*.py
 omit =
     */site-packages/*
     tests/*
+    src/install.py
+    src/config.py
+    src/cli.py
 
 [report]
 exclude_lines =
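The new omit entries keep the installer, config, and CLI modules out of coverage measurement. As a rough illustration of how such glob patterns match file paths (coverage.py's own matching logic is more involved), here is a minimal sketch using Python's stdlib `fnmatch`:

```python
from fnmatch import fnmatch

# Patterns mirroring the omit list in .coveragerc above
OMIT = ["*/site-packages/*", "tests/*", "src/install.py", "src/config.py", "src/cli.py"]

def is_omitted(path: str) -> bool:
    """True if any omit pattern matches the given file path."""
    return any(fnmatch(path, pattern) for pattern in OMIT)

print(is_omitted("src/cli.py"))       # True  -> excluded from coverage
print(is_omitted("src/pipeline.py"))  # False -> still measured
```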

.gitignore

Lines changed: 6 additions & 1 deletion
@@ -151,4 +151,9 @@ deployment-merged.yaml
 .vscode/
 
 # ignore integration test onboarding file.
-integration-tests/conf/dlt-meta/onboarding.json
+integration-tests/conf/dlt-meta/onboarding.json
+
+.databricks
+.databricks-login.json
+demo/conf/onboarding.json
+integration_tests/conf/onboarding.json

CHANGELOG.md

Lines changed: 3 additions & 7 deletions
@@ -1,13 +1,9 @@
 # Changelog
 
-All notable changes to this project will be documented in this file.
 
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-
-**NOTE:** For CLI interfaces, we support SemVer approach. However, for API components we don't use SemVer as of now. This may lead to instability when using dbx API methods directly.
-
-[Please read through the Keep a Changelog (~5min)](https://keepachangelog.com/en/1.0.0/).
+## [v.0.0.5]
+- Enabled Unity Catalog support: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)
+- Added databricks labs cli: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)
 
 ## [v0.0.4] - 2023-10-09
 ### Added

Makefile

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+clean:
+	rm -fr build .databricks dlt_meta.egg-info
+
+dev:
+	python3 -m venv .databricks
+	.databricks/bin/python -m pip install -e .
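The new `dev` target creates a virtual environment named `.databricks` and installs the package into it in editable mode. A hedged stdlib-Python equivalent of the venv-creation step is sketched below (`--without-pip` is used only to keep the sketch fast; the Makefile's venv includes pip, which the editable install needs):

```python
import pathlib
import subprocess
import sys
import tempfile

def make_dev(env_dir: str) -> pathlib.Path:
    """Create a venv like the Makefile's `dev` target and return its python path.

    Sketch only: --without-pip skips the pip bootstrap for speed, so the
    `pip install -e .` step from the Makefile is not reproduced here.
    """
    subprocess.run([sys.executable, "-m", "venv", "--without-pip", env_dir], check=True)
    bindir = "Scripts" if sys.platform == "win32" else "bin"  # layout differs per OS
    exe = "python.exe" if sys.platform == "win32" else "python"
    return pathlib.Path(env_dir) / bindir / exe

with tempfile.TemporaryDirectory() as tmp:
    py = make_dev(tmp)
    print(py.exists())  # True
```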

README.md

Lines changed: 96 additions & 0 deletions
@@ -68,7 +68,103 @@ With this framework you need to record the source and target metadata in an onbo
 
 ## Getting Started
 Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
+### The Databricks Labs DLT-META CLI lets you run onboarding and deployment from an interactive Python terminal
+#### Prerequisites:
+- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
+- Python 3.8.0+
+#### Steps:
+- ```git clone https://github.com/databrickslabs/dlt-meta.git```
+- ```cd dlt-meta```
+- ```python -m venv .venv```
+- ```source .venv/bin/activate```
+- ```pip install databricks-sdk```
+- ```databricks labs dlt-meta onboard```
+  - The above command prompts you for onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which launch the config from the demo folder.
 
+```
+Provide onboarding file path (default: demo/conf/onboarding.template):
+Provide onboarding files local directory (default: demo/):
+Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
+Provide databricks runtime version (default: 14.2.x-scala2.12):
+Run onboarding with unity catalog enabled?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide unity catalog name: ravi_dlt_meta_uc
+Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead):
+Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb):
+Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f):
+Provide dlt meta layer
+[0] bronze
+[1] bronze_silver
+[2] silver
+Enter a number between 0 and 2: 1
+Provide bronze dataflow spec table name (default: bronze_dataflowspec):
+Provide silver dataflow spec table name (default: silver_dataflowspec):
+Overwrite dataflow spec?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide dataflow spec version (default: v1):
+Provide environment name (default: prod): prod
+Provide import author name (default: ravi.gawai):
+Provide cloud provider name
+[0] aws
+[1] azure
+[2] gcp
+Enter a number between 0 and 2: 0
+Do you want to update ws paths, catalog, schema details to your onboarding file?
+[0] False
+[1] True
+```
+- Go to your Databricks workspace and locate the onboarding job under: Workflows -> Job runs
+- Once the onboarding job has finished, deploy the `bronze` and `silver` DLT pipelines using the command below
+- ```databricks labs dlt-meta deploy```
+  - The above command prompts you for DLT details. Provide the same schema details you entered in the steps above.
+  - Bronze DLT
+```
+Deploy DLT-META with unity catalog enabled?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide unity catalog name: ravi_dlt_meta_uc
+Deploy DLT-META with serverless?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide dlt meta layer
+[0] bronze
+[1] silver
+Enter a number between 0 and 1: 0
+Provide dlt meta onboard group: A1
+Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
+Provide bronze dataflowspec table name (default: bronze_dataflowspec):
+Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee3eb837f3439899eef61b76b80d53):
+Provide dlt target schema name: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb
+```
+
+- Silver DLT
+  - ```databricks labs dlt-meta deploy```
+  - The above command prompts you for DLT details. Provide the same schema details you entered in the steps above.
+```
+Deploy DLT-META with unity catalog enabled?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide unity catalog name: ravi_dlt_meta_uc
+Deploy DLT-META with serverless?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide dlt meta layer
+[0] bronze
+[1] silver
+Enter a number between 0 and 1: 1
+Provide dlt meta onboard group: A1
+Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
+Provide silver dataflowspec table name (default: silver_dataflowspec):
+Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_2147545f9b6b4a8a834f62e873fa1364):
+Provide dlt target schema name: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f
+```
 ## More questions
 Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
 and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)
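The README walkthrough above is driven by numbered prompts (`[0]`/`[1]` choices followed by "Enter a number between…"). As a purely illustrative sketch of that prompt pattern — this is not the actual dlt-meta implementation — a minimal version might look like:

```python
def choose(question: str, options: list, input_fn=input) -> str:
    """Hypothetical numbered-choice prompt, mimicking the CLI sessions above."""
    print(question)
    for i, option in enumerate(options):
        print(f"[{i}] {option}")
    while True:
        raw = input_fn(f"Enter a number between 0 and {len(options) - 1}: ")
        if raw.strip().isdigit() and int(raw) < len(options):
            return options[int(raw)]
        print("Invalid input, try again.")

# Simulated answer "1", as in the onboarding session above
layer = choose("Provide dlt meta layer", ["bronze", "bronze_silver", "silver"],
               input_fn=lambda prompt: "1")
print(layer)  # bronze_silver
```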

demo/README.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# [DLT-META](https://github.com/databrickslabs/dlt-meta) DEMOs
+1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's capability to create Bronze and Silver DLT pipelines with initial and incremental modes automatically.
+2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): 100s of data sources ingested into bronze and silver DLT pipelines automatically.
+
+
+# DAIS 2023 DEMO
+This demo launches Bronze and Silver DLT pipelines with the following activities:
+- Customer and Transactions feeds for the initial load
+- Adds the new feeds Product and Stores to the existing Bronze and Silver DLT pipelines with metadata changes
+- Runs Bronze and Silver DLT for an incremental load of CDC events
+
+### Steps:
+1. Launch a terminal/command prompt
+
+2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
+
+3. ```git clone https://github.com/databrickslabs/dlt-meta.git```
+
+4. ```cd dlt-meta```
+
+5. Set the PYTHONPATH environment variable in your terminal
+```
+export PYTHONPATH=<<local dlt-meta path>>
+```
+
+6. Run the command ```python demo/launch_dais_demo.py --username=<<your databricks username>> --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
+    - cloud_provider_name: aws, azure, or gcp
+    - dbr_version: Databricks Runtime version
+    - dbfs_path: path on your Databricks workspace where the demo will be copied for launching the DLT-META pipelines
+    - You can provide `--profile=<databricks profile name>` if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.
+
+    6a. Databricks workspace URL:
+    - Enter your workspace URL, in the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
+
+    6b. Token:
+    - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
+
+    - On the Access tokens tab, click Generate new token.
+
+    - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
+
+    - Click Generate.
+
+    - Copy the displayed token.
+
+    - Paste it into the command prompt.
+
+# Databricks Tech Summit FY2024 DEMO:
+This demo launches auto-generated tables (100s) inside a single bronze and silver DLT pipeline using dlt-meta.
+
+1. Launch a terminal/command prompt
+
+2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
+
+3. ```git clone https://github.com/databrickslabs/dlt-meta.git```
+
+4. ```cd dlt-meta```
+
+5. Set the PYTHONPATH environment variable in your terminal
+```
+export PYTHONPATH=<<local dlt-meta path>>
+```
+
+6. Run the command ```python demo/launch_techsummit_demo.py [email protected] --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated```
+    - cloud_provider_name: aws, azure, or gcp
+    - dbr_version: Databricks Runtime version
+    - dbfs_path: path on your Databricks workspace where the demo will be copied for launching the DLT-META pipelines
+    - You can provide `--profile=<databricks profile name>` if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.
+
+    6a. Databricks workspace URL:
+    - Enter your workspace URL, in the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
+
+    6b. Token:
+    - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.
+
+    - On the Access tokens tab, click Generate new token.
+
+    - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.
+
+    - Click Generate.
+
+    - Copy the displayed token.
+
+    - Paste it into the command prompt.
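The demo launch commands above pass their settings as long-form flags. A hypothetical `argparse` sketch mirroring the documented flags — the real demo/launch_dais_demo.py may define them differently — could look like:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the demo flags documented above."""
    parser = argparse.ArgumentParser(description="Launch a DLT-META demo (sketch)")
    parser.add_argument("--username", required=True, help="your databricks username")
    parser.add_argument("--source", default="cloudfiles")
    parser.add_argument("--uc_catalog_name", help="unity catalog name (UC demos only)")
    parser.add_argument("--cloud_provider_name", required=True,
                        choices=["aws", "azure", "gcp"])
    parser.add_argument("--dbr_version", default="13.3.x-scala2.12")
    parser.add_argument("--dbfs_path", required=True)
    parser.add_argument("--profile", help="existing Databricks CLI profile (optional)")
    return parser

# Parse the documented DAIS-demo invocation with a placeholder username
args = build_parser().parse_args([
    "--username=someone@example.com",
    "--cloud_provider_name=aws",
    "--dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new",
])
print(args.cloud_provider_name, args.dbr_version)  # aws 13.3.x-scala2.12
```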
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
