
Commit c44e1ae

Merge pull request #215 from databrickslabs/issue_211
Improvements to lakehouse app #211
2 parents 387abbb + 51e7636 commit c44e1ae

10 files changed (+248, -54 lines)

docs/content/app/_index.md

Lines changed: 0 additions & 38 deletions
This file was deleted.

docs/content/faq/app_faq.md

Lines changed: 67 additions & 0 deletions
---
title: "App"
date: 2025-08-29T14:50:11-04:00
weight: 63
draft: false
---

### Initial Setup

**Q1. Do I need to run an initial setup before using the DLT-META App?**
Yes. Before using the DLT-META App, you must click the Setup button to create the required dlt-meta environment. This initializes the app and enables you to onboard or manage Lakeflow Declarative Pipelines.

### Features and Capabilities

**Q2. What are the main features of the DLT-META App?**
The DLT-META App provides several key capabilities:
- Onboard new Lakeflow Declarative Pipelines through an interactive interface
- Deploy and manage pipelines directly in the app
- Run demo flows to explore example pipelines and usage patterns
- Use the command-line interface (CLI) to automate operations

### Access and Permissions

**Q3. Who can access and use the DLT-META App?**
Only authenticated Databricks workspace users with appropriate permissions can access and use the app:
- You need `CAN_USE` permission to run the app
- You need `CAN_MANAGE` permission to administer it
- The app can be shared within your workspace or account
- Every user must log in with their Databricks account credentials

### Resource Access

**Q4. How does catalog and schema access work in the DLT-META App?**
By default, the app uses a dedicated Service Principal (SP) for all data and resource access (a grant sketch follows this list):
- The Service Principal needs explicit permissions (`USE CATALOG`, `USE SCHEMA`, `SELECT`) on all Unity Catalog resources it reads
- What users can do in the app depends on the Service Principal's access, regardless of which user opens the app URL
- Optional On-Behalf-Of (OBO) mode uses individual user permissions instead
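
The grants above could be applied with SQL `GRANT` statements or with the Databricks Python SDK. A minimal sketch using the SDK's grants API; the catalog name, schema name, and Service Principal name are placeholders (the SP name reuses the example username format from the setup guide):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()  # picks up your configured profile or environment auth
sp = "app-40zbx9_demo-dltmeta"  # placeholder: the app's Service Principal

# USE CATALOG on the catalog the app works against.
w.grants.update(
    securable_type=catalog.SecurableType.CATALOG,
    full_name="main",
    changes=[catalog.PermissionsChange(principal=sp,
                                       add=[catalog.Privilege.USE_CATALOG])],
)

# USE SCHEMA and SELECT on a schema inside that catalog.
w.grants.update(
    securable_type=catalog.SecurableType.SCHEMA,
    full_name="main.dltmeta_bronze",
    changes=[catalog.PermissionsChange(principal=sp,
                                       add=[catalog.Privilege.USE_SCHEMA,
                                            catalog.Privilege.SELECT])],
)
```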

### Troubleshooting

**Q5. How should I resolve access errors or permission issues?**
If you experience access-related errors:
- Verify the Service Principal's permissions in Unity Catalog
- Check which warehouses and secrets the app is attached to
- Review recent administrative changes
- Check audit logs for permission denials
- Contact your Databricks workspace administrator if needed

### Security and Isolation

**Q6. How are security and isolation managed?**
The DLT-META App provides enterprise-grade security:
- Runs on a multi-tenant platform with strong isolation
- Uses dedicated, isolated serverless compute
- Restricts sharing to authenticated users only
- Logs all sharing and permission events
- Allows no public or anonymous access

### Best Practices

**Q7. What are the security best practices?**
Follow these guidelines for secure operation:
- Use the minimum required permissions (principle of least privilege)
- Monitor audit logs regularly
- Run only trusted application code
- Enable OBO mode carefully
- Conduct regular security reviews
Lines changed: 117 additions & 0 deletions
---
title: "DLT-META Lakehouse App"
date: 2021-08-04T14:25:26-04:00
weight: 9
draft: false
---

# DLT-META Lakehouse App Setup

## Prerequisites

### System Requirements
- Python 3.8.0 or higher
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html) (latest version, e.g., 0.244.0)
- Configured workspace access

### Initial Setup
1. Authenticate with Databricks:
```commandline
databricks auth login --host WORKSPACE_HOST
```

2. Set up the Python environment:
```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
cd dlt-meta
python -m venv .venv
source .venv/bin/activate
pip install databricks-sdk
```
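
Before moving on, it can help to confirm the SDK resolves the same credentials the CLI just configured. A small sanity-check sketch (assumes the default profile written by `databricks auth login`):

```python
from databricks.sdk import WorkspaceClient

# Resolves auth from the profile created by `databricks auth login`.
w = WorkspaceClient()
me = w.current_user.me()
print(f"Authenticated to {w.config.host} as {me.user_name}")
```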

## Deployment Options

### Deploy to Databricks

1. Create a custom app:
```commandline
databricks apps create demo-dltmeta
```
> Note: Wait for the command to complete (a few minutes)

2. Set up the app code:
```commandline
cd dlt-meta/lakehouse_app

# Replace testapp with your preferred folder name
databricks sync . /Workspace/Users/<user1.user2>@databricks.com/testapp

# Deploy the app
databricks apps deploy demo-dltmeta --source-code-path /Workspace/Users/<user1.user2>@databricks.com/testapp
```

3. Access the app:
- Open the URL printed by the create command in step 1, or
- Navigate in the Databricks Web UI: New → App → Back to App → search for your app name
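
If you iterate on the app code, the sync and deploy calls from step 2 are easy to script. A sketch that simply shells out to the same CLI commands; the app name and workspace folder are the placeholders used above:

```python
import subprocess

APP_NAME = "demo-dltmeta"
# Placeholder workspace folder from step 2; replace with your own.
TARGET = "/Workspace/Users/<user1.user2>@databricks.com/testapp"

# Mirror the two CLI steps: sync local sources, then deploy them.
subprocess.run(["databricks", "sync", ".", TARGET], check=True)
subprocess.run(
    ["databricks", "apps", "deploy", APP_NAME, "--source-code-path", TARGET],
    check=True,
)
```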

### Run Locally

1. Set up the environment:
```commandline
cd dlt-meta/lakehouse_app
pip install -r requirements.txt
```

2. Configure Databricks:
```commandline
databricks configure --host <your databricks host url> --token <your token>
```

3. Start the app:
```commandline
python app.py
```
Access at: http://127.0.0.1:5000
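
A quick way to confirm the local server is serving requests, using only the standard library (assumes the app is running on the default Flask port shown above):

```python
import urllib.request

# Fetch the landing page from the locally running app.
with urllib.request.urlopen("http://127.0.0.1:5000", timeout=5) as resp:
    print(resp.status)  # 200 means the app is up
```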

## Using DLT-META App

### App User Setup
![App User Example](/images/app_cli.png)

The app creates a dedicated user account that:
- Handles onboarding, deployment, and demo execution
- Requires specific permissions for UC catalogs and schemas
- Uses a username of the form "app-40zbx9_demo-dltmeta"

### Getting Started

1. Initial setup:
- Launch the app in a browser
- Click "Setup dlt-meta project environment"
- This initializes the environment for onboarding and deployment

2. Pipeline management:
- Use the "UI" tab to onboard and deploy pipelines
- Configure pipelines according to your requirements

**Onboarding Pipeline:**
![Onboarding UI](/images/app_onboarding.png)
*Pipeline onboarding interface for configuring new data pipelines*

**Deploying Pipeline:**
![Deploy UI](/images/app_deploy_pipeline.png)
*Pipeline deployment interface for managing and deploying pipelines*

3. Demo access:
- Available demos are listed under the "Demo" tab
- Run pre-configured demo pipelines to explore features

![App Demo](/images/app_run_demos.png)
*Demo interface showing available example pipelines*

4. Command-line interface:
- Access CLI features under the "CLI" tab
- Execute commands directly from the web interface

![CLI UI](/images/app_cli.png)
*CLI interface for command-line operations*

docs/static/images/app_cli.png

Binary image files (81.5 KB, 75.4 KB, 160 KB, 52.6 KB)

lakehouse_app/app.py

Lines changed: 52 additions & 12 deletions
```diff
@@ -7,7 +7,6 @@
 import logging
 import errno
 import re
-# Use pty to create a pseudo-terminal for better interactive support
 import pty
 import select
 import fcntl
@@ -16,7 +15,6 @@
 import signal
 import json
 
-# Configure logging
 logging.basicConfig(level=logging.INFO,
                     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                     handlers=[logging.FileHandler("dlt-meta-app.log"),
@@ -227,15 +225,15 @@ def start_command():
     if 'PYTHONPATH' not in os.environ or not os.path.isdir(os.environ.get('PYTHONPATH', '')):
         commands = [
             "pip install databricks-cli",
-            # "git clone https://github.com/databrickslabs/dlt-meta.git",
-            "git clone https://github.com/dattawalake/dlt-meta.git",
+            "git clone https://github.com/databrickslabs/dlt-meta.git",
             f"python -m venv {current_directory}/dlt-meta/.venv",
             f"export HOME={current_directory}",
             "cd dlt-meta",
             "source .venv/bin/activate",
             f"export PYTHONPATH={current_directory}/dlt-meta/",
             "pwd",
             "pip install databricks-sdk",
+            "pip install PyYAML",
         ]
         print("Start setting up dlt-meta environment")
         for c in commands:
@@ -322,6 +320,7 @@ def handle_onboard_form():
         "silver_schema": request.form.get('silver_schema', 'dltmeta_silver_7b4e981029b843c799bf61a0a121b3ca'),
         "dlt_meta_layer": request.form.get('dlt_meta_layer', '1'),
         "bronze_table": request.form.get('bronze_table', 'bronze_dataflowspec'),
+        "silver_table": request.form.get('silver_table', 'silver_dataflowspec'),
         "overwrite": "1" if request.form.get('overwrite') == "1" else "0",
         "version": request.form.get('version', 'v1'),
         "environment": request.form.get('environment', 'prod'),
@@ -375,26 +374,67 @@ def handle_deploy_form():
 def run_demo():
     code_to_run = request.json.get('demo_name', '')
     print(f"processing demo for :{request.json}")
-    current_directory = os.environ['PYTHONPATH']  # os.getcwd()
+    current_directory = os.environ['PYTHONPATH']
     demo_dict = {"demo_cloudfiles": "demo/launch_af_cloudfiles_demo.py",
                  "demo_acf": "demo/launch_acfs_demo.py",
                  "demo_silverfanout": "demo/launch_silver_fanout_demo.py",
-                 "demo_dias": "demo/launch_dais_demo.py"
+                 "demo_dias": "demo/launch_dais_demo.py",
+                 "demo_dlt_sink": "demo/launch_dlt_sink_demo.py",
+                 "demo_dabs": "demo/generate_dabs_resources.py"
                  }
     demo_file = demo_dict.get(code_to_run, None)
     uc_name = request.json.get('uc_name', '')
-    result = subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} --profile DEFAULT",
-                            shell=True,
-                            capture_output=True,
-                            text=True
-                            )
+
+    if code_to_run == 'demo_dabs':
+
+        # Step 1: Generate Databricks resources
+        subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} "
+                       f"--source=cloudfiles --profile DEFAULT",
+                       shell=True,
+                       capture_output=True,
+                       text=True
+                       )
+
+        # Step 2: Change working directory to demo/dabs for all next commands
+        subprocess.run("databricks bundle validate --profile=DEFAULT", cwd=f"{current_directory}/demo/dabs",
+                       shell=True,
+                       capture_output=True,
+                       text=True)
+
+        # Step 4: Deploy the bundle
+        subprocess.run("databricks bundle deploy --target dev --profile=DEFAULT",
+                       cwd=f"{current_directory}/demo/dabs", shell=True,
+                       capture_output=True,
+                       text=True)
+
+        # Step 5: Run 'onboard_people' task
+        rs1 = subprocess.run("databricks bundle run onboard_people -t dev --profile=DEFAULT",
+                             cwd=f"{current_directory}/demo/dabs", shell=True,
+                             capture_output=True,
+                             text=True)
+        print(f"onboarding completed: {rs1.stdout}")
+        # Step 6: Run 'execute_pipelines_people' task
+        result = subprocess.run("databricks bundle run execute_pipelines_people -t dev --profile=DEFAULT",
+                                cwd=f"{current_directory}/demo/dabs",
+                                shell=True,
+                                capture_output=True,
+                                text=True
+                                )
+        print(f"execution of pipeline completed: {result.stdout}")
+    else:
+        result = subprocess.run(f"python {current_directory}/{demo_file} --uc_catalog_name {uc_name} "
+                                f"--profile DEFAULT",
+                                shell=True,
+                                capture_output=True,
+                                text=True
+                                )
     return extract_command_output(result)
 
 
 def extract_command_output(result):
     stdout = result.stdout
     job_id_match = re.search(r"job_id=(\d+) | pipeline=(\d+)", stdout)
-    url_match = re.search(r"url=(https?://[^\s]+)", stdout)
+    url_match = re.search(r"(https?://[^\s]+)", stdout)
 
     job_id = job_id_match.group(1) or job_id_match.group(2) if job_id_match else None
     job_url = url_match.group(1) if url_match else None
```
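
With the loosened pattern, `extract_command_output` now picks up the first bare URL in the demo's stdout instead of requiring a `url=` prefix. A minimal reproduction of the extraction logic; the sample output string is invented for illustration:

```python
import re

# Invented sample of demo-script output; real stdout comes from subprocess.run.
stdout = "demo launched: job_id=12345 see https://adb-1234567890.11.azuredatabricks.net/jobs/12345"

job_id_match = re.search(r"job_id=(\d+) | pipeline=(\d+)", stdout)
url_match = re.search(r"(https?://[^\s]+)", stdout)  # first bare URL

job_id = job_id_match.group(1) or job_id_match.group(2) if job_id_match else None
job_url = url_match.group(1) if url_match else None
print(job_id)   # -> 12345
print(job_url)  # -> https://adb-1234567890.11.azuredatabricks.net/jobs/12345
```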

lakehouse_app/templates/landingPage.html

Lines changed: 11 additions & 3 deletions
```diff
@@ -655,6 +655,11 @@ <h2 class='step-heading'>Step 1 : Onboarding</h2>
                 <input id="bronze_table" name="bronze_table" placeholder="Enter bronze table name"
                        type="text" value="bronze_dataflowspec">
             </div>
+            <div class="form-group">
+                <label>Provide silver dataflow spec table name:</label>
+                <input id="silver_table" name="silver_table" placeholder="Enter silver table name"
+                       type="text" value="silver_dataflowspec">
+            </div>
             <div class="form-group">
                 <label>Overwrite dataflow spec?</label>
                 <div class="radio-group">
@@ -841,6 +846,8 @@ <h3 class='step-heading'>Available Demos</h3>
             <button class="command-button2" data-command="demo_acf">Demo Apply Changes Snapshot</button>
             <button class="command-button2" data-command="demo_silverfanout">Demo Silver fanout</button>
             <button class="command-button2" data-command="demo_dias">Demo Dias</button>
+            <button class="command-button2" data-command="demo_dlt_sink">Demo Sink</button>
+
 
         </div>
@@ -956,7 +963,7 @@ <h5 class="modal-title">Please wait...</h5>
             const modalContent = `
                 <div class="modal-content">
                     <h3 class='step-heading'>${data.modal_content.title}</h3>
-                    <p>Job ID: ${data.modal_content.job_id}</p>
+                    ${data.modal_content.job_id ? `<p>Job ID: ${data.modal_content.job_id}</p>`:""}
 
                     <p><a href="${url}" target="_blank">Open Job in Databricks</a></p >
@@ -994,7 +1001,8 @@ <h3 class='step-heading'>${data.modal_content.title}</h3>
             const modalContent = `
                 <div class="modal-content">
                     <h3 class='step-heading'>${data.modal_content.title}</h3>
-                    <p>Job ID: ${data.modal_content.job_id}</p>
+                    ${data.modal_content.job_id ? `<p>Job ID: ${data.modal_content.job_id}</p>
+                    `:""}
 
                     <p><a href="${url}" target="_blank">Open Job in Databricks</a></p >
@@ -1067,7 +1075,7 @@ <h3 class='step-heading'>${data.modal_content.title}</h3>
             const modalContent = `
                 <div class="modal-content">
                     <h3 class='step-heading'>${data.modal_content.title}</h3>
-                    <p>Job ID: ${data.modal_content.job_id}</p>
+                    ${data.modal_content.job_id ? `<p>Job ID: ${data.modal_content.job_id}</p>`:""}
 
                     <p><a href="${url}" target="_blank">Open Job in Databricks</a></p >
```

src/cli.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -638,7 +638,7 @@ def _load_onboard_config_ui(self, form_data) -> OnboardCommand:
         onboard_cmd_dict["bronze_dataflowspec_path"] = f'{self._install_folder()}/bronze_dataflow_specs'
 
         if onboard_cmd_dict["onboard_layer"] == "silver" or onboard_cmd_dict["onboard_layer"] == "bronze_silver":
-            onboard_cmd_dict["silver_dataflowspec_table"] = 'silver_dataflowspec'  # Not in form, using default
+            onboard_cmd_dict["silver_dataflowspec_table"] = form_data.get('silver_table', 'silver_dataflowspec')
             if not onboard_cmd_dict["uc_enabled"]:
                 onboard_cmd_dict["silver_dataflowspec_path"] = f'{self._install_folder()}/silver_dataflow_specs'
```
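
The CLI change above means the silver dataflowspec table name now comes from the form when provided, falling back to the old default otherwise. A tiny illustration of that fallback (`form_data` here is a plain dict standing in for the submitted form):

```python
# Key absent (e.g., an older client that doesn't send the field):
form_data = {}
print(form_data.get('silver_table', 'silver_dataflowspec'))  # -> silver_dataflowspec

# Field supplied from the new landing-page input:
form_data = {'silver_table': 'my_silver_specs'}
print(form_data.get('silver_table', 'silver_dataflowspec'))  # -> my_silver_specs
```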

0 commit comments