Commit b18a35b

Strip absolute paths config sync (#4492)
## Changes

Strip bundle path prefixes from absolute paths such as `/Workspace/path/to/bundle/notebook.py`.

Covers two special cases:
1. The notebook extension is restored.
2. The path is also translated to be relative to the file where the field is defined.

## Why

When a user adds a new task and selects a notebook that is part of the bundle, we should detect it and convert the path to a relative one so the bundle stays portable.

## Tests

Added acceptance tests. Integration test failures are not related.
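To make the intent concrete, here is a minimal sketch of the translation described above, assuming we know the workspace path of the synced bundle files and the directory of the config file that defines the field. The function name, arguments, and extension list are hypothetical; this is not the CLI's actual implementation.

```python
import os

# Hypothetical extension list; the CLI's real set of notebook extensions may differ.
NOTEBOOK_EXTENSIONS = (".py", ".ipynb", ".sql", ".r", ".scala")


def make_relative(remote_path, bundle_root, config_file_dir, synced_files):
    """Translate an absolute workspace path into a path relative to the config
    file that defines the field, restoring the notebook extension.

    remote_path     -- e.g. "/Workspace/Users/me/.bundle/dev/files/sample_exploration"
    bundle_root     -- workspace path of the synced bundle files
    config_file_dir -- directory (relative to the sync root) of the YAML file
                       that defines the field, e.g. "resources" or "."
    synced_files    -- set of local file paths relative to the sync root
    """
    root = bundle_root.rstrip("/")
    if not remote_path.startswith(root + "/"):
        return remote_path  # not inside the bundle: leave untouched

    rel_to_root = remote_path[len(root) + 1:]

    # Notebooks are addressed without their extension in the workspace;
    # restore it by matching against the files synced from the local bundle.
    for ext in ("",) + NOTEBOOK_EXTENSIONS:
        if rel_to_root + ext in synced_files:
            rel_to_root += ext
            break

    # Re-anchor the path relative to the file where the field is defined.
    rel = os.path.relpath(rel_to_root, config_file_dir)
    return rel if rel.startswith(".") else "./" + rel
```

Under these assumptions, a notebook at the sync root referenced from `resources/job1.yml` comes back as `../sample_exploration.ipynb`, and one referenced from a file at the sync root comes back as `./synced_notebook.py`, matching the acceptance test outputs below.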
1 parent 39c5094 commit b18a35b

File tree

15 files changed (+214, -19 lines)
acceptance/bundle/config-remote-sync/job_multiple_tasks/output.txt

Lines changed: 19 additions & 1 deletion
@@ -69,15 +69,17 @@ Deploying resources...
 Updating deployment state...
 Deployment complete!
 
-=== Rename b_task to b_task_renamed (4 tasks, 2 with depends_on, 1 without)
+=== Rename b_task, replace a_task notebook_path, add synced_task
 === Detect task key rename
 Detected changes in 1 resource(s):
 
 Resource: resources.jobs.rename_task_job
+  tasks[task_key='a_task'].notebook_task.notebook_path: replace
   tasks[task_key='b_task']: remove
   tasks[task_key='b_task_renamed']: add
   tasks[task_key='c_task'].depends_on[0].task_key: replace
   tasks[task_key='d_task'].depends_on[0].task_key: replace
+  tasks[task_key='synced_task']: add
 
 
 
@@ -114,6 +116,22 @@ Resource: resources.jobs.rename_task_job
 +  - task_key: b_task_renamed
      notebook_task:
        notebook_path: /Users/{{workspace_user_name}}/c_task
+@@ -67,7 +67,14 @@
+   - task_key: a_task
+     notebook_task:
+-      notebook_path: /Users/{{workspace_user_name}}/a_task
++      notebook_path: ./synced_notebook.py
+     new_cluster:
+       spark_version: 13.3.x-snapshot-scala2.12
+       node_type_id: [NODE_TYPE_ID]
+       num_workers: 1
++  - new_cluster:
++      node_type_id: [NODE_TYPE_ID]
++      num_workers: 1
++      spark_version: 13.3.x-snapshot-scala2.12
++    notebook_task:
++      notebook_path: ./synced_notebook.py
++    task_key: synced_task
 
 >>> [CLI] bundle destroy --auto-approve
 The following resources will be deleted:

acceptance/bundle/config-remote-sync/job_multiple_tasks/script

Lines changed: 20 additions & 2 deletions
@@ -52,16 +52,34 @@ mv databricks.yml.resolved databricks.yml
 # Deploy the updated configuration to sync state
 $CLI bundle deploy
 
-title "Rename b_task to b_task_renamed (4 tasks, 2 with depends_on, 1 without)"
+title "Rename b_task, replace a_task notebook_path, add synced_task"
 rename_job_id="$(read_id.py rename_task_job)"
-edit_resource.py jobs $rename_job_id <<'EOF'
+edit_resource.py jobs $rename_job_id <<EOF
 for task in r["tasks"]:
     if task["task_key"] == "b_task":
         task["task_key"] = "b_task_renamed"
     if "depends_on" in task:
         for dep in task["depends_on"]:
             if dep["task_key"] == "b_task":
                 dep["task_key"] = "b_task_renamed"
+
+# Replace a_task's notebook_path with sync-root path (tests Replace operation)
+for task in r["tasks"]:
+    if task["task_key"] == "a_task":
+        task["notebook_task"]["notebook_path"] = "${PWD}/synced_notebook"
+
+# Add synced_task with path inside sync root
+r["tasks"].append({
+    "task_key": "synced_task",
+    "notebook_task": {
+        "notebook_path": "${PWD}/synced_notebook"
+    },
+    "new_cluster": {
+        "spark_version": "${DEFAULT_SPARK_VERSION}",
+        "node_type_id": "${NODE_TYPE_ID}",
+        "num_workers": 1
+    }
+})
 EOF
 
 title "Detect task key rename"

acceptance/bundle/config-remote-sync/job_multiple_tasks/synced_notebook.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Databricks notebook source

acceptance/bundle/config-remote-sync/job_multiple_tasks/test.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Cloud = true
 
 RecordRequests = false
-Ignore = [".databricks", "dummy.whl", "databricks.yml", "databricks.yml.backup"]
+Ignore = [".databricks", "dummy.whl", "databricks.yml", "databricks.yml.backup", "synced_notebook.py"]
 
 [Env]
 DATABRICKS_BUNDLE_ENABLE_EXPERIMENTAL_YAML_SYNC = "true"

acceptance/bundle/config-remote-sync/multiple_files/output.txt

Lines changed: 10 additions & 1 deletion
@@ -4,6 +4,7 @@ Updating deployment state...
 Deployment complete!
 
 === Modify job_one: max_concurrent_runs, rename c_task
+=== Add synced_task to job_one (notebook in sync root, resource in subdirectory)
 === Modify job_two: max_concurrent_runs, rename first task, add extra_task
 === Add extra_task to local config (not in saved state, triggers entity-level replace)
 === Detect and save changes
@@ -14,6 +15,7 @@ Resource: resources.jobs.job_one
   tasks[task_key='a_task'].depends_on[0].task_key: replace
   tasks[task_key='c_task']: remove
   tasks[task_key='c_task_renamed']: add
+  tasks[task_key='synced_task']: add
 
 Resource: resources.jobs.job_two
   max_concurrent_runs: replace
@@ -50,11 +52,18 @@ Resource: resources.jobs.job_two
 +    task_key: c_task_renamed
    - task_key: a_task
      notebook_task:
-@@ -21,3 +21,3 @@
+@@ -21,3 +21,10 @@
        num_workers: 1
      depends_on:
 -      - task_key: c_task
 +      - task_key: c_task_renamed
++  - new_cluster:
++      node_type_id: [NODE_TYPE_ID]
++      num_workers: 1
++      spark_version: 13.3.x-snapshot-scala2.12
++    notebook_task:
++      notebook_path: ../sample_exploration.ipynb
++    task_key: synced_task
 
 === Changes in job2.yml
 
acceptance/bundle/config-remote-sync/multiple_files/sample_exploration.ipynb

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "application/vnd.databricks.v1+cell": {
+     "cellMetadata": {},
+     "inputWidgets": {},
+     "nuid": "[UUID]",
+     "showTitle": false,
+     "tableResultSettingsMap": {},
+     "title": ""
+    }
+   },
+   "source": [
+    "### Example Exploratory Notebook\n",
+    "\n",
+    "Use this notebook to explore the data generated by the pipeline in your preferred programming language.\n",
+    "\n",
+    "**Note**: This notebook is not executed as part of the pipeline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 0,
+   "metadata": {
+    "application/vnd.databricks.v1+cell": {
+     "cellMetadata": {},
+     "inputWidgets": {},
+     "nuid": "[UUID]",
+     "showTitle": false,
+     "tableResultSettingsMap": {},
+     "title": ""
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# !!! Before performing any data analysis, make sure to run the pipeline to materialize the sample datasets. The tables referenced in this notebook depend on that step.\n",
+    "\n",
+    "display(spark.sql(\"SELECT * FROM hive_metastore.[USERNAME].sample_trips_lakeflow_project\"))"
+   ]
+  }
+ ],
+ "metadata": {
+  "application/vnd.databricks.v1+notebook": {
+   "computePreferences": null,
+   "dashboards": [],
+   "environmentMetadata": null,
+   "inputWidgetPreferences": null,
+   "language": "python",
+   "notebookMetadata": {
+    "pythonIndentUnit": 2
+   },
+   "notebookName": "sample_exploration",
+   "widgets": {}
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

acceptance/bundle/config-remote-sync/multiple_files/script

Lines changed: 15 additions & 0 deletions
@@ -27,6 +27,21 @@ for task in r["tasks"]:
             dep["task_key"] = "c_task_renamed"
 EOF
 
+title "Add synced_task to job_one (notebook in sync root, resource in subdirectory)"
+edit_resource.py jobs $job_one_id <<EOF
+r["tasks"].append({
+    "task_key": "synced_task",
+    "notebook_task": {
+        "notebook_path": "${PWD}/sample_exploration"
+    },
+    "new_cluster": {
+        "spark_version": "${DEFAULT_SPARK_VERSION}",
+        "node_type_id": "${NODE_TYPE_ID}",
+        "num_workers": 1
+    }
+})
+EOF
+
 title "Modify job_two: max_concurrent_runs, rename first task, add extra_task"
 edit_resource.py jobs $job_two_id <<'EOF'
 r["max_concurrent_runs"] = 10
acceptance/bundle/config-remote-sync/multiple_files/test.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Cloud = true
 
 RecordRequests = false
-Ignore = [".databricks", "dummy.whl", "databricks.yml", "resources/job1.yml", "resources/job2.yml"]
+Ignore = [".databricks", "dummy.whl", "databricks.yml", "resources/job1.yml", "resources/job2.yml", "sample_exploration.ipynb"]
 
 [Env]
 DATABRICKS_BUNDLE_ENABLE_EXPERIMENTAL_YAML_SYNC = "true"

acceptance/bundle/config-remote-sync/pipeline_fields/output.txt

Lines changed: 3 additions & 1 deletion
@@ -12,6 +12,7 @@ Resource: resources.pipelines.my_pipeline
   environment.dependencies: replace
   notifications[0].alerts: replace
   notifications[0].email_recipients: replace
+  root_path: add
   schema: replace
   tags['foo']: add
 
@@ -28,7 +29,7 @@ Resource: resources.pipelines.my_pipeline
 -
  resources:
    pipelines:
-@@ -7,19 +6,24 @@
+@@ -7,19 +6,25 @@
      name: test-pipeline-[UNIQUE_NAME]
      catalog: main
 -    schema: default
@@ -53,6 +54,7 @@ Resource: resources.pipelines.my_pipeline
 -
 +    tags:
 +      foo: bar
++    root_path: ./pipeline_root
  targets:
    default:
 
acceptance/bundle/config-remote-sync/pipeline_fields/pipeline_root/.gitkeep

Whitespace-only changes.
