
Commit eb0df9a

lennartkats-db, claude, and juliacrawf-db authored
Update the default-python template according to Lakeflow conventions (#3712)
## Changes

This updates the `default-python` template according to the latest Lakeflow conventions as established in #3671. Notably, the new template moves away from the use of notebooks for pipeline source code.

The new layout looks as follows when the user selects they want both the sample job and the sample pipeline:

`📁 resources`
`├── sample_job.job.yml`
`└── sample_etl.pipeline.yml`
`📁 src`
`├── 📁 my_project` — shared source code for use in jobs and/or pipelines
`│ ├── __init__.py`
`│ └── main.py`
`└── 📁 my_project_etl` — source code for the sample_etl pipeline
` ├── __init__.py`
` ├── 📁 transformations`
` │ ├── __init__.py`
` │ ├── sample_zones_my_project.py`
` │ └── sample_trips_my_project.py`
` ├── 📁 explorations` — exploratory notebooks
` │ ├── __init__.py`
` │ └── sample_exploration.ipynb`
` └── README.md`
`📁 tests` — unit tests
`📁 fixtures` — fixtures (these can now be used with [`load_fixture`](https://github.com/databricks/cli/blob/af524bb993eaffe059d65f93854d544a162fc6ef/acceptance/bundle/templates/default-python/serverless/output/my_default_python/fixtures/.gitkeep))
`databricks.yml`
`pyproject.toml`
`README.md`

The template prompts have been updated to cater to this structure. Notably, they include a new prompt to manage the catalog and schema used by the template. These settings are propagated to both the job and the pipeline:

```
Welcome to the default Python template for Databricks Asset Bundles!
Answer the following questions to customize your project. You can always change your configuration in the databricks.yml file later.
Note that https://e2-dogfood.staging.cloud.databricks.com is used for initialization. (For information on how to change your profile, see https://docs.databricks.com/dev-tools/cli/profiles.html.)

Unique name for this project [my_project]: my_project
Include a Lakeflow job that runs a notebook: yes
Include an ETL pipeline: yes
Include a sample Python package that builds into a wheel file: yes
Use serverless compute: yes
Default catalog for any tables created by this project [main]: main
Use a personal schema for each user working on this project. (This is recommended. Your personal schema will be 'main.lennart_kats'.): yes

✨ Your new project has been created in the 'my_project' directory!
To get started, refer to the project README.md file and the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.
```

## Testing

* Standard unit testing, acceptance testing
* AI exercised the templates with all permutations of options, deploying/testing/running/inspecting the result
* Bug bash of the original `lakeflow-pipelines` template from #3671

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Julia Crawford (Databricks) <[email protected]>
1 parent 3d656af commit eb0df9a
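The catalog and schema prompts are wired into the generated bundle as variables that both the job and the pipeline reference (see the `${var.catalog}`/`${var.schema}` references and the `--catalog`/`--schema` job parameters in the diffs below). A minimal sketch of how this might look in a generated `databricks.yml`; the exact keys, defaults, and target layout here are illustrative assumptions rather than the literal template output:

```
# Hypothetical excerpt, not the literal template output.
bundle:
  name: my_project

variables:
  catalog:
    description: Default catalog for any tables created by this project.
    default: main
  schema:
    description: Schema used by the sample job and pipeline.
    default: my_project

targets:
  dev:
    mode: development
    default: true
    variables:
      # With the "personal schema" option, dev deployments use a per-user schema,
      # e.g. derived from the current user's short name (as seen in the diffs below).
      schema: ${workspace.current_user.short_name}
  prod:
    mode: production
```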

File tree

111 files changed: +1729, -1848 lines changed


NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
  ### Dependency updates

  ### Bundles
+ * Updated the default-python template to follow the Lakeflow conventions: pipelines as source files, pyproject.toml ([#3712](https://github.com/databricks/cli/pull/3712)).
  * Fix a permissions bug adding second IS\_OWNER and causing "The job must have exactly one owner." error. Introduced in 0.274.0. ([#3850](https://github.com/databricks/cli/pull/3850))

  ### API Changes
Lines changed: 38 additions & 38 deletions
@@ -1,68 +1,68 @@
  --- [TESTROOT]/bundle/templates/default-python/classic/../serverless/output/my_default_python/databricks.yml
  +++ output/my_default_python/databricks.yml
- @@ -25,4 +25,11 @@
- host: [DATABRICKS_URL]
-
+ @@ -34,4 +34,6 @@
+ catalog: hive_metastore
+ schema: ${workspace.current_user.short_name}
  + presets:
- + # Set dynamic_version: true on all artifacts of type "whl".
- + # This makes "bundle deploy" add a timestamp to wheel's version before uploading,
- + # new wheel takes over the previous installation even if actual wheel version is unchanged.
- + # See https://docs.databricks.com/aws/en/dev-tools/bundles/settings
  + artifacts_dynamic_version: true
- +
  prod:
  mode: production
- --- [TESTROOT]/bundle/templates/default-python/classic/../serverless/output/my_default_python/resources/my_default_python.job.yml
- +++ output/my_default_python/resources/my_default_python.job.yml
- @@ -17,4 +17,5 @@
- tasks:
- - task_key: notebook_task
- + job_cluster_key: job_cluster
+ --- [TESTROOT]/bundle/templates/default-python/classic/../serverless/output/my_default_python/resources/my_default_python_etl.pipeline.yml
+ +++ output/my_default_python/resources/my_default_python_etl.pipeline.yml
+ @@ -5,8 +5,7 @@
+ my_default_python_etl:
+ name: my_default_python_etl
+ - # Catalog is required for serverless compute
+ - catalog: main
+ + ## Specify the 'catalog' field to configure this pipeline to make use of Unity Catalog:
+ + # catalog: ${var.catalog}
+ schema: ${var.schema}
+ - serverless: true
+ root_path: "../src/my_default_python_etl"
+
+ --- [TESTROOT]/bundle/templates/default-python/classic/../serverless/output/my_default_python/resources/sample_job.job.yml
+ +++ output/my_default_python/resources/sample_job.job.yml
+ @@ -26,4 +26,10 @@
  notebook_task:
- notebook_path: ../src/notebook.ipynb
- @@ -29,17 +30,21 @@
+ notebook_path: ../src/sample_notebook.ipynb
+ + job_cluster_key: job_cluster
+ + libraries:
+ + # By default we just include the .whl file generated for the my_default_python package.
+ + # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
+ + # for more information on how to add other libraries.
+ + - whl: ../dist/*.whl
+ - task_key: python_wheel_task
  depends_on:
- - task_key: refresh_pipeline
+ @@ -37,5 +43,10 @@
+ - "--schema"
+ - "${var.schema}"
  - environment_key: default
  + job_cluster_key: job_cluster
- python_wheel_task:
- package_name: my_default_python
- entry_point: main
  + libraries:
  + # By default we just include the .whl file generated for the my_default_python package.
  + # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
  + # for more information on how to add other libraries.
  + - whl: ../dist/*.whl
+ - task_key: refresh_pipeline
+ depends_on:
+ @@ -44,11 +55,11 @@
+ pipeline_id: ${resources.pipelines.my_default_python_etl.id}

- - # A list of task execution environment specifications that can be referenced by tasks of this job.
  - environments:
  - - environment_key: default
- -
- - # Full documentation of this spec can be found at:
- - # https://docs.databricks.com/api/workspace/jobs/create#environments-spec
  - spec:
  - environment_version: "2"
  - dependencies:
+ - # By default we just include the .whl file generated for the my_default_python package.
+ - # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
+ - # for more information on how to add other libraries.
  - - ../dist/*.whl
  + job_clusters:
  + - job_cluster_key: job_cluster
  + new_cluster:
- + spark_version: 15.4.x-scala2.12
+ + spark_version: 16.4.x-scala2.12
  + node_type_id: [NODE_TYPE_ID]
  + data_security_mode: SINGLE_USER
  + autoscale:
  + min_workers: 1
  + max_workers: 4
- --- [TESTROOT]/bundle/templates/default-python/classic/../serverless/output/my_default_python/resources/my_default_python.pipeline.yml
- +++ output/my_default_python/resources/my_default_python.pipeline.yml
- @@ -4,8 +4,7 @@
- my_default_python_pipeline:
- name: my_default_python_pipeline
- - ## Catalog is required for serverless compute
- - catalog: main
- + ## Specify the 'catalog' field to configure this pipeline to make use of Unity Catalog:
- + # catalog: catalog_name
- schema: my_default_python_${bundle.target}
- - serverless: true
- libraries:
- - notebook:
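Put together, the changed lines above correspond to a pipeline definition that references the shared catalog/schema variables and points at the new source layout. A rough sketch of the resulting `resources/my_default_python_etl.pipeline.yml` on classic compute, reassembled from the diff; the indentation, the surrounding `resources:` scaffolding, and the relative library path are assumptions rather than verbatim template output:

```
# Reassembled from the diff above; not a verbatim copy of the template output.
resources:
  pipelines:
    my_default_python_etl:
      name: my_default_python_etl
      ## Specify the 'catalog' field to configure this pipeline to make use of Unity Catalog:
      # catalog: ${var.catalog}
      schema: ${var.schema}
      root_path: "../src/my_default_python_etl"
      libraries:
        # Inferred from the plan output below, which resolves a glob over the
        # transformations folder; the exact relative form in the template may differ.
        - glob:
            include: ../src/my_default_python_etl/transformations/**
```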

acceptance/bundle/templates/default-python/classic/out.plan_dev.direct.json

Lines changed: 50 additions & 26 deletions
@@ -1,10 +1,10 @@
  {
  "plan": {
- "resources.jobs.my_default_python_job": {
+ "resources.jobs.sample_job": {
  "depends_on": [
  {
- "node": "resources.pipelines.my_default_python_pipeline",
- "label": "${resources.pipelines.my_default_python_pipeline.id}"
+ "node": "resources.pipelines.my_default_python_etl",
+ "label": "${resources.pipelines.my_default_python_etl.id}"
  }
  ],
  "action": "create",
@@ -27,43 +27,64 @@
  "data_security_mode": "SINGLE_USER",
  "node_type_id": "[NODE_TYPE_ID]",
  "num_workers": 0,
- "spark_version": "15.4.x-scala2.12"
+ "spark_version": "16.4.x-scala2.12"
  }
  }
  ],
  "max_concurrent_runs": 4,
- "name": "[dev [USERNAME]] my_default_python_job",
+ "name": "[dev [USERNAME]] sample_job",
+ "parameters": [
+ {
+ "default": "hive_metastore",
+ "name": "catalog"
+ },
+ {
+ "default": "[USERNAME]",
+ "name": "schema"
+ }
+ ],
  "queue": {
  "enabled": true
  },
  "tags": {
  "dev": "[USERNAME]"
  },
  "tasks": [
+ {
+ "job_cluster_key": "job_cluster",
+ "libraries": [
+ {
+ "whl": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/artifacts/.internal/my_default_python-0.0.1+[UNIX_TIME_NANOS][0]-py3-none-any.whl"
+ }
+ ],
+ "notebook_task": {
+ "notebook_path": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src/sample_notebook"
+ },
+ "task_key": "notebook_task"
+ },
  {
  "depends_on": [
  {
- "task_key": "refresh_pipeline"
+ "task_key": "notebook_task"
  }
  ],
  "job_cluster_key": "job_cluster",
  "libraries": [
  {
- "whl": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/artifacts/.internal/my_default_python-0.0.1+[UNIX_TIME_NANOS]-py3-none-any.whl"
+ "whl": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/artifacts/.internal/my_default_python-0.0.1+[UNIX_TIME_NANOS][0]-py3-none-any.whl"
  }
  ],
  "python_wheel_task": {
  "entry_point": "main",
- "package_name": "my_default_python"
- },
- "task_key": "main_task"
- },
- {
- "job_cluster_key": "job_cluster",
- "notebook_task": {
- "notebook_path": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src/notebook"
+ "package_name": "my_default_python",
+ "parameters": [
+ "--catalog",
+ "hive_metastore",
+ "--schema",
+ "[USERNAME]"
+ ]
  },
- "task_key": "notebook_task"
+ "task_key": "python_wheel_task"
  },
  {
  "depends_on": [
@@ -72,7 +93,7 @@
  }
  ],
  "pipeline_task": {
- "pipeline_id": "${resources.pipelines.my_default_python_pipeline.id}"
+ "pipeline_id": "${resources.pipelines.my_default_python_etl.id}"
  },
  "task_key": "refresh_pipeline"
  }
@@ -86,33 +107,36 @@
  }
  },
  "vars": {
- "tasks[2].pipeline_task.pipeline_id": "${resources.pipelines.my_default_python_pipeline.id}"
+ "tasks[2].pipeline_task.pipeline_id": "${resources.pipelines.my_default_python_etl.id}"
  }
  }
  },
- "resources.pipelines.my_default_python_pipeline": {
+ "resources.pipelines.my_default_python_etl": {
  "action": "create",
  "new_state": {
  "config": {
  "channel": "CURRENT",
- "configuration": {
- "bundle.sourcePath": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src"
- },
  "deployment": {
  "kind": "BUNDLE",
  "metadata_file_path": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/state/metadata.json"
  },
  "development": true,
  "edition": "ADVANCED",
+ "environment": {
+ "dependencies": [
+ "--editable /Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files"
+ ]
+ },
  "libraries": [
  {
- "notebook": {
- "path": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src/pipeline"
+ "glob": {
+ "include": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src/my_default_python_etl/transformations/**"
  }
  }
  ],
- "name": "[dev [USERNAME]] my_default_python_pipeline",
- "schema": "my_default_python_dev",
+ "name": "[dev [USERNAME]] my_default_python_etl",
+ "root_path": "/Workspace/Users/[USERNAME]/.bundle/my_default_python/dev/files/src/my_default_python_etl",
+ "schema": "[USERNAME]",
  "tags": {
  "dev": "[USERNAME]"
  }
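The plan above shows the wheel task now receiving explicit `--catalog`/`--schema` arguments, resolved here to `hive_metastore` and the per-user schema. A sketch of the job task definition that would produce those parameters, based on the `- "--schema"` / `- "${var.schema}"` fragments in the earlier diff; the surrounding lines are assumptions, not the literal template output:

```
# Sketch of the python_wheel_task entry in resources/sample_job.job.yml.
- task_key: python_wheel_task
  depends_on:
    - task_key: notebook_task
  job_cluster_key: job_cluster
  python_wheel_task:
    package_name: my_default_python
    entry_point: main
    parameters:
      - "--catalog"
      - "${var.catalog}"
      - "--schema"
      - "${var.schema}"
  libraries:
    # The wheel built from the my_default_python package.
    - whl: ../dist/*.whl
```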

acceptance/bundle/templates/default-python/classic/out.plan_dev.terraform.json

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
  {
  "plan": {
- "resources.jobs.my_default_python_job": {
+ "resources.jobs.sample_job": {
  "action": "create"
  },
- "resources.pipelines.my_default_python_pipeline": {
+ "resources.pipelines.my_default_python_etl": {
  "action": "create"
  }
  }
