
Commit c30c456

Prepare a new "generic" template, use "src layout" for Lakeflow template (#3671)
## Changes

This PR prepares a generic `default` template that I want to use as the basis for `default-python`, `lakeflow-pipelines`, and (likely) `default-sql`:

* This template revises and replaces `lakeflow-pipelines`.
* The template now uses an "src" layout, with job/pipeline environments pointing to the directory of the pyproject.toml file.
* Jobs in this template now pass the `catalog` and `schema` parameters, since we ask a question about catalog/schema in the template.

To support the notion of a "generic" template, the template schema format now supports a `template_dir` argument. This allows us to have multiple `databricks_template_schema.json` files that point to one template directory.

Out of scope: this PR does not yet update `default-python`. For early testing purposes, an early version is available as `experimental-default-python`. To keep the diff cleaner, I removed acceptance tests for this template; that's something for a follow-up PR.

## Why

* We want to follow the Lakeflow conventions in templates.
* Our templates are hard to maintain and have many inconsistencies. I'd like to move to one shared template for at least `default-python` and the Lakeflow template.

## Tests

* Standard template testing methodology.

---------

Co-authored-by: Claude <[email protected]>
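As a note for reviewers: the new `template_dir` mechanism means a `databricks_template_schema.json` can delegate its materialization to a shared template directory. A minimal sketch of what such a schema file might look like, assuming the usual `welcome_message`/`properties` schema layout and an illustrative `../default` path (neither is taken verbatim from this diff):

```json
{
  "welcome_message": "Welcome to the Lakeflow Pipelines template!",
  "template_dir": "../default",
  "properties": {
    "project_name": {
      "type": "string",
      "description": "Name of the project directory",
      "default": "my_project"
    }
  }
}
```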
1 parent bd47132 commit c30c456

59 files changed: 500 additions, 342 deletions


NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -11,5 +11,6 @@
 ### Dependency updates

 ### Bundles
+* Updated the internal lakeflow-pipelines template to use an "src" layout ([#3671](https://github.com/databricks/cli/pull/3671)).

 ### API Changes
```

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/README.md

Lines changed: 3 additions & 5 deletions
````diff
@@ -1,10 +1,8 @@
 # my_lakeflow_pipelines

-The 'my_lakeflow_pipelines' project was generated by using the Lakeflow Pipelines template.
+The 'my_lakeflow_pipelines' project was generated by using the default template.

-* `lib/`: Python source code for this project.
-* `lib/shared`: Shared source code across all jobs/pipelines/etc.
-* `resources/lakeflow_pipelines_etl`: Pipeline code and assets for the lakeflow_pipelines_etl pipeline.
+* `src/`: Python source code for this project.
 * `resources/`: Resource configurations (jobs, pipelines, etc.)

 ## Getting started
@@ -46,7 +44,7 @@ with this project. It's also possible to interact with it directly using the CLI
 $ databricks bundle deploy --target prod
 ```
 Note the default template has a includes a job that runs the pipeline every day
-(defined in resources/lakeflow_pipelines_etl/lakeflow_pipelines_job.job.yml). The schedule
+(defined in resources/sample_job.job.yml). The schedule
 is paused when deploying in development mode (see
 https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

````
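The README above shows the prod deployment command; the dev-mode counterpart is a typical first flow with this template. Roughly, using standard bundle CLI commands (`sample_job` is the job defined in the renamed resource file further down):

```
$ databricks bundle deploy --target dev
$ databricks bundle run sample_job
```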
acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/pyproject.toml

Lines changed: 29 additions & 0 deletions

```diff
@@ -0,0 +1,29 @@
+[project]
+name = "my_lakeflow_pipelines"
+version = "0.0.1"
+authors = [{ name = "[USERNAME]" }]
+requires-python = ">=3.10,<=3.13"
+dependencies = [
+    # Any dependencies for jobs and pipelines in this project can be added here
+    # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
+    #
+    # LIMITATION: for pipelines, dependencies are cached during development;
+    # add dependencies to the 'environment' section of pipeline.yml file instead
+]
+
+[dependency-groups]
+dev = [
+    "pytest",
+    "databricks-dlt",
+    "databricks-connect>=15.4,<15.5",
+]
+
+[project.scripts]
+main = "my_lakeflow_pipelines.main:main"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.black]
+line-length = 125
```
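The `[project.scripts]` table above wires a `main` console command to the `main()` function in the package's `main` module; that module is not shown in this diff. Purely as a hypothetical illustration of what the entry point resolves to:

```python
# Hypothetical sketch of src/my_lakeflow_pipelines/main.py; not part of this
# diff, it only illustrates the function the [project.scripts] entry points at.


def main() -> None:
    # Invoked via the `main` console script once the package is installed
    # (e.g. with `pip install --editable .`).
    print("Hello from my_lakeflow_pipelines!")


if __name__ == "__main__":
    main()
```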
acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl.pipeline.yml

Lines changed: 21 additions & 0 deletions

```diff
@@ -0,0 +1,21 @@
+# The main pipeline for my_lakeflow_pipelines
+
+resources:
+  pipelines:
+    lakeflow_pipelines_etl:
+      name: lakeflow_pipelines_etl
+      ## Catalog is required for serverless compute
+      catalog: ${var.catalog}
+      schema: ${var.schema}
+      serverless: true
+      root_path: "../src/lakeflow_pipelines_etl"
+
+      libraries:
+        - glob:
+            include: ../src/lakeflow_pipelines_etl/transformations/**
+
+      environment:
+        dependencies:
+          # We include every dependency defined by pyproject.toml by defining an editable environment
+          # that points to the folder where pyproject.toml is deployed.
+          - --editable ${workspace.file_path}
```
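The `--editable ${workspace.file_path}` dependency above is a pip requirement specifier: `${workspace.file_path}` expands to the workspace folder where the bundle's files (including the pyproject.toml) are synced, so the pipeline environment installs the project itself in editable mode. Assuming the default workspace root layout, the resolved line would look roughly like the following (the user and path here are illustrative):

```
--editable /Workspace/Users/someone@example.com/.bundle/my_lakeflow_pipelines/dev/files
```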

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/lakeflow_pipelines_etl.pipeline.yml

Lines changed: 0 additions & 15 deletions
This file was deleted.

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/lakeflow_pipelines_job.job.yml renamed to acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/sample_job.job.yml

Lines changed: 14 additions & 3 deletions
```diff
@@ -1,9 +1,9 @@
-# The job that triggers lakeflow_pipelines_etl.
+# A sample job for my_lakeflow_pipelines.

 resources:
   jobs:
-    lakeflow_pipelines_job:
-      name: lakeflow_pipelines_job
+    sample_job:
+      name: sample_job

       trigger:
         # Run this job every day, exactly one day from the last run; see https://docs.databricks.com/api/workspace/jobs/create#trigger
@@ -15,7 +15,18 @@ resources:
       # on_failure:


+      parameters:
+        - name: catalog
+          default: ${var.catalog}
+        - name: schema
+          default: ${var.schema}
+
       tasks:
         - task_key: refresh_pipeline
           pipeline_task:
             pipeline_id: ${resources.pipelines.lakeflow_pipelines_etl.id}
+
+      environments:
+        - environment_key: default
+          spec:
+            environment_version: "2"
```
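Since the `catalog` and `schema` job parameters above default to the bundle variables of the same name, they can be switched per deployment without editing the YAML. Assuming the standard `--var` flag of the bundle CLI, that would look something like:

```
$ databricks bundle deploy --target dev --var="catalog=main" --var="schema=my_schema"
```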

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/README.md renamed to acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/src/lakeflow_pipelines_etl/README.md

Lines changed: 3 additions & 5 deletions
```diff
@@ -12,11 +12,9 @@ This folder defines all source code for the my_lakeflow_pipelines pipeline:
 To get started, go to the `transformations` folder -- most of the relevant source code lives there:

 * By convention, every dataset under `transformations` is in a separate file.
-* Take a look at the sample under "sample_trips_my_lakeflow_pipelines.py" to get familiar with the syntax.
+* Take a look at the sample called "sample_trips_my_lakeflow_pipelines.py" to get familiar with the syntax.
   Read more about the syntax at https://docs.databricks.com/dlt/python-ref.html.
-* Use `Run file` to run and preview a single transformation.
-* Use `Run pipeline` to run _all_ transformations in the entire pipeline.
-* Use `+ Add` in the file browser to add a new data set definition.
-* Use `Schedule` to run the pipeline on a schedule!
+* If you're using the workspace UI, use `Run file` to run and preview a single transformation.
+* If you're using the CLI, use `databricks bundle run lakeflow_pipelines_etl --select sample_trips_my_lakeflow_pipelines` to run a single transformation.

 For more tutorials and reference material, see https://docs.databricks.com/dlt.
```
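For readers who haven't opened the transformation files referenced above: a minimal sketch of what a file like `sample_trips_my_lakeflow_pipelines.py` typically contains, assuming the standard `dlt` decorator API (the source table below is illustrative, not taken from this diff):

```python
import dlt
from pyspark.sql.functions import col


@dlt.table
def sample_trips_my_lakeflow_pipelines():
    # Each @dlt.table function defines one dataset in the pipeline. The `spark`
    # session is provided by the pipeline runtime; the source table is illustrative.
    return spark.read.table("samples.nyctaxi.trips").select(col("pickup_zip"), col("fare_amount"))
```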

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/explorations/sample_exploration.ipynb renamed to acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/src/lakeflow_pipelines_etl/explorations/sample_exploration.ipynb

File renamed without changes.

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/transformations/sample_trips_my_lakeflow_pipelines.py renamed to acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/src/lakeflow_pipelines_etl/transformations/sample_trips_my_lakeflow_pipelines.py

File renamed without changes.

acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/resources/lakeflow_pipelines_etl/transformations/sample_zones_my_lakeflow_pipelines.py renamed to acceptance/bundle/templates/lakeflow-pipelines/python/output/my_lakeflow_pipelines/src/lakeflow_pipelines_etl/transformations/sample_zones_my_lakeflow_pipelines.py

File renamed without changes.
