Commit d19f143

Make DB Connect work out of the box for unit tests with the default-python template (#3254)
## Changes

This updates `default-python` to make DB Connect work out of the box:

* We now install a conservative 15.x version of DB Connect using uv (14.x is even older and doesn't support serverless).
* We validate that the current DB Connect version is supported by the remote cluster / serverless compute. Note that the VS Code extension also does this, providing a warning when users attempt to run against an older cluster.
* We make sure there's always a valid SparkSession during test execution (even when users write standard Spark / Spark Declarative Pipelines code with `SparkSession.builder.getOrCreate()`).
* We fall back to serverless compute when the user didn't manually configure a cluster. This also addresses a known issue with the VS Code extension where it doesn't provide a cluster id when using 'Run' rather than 'Debug' to execute unit tests.
* We validate that test dependencies are installed and report a friendly error if they are not.

## Usage

1. Install via `bundle init`:

```
lennart:demo$ databricks bundle init
Template to use: python-default

Welcome to the default Python template for Databricks Asset Bundles!

Please provide the following details to tailor the template to your preferences.

Unique name for this project [my_project]: my_project
Include a stub (sample) notebook in 'my_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'my_project/src': yes
Include a stub (sample) Python package in 'my_project/src': yes
Use serverless compute: yes
Workspace to use (auto-detected, edit in 'my_project/databricks.yml'): https://[workspace].databricks.com

✨ Your new project has been created in the 'my_project' directory!

Please refer to the README.md file for "getting started" instructions.
See also the documentation at https://docs.databricks.com/dev-tools/bundles/index.html.

lennart:demo$ cd my_project
```

2. Run tests with `pytest`:

```
lennart:my_project$ pytest
ImportError while loading conftest '/private/tmp/demp/my_project/conftest.py'.
conftest.py:17: in <module>
    raise ImportError(
E   ImportError: Test dependencies not found.
E
E   Run tests using 'uv run pytest'. See http://docs.astral.sh/uv to learn more about uv.
```

3. Actually, Step 2 was wrong. Run tests using `uv` instead, or from the VS Code extension:

```
lennart:my_project$ uv run pytest
Using CPython 3.13.3 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: .venv
Built my-project @ file:///private/tmp/demo/my_project
Installed 32 packages in 235ms
☁️ no compute specified, falling back to serverless compute
   see https://docs.databricks.com/dev-tools/databricks-connect/cluster-config for manual configuration
== test session starts ==
platform darwin -- Python 3.13.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /private/tmp/demo/my_project
configfile: pyproject.toml
testpaths: tests
collected 1 item

tests/main_test.py .                                    [100%]

== 1 passed in 1.71s ==
```

## Why

Manually setting up testing with DB Connect is still rather hard. With these changes, uv can be used to set up testing automatically.

## Tests

* Acceptance tests
* Manual tests from CLI, Cursor, different versions of DB Connect
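
For readers skimming the description above, the following is a condensed, hedged sketch of what the template's new `conftest.py` (shown in full in the diff below) does at the start of each pytest session: fall back to serverless compute when no cluster is configured, then eagerly create a DB Connect session, validating version compatibility where the client supports it. It is a sketch of the idea, not a drop-in replacement for the actual file.

```python
# Condensed sketch of the conftest.py mechanism added by this PR (see diff below).
import os

from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient


def ensure_compute_and_session() -> None:
    # Fall back to serverless when neither a cluster nor SPARK_REMOTE is configured.
    conf = WorkspaceClient().config
    if not (conf.serverless_compute_id or conf.cluster_id or os.environ.get("SPARK_REMOTE")):
        os.environ["DATABRICKS_SERVERLESS_COMPUTE_ID"] = "auto"

    # Create the session eagerly; on DB Connect 15+, validateSession() also checks
    # that the local databricks-connect version is supported by the remote compute.
    builder = DatabricksSession.builder
    if hasattr(builder, "validateSession"):
        builder = builder.validateSession()
    builder.getOrCreate()
```
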
1 parent 2035807 commit d19f143

File tree

26 files changed (+339, -158 lines)


NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -12,6 +12,7 @@
 * Upgrade TF provider to 1.88.0 ([#3529](https://github.com/databricks/cli/pull/3529))
 
 ### Bundles
+* Update default-python template to make DB Connect work out of the box for unit tests, using uv to install dependencies ([#3254](https://github.com/databricks/cli/pull/3254))
 * Add support for `TaskRetryMode` for continuous jobs ([#3529](https://github.com/databricks/cli/pull/3529))
 * Add support for specifying database instance as an application resource ([#3529](https://github.com/databricks/cli/pull/3529))
```

acceptance/bundle/templates/default-python/classic/output/my_default_python/README.md

Lines changed: 33 additions & 17 deletions

````diff
@@ -2,18 +2,39 @@
 
 The 'my_default_python' project was generated by using the default-python template.
 
+For documentation on the Databricks Asset Bundles format use for this project,
+and for CI/CD configuration, see https://docs.databricks.com/aws/en/dev-tools/bundles.
+
 ## Getting started
 
-0. Install UV: https://docs.astral.sh/uv/getting-started/installation/
+Choose how you want to work on this project:
+
+(a) Directly in your Databricks workspace, see
+    https://docs.databricks.com/dev-tools/bundles/workspace.
+
+(b) Locally with an IDE like Cursor or VS Code, see
+    https://docs.databricks.com/vscode-ext.
+
+(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html
+
+
+Dependencies for this project should be installed using uv:
 
-1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
+* Make sure you have the UV package manager installed.
+  It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
+* Run `uv sync --dev` to install the project's dependencies.
 
-2. Authenticate to your Databricks workspace, if you have not done so already:
+# Using this project using the CLI
+
+The Databricks workspace and IDE extensions provide a graphical interface for working
+with this project. It's also possible to interact with it directly using the CLI:
+
+1. Authenticate to your Databricks workspace, if you have not done so already:
     ```
    $ databricks configure
    ```
-3. To deploy a development copy of this project, type:
+2. To deploy a development copy of this project, type:
    ```
    $ databricks bundle deploy --target dev
    ```
@@ -23,9 +44,9 @@ The 'my_default_python' project was generated by using the default-python templa
 This deploys everything that's defined for this project.
 For example, the default template would deploy a job called
 `[dev yourname] my_default_python_job` to your workspace.
-You can find that job by opening your workpace and clicking on **Workflows**.
+You can find that job by opening your workpace and clicking on **Jobs & Pipelines**.
 
-4. Similarly, to deploy a production copy, type:
+3. Similarly, to deploy a production copy, type:
    ```
    $ databricks bundle deploy --target prod
    ```
@@ -35,17 +56,12 @@ The 'my_default_python' project was generated by using the default-python templa
   is paused when deploying in development mode (see
   https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
 
-5. To run a job or pipeline, use the "run" command:
+4. To run a job or pipeline, use the "run" command:
    ```
    $ databricks bundle run
    ```
-6. Optionally, install the Databricks extension for Visual Studio code for local development from
-   https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your
-   virtual environment and setup Databricks Connect for running unit tests locally.
-   When not using these tools, consult your development environment's documentation
-   and/or the documentation for Databricks Connect for manually setting up your environment
-   (https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).
-
-7. For documentation on the Databricks asset bundles format used
-   for this project, and for CI/CD configuration, see
-   https://docs.databricks.com/dev-tools/bundles/index.html.
+
+5. Finally, to run tests locally, use `pytest`:
+   ```
+   $ uv run pytest
+   ```
````

acceptance/bundle/templates/default-python/classic/output/my_default_python/pyproject.toml

Lines changed: 6 additions & 12 deletions

```diff
@@ -2,26 +2,20 @@
 name = "my_default_python"
 version = "0.0.1"
 authors = [{ name = "[USERNAME]" }]
-requires-python = ">= 3.11"
+requires-python = ">=3.10,<=3.13"
 
-[project.optional-dependencies]
+[dependency-groups]
 dev = [
     "pytest",
 
     # Code completion support for Lakeflow Declarative Pipelines, also install databricks-connect
     "databricks-dlt",
 
     # databricks-connect can be used to run parts of this project locally.
-    # See https://docs.databricks.com/dev-tools/databricks-connect.html.
-    #
-    # Note, databricks-connect is automatically installed if you're using Databricks
-    # extension for Visual Studio Code
-    # (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
-    #
-    # To manually install databricks-connect, uncomment the line below to install a version
-    # of db-connect that corresponds to the Databricks Runtime version used for this project.
-    # See https://docs.databricks.com/dev-tools/databricks-connect.html
-    # "databricks-connect>=15.4,<15.5",
+    # Note that for local development, you should use a version that is not newer
+    # than the remote cluster or serverless compute you connect to.
+    # See also https://docs.databricks.com/dev-tools/databricks-connect.html.
+    "databricks-connect>=15.4,<15.5",
 ]
 
 [tool.pytest.ini_options]
```
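
The version constraint above is deliberately conservative: for local development, the resolved `databricks-connect` should not be newer than the remote cluster or serverless compute. A minimal, standard-library-only sketch for checking what uv actually installed:

```python
# Minimal sketch: print the locally installed databricks-connect version so it can
# be compared against the remote Databricks Runtime / serverless version.
from importlib.metadata import version

print(version("databricks-connect"))  # expected to satisfy >=15.4,<15.5 per pyproject.toml
```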

acceptance/bundle/templates/default-python/classic/output/my_default_python/scratch/exploration.ipynb

Lines changed: 1 addition & 1 deletion

```diff
@@ -32,7 +32,7 @@
     "sys.path.append(\"../src\")\n",
     "from my_default_python import main\n",
     "\n",
-    "main.get_taxis(spark).show(10)"
+    "main.get_taxis().show(10)"
    ]
   }
 ],
```

acceptance/bundle/templates/default-python/classic/output/my_default_python/src/my_default_python/main.py

Lines changed: 4 additions & 15 deletions

```diff
@@ -1,24 +1,13 @@
-from pyspark.sql import SparkSession, DataFrame
+from databricks.sdk.runtime import spark
+from pyspark.sql import DataFrame
 
 
-def get_taxis(spark: SparkSession) -> DataFrame:
+def find_all_taxis() -> DataFrame:
     return spark.read.table("samples.nyctaxi.trips")
 
 
-# Create a new Databricks Connect session. If this fails,
-# check that you have configured Databricks Connect correctly.
-# See https://docs.databricks.com/dev-tools/databricks-connect.html.
-def get_spark() -> SparkSession:
-    try:
-        from databricks.connect import DatabricksSession
-
-        return DatabricksSession.builder.getOrCreate()
-    except ImportError:
-        return SparkSession.builder.getOrCreate()
-
-
 def main():
-    get_taxis(get_spark()).show(5)
+    find_all_taxis().show(5)
 
 
 if __name__ == "__main__":
```
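
The rewritten module relies on the ambient `spark` object imported from `databricks.sdk.runtime`, which provides the notebook-style `spark` global on Databricks and, with databricks-connect installed, is expected to resolve to the DB Connect session created by the new conftest during local test runs. A hypothetical local usage sketch (column names assumed from the `samples.nyctaxi.trips` sample table):

```python
# Hypothetical local usage of the renamed helper; assumes databricks-connect is
# installed and compute is configured (or the serverless fallback applies).
from my_default_python.main import find_all_taxis

taxis = find_all_taxis()
taxis.select("pickup_zip", "fare_amount").show(5)  # column names assumed
```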

acceptance/bundle/templates/default-python/classic/output/my_default_python/src/notebook.ipynb

Lines changed: 1 addition & 1 deletion

```diff
@@ -46,7 +46,7 @@
    "source": [
     "from my_default_python import main\n",
     "\n",
-    "main.get_taxis(spark).show(10)"
+    "main.find_all_taxis().show(10)"
    ]
   }
 ],
```

acceptance/bundle/templates/default-python/classic/output/my_default_python/src/pipeline.ipynb

Lines changed: 1 addition & 1 deletion

```diff
@@ -56,7 +56,7 @@
    "source": [
     "@dlt.view\n",
     "def taxi_raw():\n",
-    "    return main.get_taxis(spark)\n",
+    "    return main.find_all_taxis()\n",
     "\n",
     "\n",
     "@dlt.table\n",
```

conftest.py

Lines changed: 57 additions & 0 deletions

```python
"""This file configures pytest."""

import os, sys, pathlib
from contextlib import contextmanager


try:
    from databricks.connect import DatabricksSession
    from databricks.sdk import WorkspaceClient
    from pyspark.sql import SparkSession
    import pytest
except ImportError:
    raise ImportError("Test dependencies not found.\n\nRun tests using 'uv run pytest'. See http://docs.astral.sh/uv to learn more about uv.")


def enable_fallback_compute():
    """Enable serverless compute if no compute is specified."""
    conf = WorkspaceClient().config
    if conf.serverless_compute_id or conf.cluster_id or os.environ.get("SPARK_REMOTE"):
        return

    url = "https://docs.databricks.com/dev-tools/databricks-connect/cluster-config"
    print("☁️ no compute specified, falling back to serverless compute", file=sys.stderr)
    print(f"   see {url} for manual configuration", file=sys.stderr)

    os.environ["DATABRICKS_SERVERLESS_COMPUTE_ID"] = "auto"


@contextmanager
def allow_stderr_output(config: pytest.Config):
    """Temporarily disable pytest output capture."""
    capman = config.pluginmanager.get_plugin("capturemanager")
    if capman:
        with capman.global_and_fixture_disabled():
            yield
    else:
        yield


def pytest_configure(config: pytest.Config):
    """Configure pytest session."""
    with allow_stderr_output(config):
        enable_fallback_compute()

        # Initialize Spark session eagerly, so it is available even when
        # SparkSession.builder.getOrCreate() is used. For DB Connect 15+,
        # we validate version compatibility with the remote cluster.
        if hasattr(DatabricksSession.builder, "validateSession"):
            DatabricksSession.builder.validateSession().getOrCreate()
        else:
            DatabricksSession.builder.getOrCreate()


@pytest.fixture(scope="session")
def spark() -> SparkSession:
    """Provide a SparkSession fixture for tests."""
    return DatabricksSession.builder.getOrCreate()
```
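
One effect of the eager `getOrCreate()` in `pytest_configure`, per the PR description, is that test or pipeline code written with plain Spark idioms picks up the same session instead of failing for lack of one. A minimal, hypothetical sketch of such a test (not part of the template):

```python
# Hypothetical test using only standard Spark idioms; assumes the conftest above
# has already created the DB Connect session for this pytest session.
from pyspark.sql import SparkSession


def test_plain_spark_code_gets_a_session():
    spark = SparkSession.builder.getOrCreate()  # expected to return the existing session
    assert spark.range(3).count() == 3
```
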
tests/main_test.py

Lines changed: 3 additions & 3 deletions

```diff
@@ -1,6 +1,6 @@
-from my_default_python.main import get_taxis, get_spark
+from my_default_python import main
 
 
-def test_main():
-    taxis = get_taxis(get_spark())
+def test_find_all_taxis():
+    taxis = main.find_all_taxis()
     assert taxis.count() > 5
```
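
Because the new conftest also exposes a session-scoped `spark` fixture, additional tests can simply declare it as an argument rather than importing anything from the package. A hypothetical example:

```python
# Hypothetical extra test using the session-scoped `spark` fixture from conftest.py.
from pyspark.sql import SparkSession


def test_taxis_table_has_fare_column(spark: SparkSession):
    df = spark.read.table("samples.nyctaxi.trips")
    assert "fare_amount" in df.columns  # column name assumed
```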

acceptance/bundle/templates/default-python/serverless/output/my_default_python/README.md

Lines changed: 33 additions & 17 deletions

The changes are identical to those made to the classic template's README.md, shown above.
