Merged

Commits (showing changes from 10 of 26 commits)
All commits by lennartkats-db:

- 71c623e Add basic configuration (Jul 14, 2025)
- 4fbf017 Cleanup test (Jul 14, 2025)
- 4734cbe Add conftest (Jul 14, 2025)
- 58f043e Update references (Jul 14, 2025)
- 103e008 Add pytest to instructions (Jul 14, 2025)
- cb8e2f8 Fix main (Jul 14, 2025)
- 13978dd Avoid auth login for now (Jul 14, 2025)
- 154bb09 Make sure Spark session is initialized eagerly (Jul 15, 2025)
- 4103dad Update acceptance tests (Jul 15, 2025)
- bad1412 Fix formatting (Jul 15, 2025)
- 7ae4e5a Remove PyCharm mention (Jul 25, 2025)
- 3090e8d Minor tweaks (Jul 25, 2025)
- ae63222 Update tests (Aug 1, 2025)
- 439ea2c Change UV to uv (Aug 1, 2025)
- 5685373 Update Python version spec (Aug 1, 2025)
- 0cce9cd Merge remote-tracking branch 'origin/main' into uv-db-connect (Aug 22, 2025)
- 957824b Fix aceptance test (Aug 22, 2025)
- d0f6537 Add to changelog (Jul 15, 2025)
- 56cae98 Merge remote-tracking branch 'origin/main' into uv-db-connect (Aug 27, 2025)
- b7ef6b1 Move conftest into tests/ directory for now (Aug 30, 2025)
- dbee730 Update comment (Aug 30, 2025)
- ae92d07 Update test files (Sep 1, 2025)
- f822242 Revert apps acceptance test changes (Sep 2, 2025)
- 459cf10 Merge remote-tracking branch 'origin/main' into uv-db-connect (Sep 2, 2025)
- 0f82370 Fix output (Sep 2, 2025)
- 0403440 Merge remote-tracking branch 'origin/main' into uv-db-connect (Sep 2, 2025)
@@ -2,18 +2,37 @@

The 'my_default_python' project was generated by using the default-python template.

For documentation on the Databricks asset bundles format used for this project,
and for CI/CD configuration, see https://docs.databricks.com/aws/en/dev-tools/bundles.

## Getting started

0. Install UV: https://docs.astral.sh/uv/getting-started/installation/
Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor, VS Code, or PyCharm, see
https://docs.databricks.com/vscode-ext and https://www.databricks.com/blog/announcing-pycharm-integration-databricks.

(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html
Dependencies for this project should be installed using UV:

* Make sure you have the UV package manager installed.
It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
## Using this project from the CLI

2. Authenticate to your Databricks workspace, if you have not done so already:
The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

3. To deploy a development copy of this project, type:
2. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
@@ -23,9 +42,9 @@ The 'my_default_python' project was generated by using the default-python template
This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] my_default_python_job` to your workspace.
You can find that job by opening your workpace and clicking on **Workflows**.
You can find that job by opening your workspace and clicking on **Jobs & Pipelines**.

4. Similarly, to deploy a production copy, type:
3. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```
@@ -35,17 +54,12 @@ The 'my_default_python' project was generated by using the default-python template
is paused when deploying in development mode (see
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

5. To run a job or pipeline, use the "run" command:
4. To run a job or pipeline, use the "run" command:
```
$ databricks bundle run
```
6. Optionally, install the Databricks extension for Visual Studio code for local development from
https://docs.databricks.com/dev-tools/vscode-ext.html. It can configure your
virtual environment and setup Databricks Connect for running unit tests locally.
When not using these tools, consult your development environment's documentation
and/or the documentation for Databricks Connect for manually setting up your environment
(https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html).

7. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.

5. Finally, to run tests locally, use `pytest`:
```
$ uv run pytest
```
@@ -0,0 +1,76 @@
"""This file configures pytest.

This file is in the root since it can be used for tests in any place in this
project, including tests under resources/.
"""

import os, sys, pathlib
from contextlib import contextmanager


try:
from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient
from pyspark.sql import SparkSession
import pytest
except ImportError:
raise ImportError("Test dependencies not found.\n\nRun tests using 'uv run pytest'. See http://docs.astral.sh/uv to learn more about uv.")


def add_all_resources_to_sys_path():
"""Add all resources/* directories to sys.path for module discovery."""
resources = pathlib.Path(__file__).with_name("resources")
resource_dirs = filter(pathlib.Path.is_dir, resources.iterdir())
seen: dict[str, pathlib.Path] = {}
for resource in resource_dirs:
sys.path.append(str(resource.resolve()))
for py in resource.rglob("*.py"):
mod = ".".join(py.relative_to(resource).with_suffix("").parts)
if mod in seen:
raise ImportError(f"Duplicate module '{mod}' found:\n {seen[mod]}\n {py}")
seen[mod] = py


def enable_fallback_compute():
"""Enable serverless compute if no compute is specified."""
conf = WorkspaceClient().config
if conf.serverless_compute_id or conf.cluster_id or os.environ.get("SPARK_REMOTE"):
return

url = "https://docs.databricks.com/dev-tools/databricks-connect/cluster-config"
print("☁️ no compute specified, falling back to serverless compute", file=sys.stderr)
print(f" see {url} for manual configuration", file=sys.stdout)

os.environ["DATABRICKS_SERVERLESS_COMPUTE_ID"] = "auto"


@contextmanager
def allow_stderr_output(config: pytest.Config):
"""Temporarily disable pytest output capture."""
capman = config.pluginmanager.get_plugin("capturemanager")
if capman:
with capman.global_and_fixture_disabled():
yield
else:
yield


def pytest_configure(config: pytest.Config):
"""Configure pytest session."""
with allow_stderr_output(config):
add_all_resources_to_sys_path()
enable_fallback_compute()

# Initialize Spark session eagerly, so it is available even when
# SparkSession.builder.getOrCreate() is used. For DB Connect 15+,
# we validate version compatibility with the remote cluster.
if hasattr(DatabricksSession.builder, "validateSession"):
DatabricksSession.builder.validateSession().getOrCreate()
else:
DatabricksSession.builder.getOrCreate()


@pytest.fixture(scope="session")
def spark() -> SparkSession:
"""Provide a SparkSession fixture for tests."""
return DatabricksSession.builder.getOrCreate()
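
As a usage illustration (not part of this diff), a test can request the session-scoped `spark` fixture defined in the conftest above. The file name and query below are hypothetical; the sketch assumes the tests are run with `uv run pytest` against a reachable workspace:

```
# tests/spark_smoke_test.py (hypothetical): exercises the `spark` fixture from
# conftest.py, so it goes through Databricks Connect or the serverless fallback
# just like the template's other tests.
from pyspark.sql import SparkSession


def test_spark_session_is_usable(spark: SparkSession):
    # A trivial query that needs no workspace data.
    df = spark.sql("SELECT 1 AS one")
    assert df.collect()[0]["one"] == 1
```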
@@ -4,24 +4,18 @@ version = "0.0.1"
authors = [{ name = "[USERNAME]" }]
requires-python = ">= 3.11"

[project.optional-dependencies]
[dependency-groups]
dev = [
    "pytest",

    # Code completion support for DLT, also install databricks-connect
    "databricks-dlt",

    # databricks-connect can be used to run parts of this project locally.
    # See https://docs.databricks.com/dev-tools/databricks-connect.html.
    #
    # Note, databricks-connect is automatically installed if you're using Databricks
    # extension for Visual Studio Code
    # (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
    #
    # To manually install databricks-connect, uncomment the line below to install a version
    # of db-connect that corresponds to the Databricks Runtime version used for this project.
    # See https://docs.databricks.com/dev-tools/databricks-connect.html
    # "databricks-connect>=15.4,<15.5",
    # Note that for local development, you should use a version that is not newer
    # than the remote cluster or serverless compute you connect to.
    # See also https://docs.databricks.com/dev-tools/databricks-connect.html.
    "databricks-connect>=15.4,<15.5",
]

[tool.pytest.ini_options]
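
The dependency-group change above pins `databricks-connect` to the 15.4 line and notes that the local client should not be newer than the remote cluster or serverless compute. A minimal sketch of how one might check that locally is shown below; it is not part of the PR, assumes a classic cluster is configured (serverless has no `cluster_id`), and the helper name is made up:

```
# check_connect_version.py (hypothetical helper, not part of this PR)
from importlib.metadata import version

from databricks.sdk import WorkspaceClient


def check_connect_version() -> None:
    """Warn if the local databricks-connect is newer than the remote runtime."""
    w = WorkspaceClient()
    if not w.config.cluster_id:
        return  # serverless or no cluster configured; nothing to compare against
    local = version("databricks-connect")  # e.g. "15.4.5"
    remote = w.clusters.get(w.config.cluster_id).spark_version  # e.g. "15.4.x-scala2.12"
    local_mm = tuple(int(p) for p in local.split(".")[:2])
    remote_mm = tuple(int(p) for p in remote.split(".")[:2])
    if local_mm > remote_mm:
        raise RuntimeError(
            f"databricks-connect {local} is newer than the cluster runtime {remote}; "
            "pin a version that matches or is older (see pyproject.toml)."
        )
```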
@@ -32,7 +32,7 @@
"sys.path.append(\"../src\")\n",
"from my_default_python import main\n",
"\n",
"main.get_taxis(spark).show(10)"
"main.get_taxis().show(10)"
]
}
],
@@ -56,7 +56,7 @@
"source": [
"@dlt.view\n",
"def taxi_raw():\n",
" return main.get_taxis(spark)\n",
" return main.find_all_taxis()\n",
"\n",
"\n",
"@dlt.table\n",
@@ -1,24 +1,13 @@
from pyspark.sql import SparkSession, DataFrame
from databricks.sdk.runtime import spark
from pyspark.sql import DataFrame


def get_taxis(spark: SparkSession) -> DataFrame:
def find_all_taxis() -> DataFrame:
    return spark.read.table("samples.nyctaxi.trips")


# Create a new Databricks Connect session. If this fails,
# check that you have configured Databricks Connect correctly.
# See https://docs.databricks.com/dev-tools/databricks-connect.html.
def get_spark() -> SparkSession:
    try:
        from databricks.connect import DatabricksSession

        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        return SparkSession.builder.getOrCreate()


def main():
    get_taxis(get_spark()).show(5)
    find_all_taxis().show(5)


if __name__ == "__main__":
@@ -46,7 +46,7 @@
"source": [
"from my_default_python import main\n",
"\n",
"main.get_taxis(spark).show(10)"
"main.find_all_taxis().show(10)"
]
}
],
@@ -1,6 +1,6 @@
from my_default_python.main import get_taxis, get_spark
from my_default_python import main


def test_main():
    taxis = get_taxis(get_spark())
def test_find_all_taxis():
    taxis = main.find_all_taxis()
    assert taxis.count() > 5