Commit 05feabc

Include instantiation of the pydabs template

1 parent a7441c4 · commit 05feabc

File tree

20 files changed: +680 −0 lines changed

pydabs/.gitignore

Lines changed: 10 additions & 0 deletions

```
.databricks/
build/
dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md
**/explorations/**
!**/explorations/README.md
```

pydabs/.vscode/__builtins__.pyi

Lines changed: 3 additions & 0 deletions

```python
# Typings for Pylance in Visual Studio Code
# see https://github.com/microsoft/pyright/blob/main/docs/builtins.md
from databricks.sdk.runtime import *
```

pydabs/.vscode/extensions.json

Lines changed: 7 additions & 0 deletions

```json
{
    "recommendations": [
        "databricks.databricks",
        "redhat.vscode-yaml",
        "ms-python.black-formatter"
    ]
}
```

pydabs/.vscode/settings.json

Lines changed: 39 additions & 0 deletions

```jsonc
{
    "jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
    "jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
    "python.testing.pytestArgs": [
        "."
    ],
    "files.exclude": {
        "**/*.egg-info": true,
        "**/__pycache__": true,
        ".pytest_cache": true,
        "dist": true,
    },
    "files.associations": {
        "**/.gitkeep": "markdown"
    },

    // Pylance settings (VS Code)
    // Set typeCheckingMode to "basic" to enable type checking!
    "python.analysis.typeCheckingMode": "off",
    "python.analysis.extraPaths": ["src", "lib", "resources"],
    "python.analysis.diagnosticMode": "workspace",
    "python.analysis.stubPath": ".vscode",

    // Pyright settings (Cursor)
    // Set typeCheckingMode to "basic" to enable type checking!
    "cursorpyright.analysis.typeCheckingMode": "off",
    "cursorpyright.analysis.extraPaths": ["src", "lib", "resources"],
    "cursorpyright.analysis.diagnosticMode": "workspace",
    "cursorpyright.analysis.stubPath": ".vscode",

    // General Python settings
    "python.defaultInterpreterPath": "./.venv/bin/python",
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true,
    },
}
```

pydabs/README.md

Lines changed: 70 additions & 0 deletions

# pydabs

The 'pydabs' project was generated using the default template.

* `src/`: Python source code for this project.
* `src/pydabs/`: Shared Python code that can be used by jobs and pipelines.
* `resources/`: Resource configurations (jobs, pipelines, etc.)
* `tests/`: Unit tests for the shared Python code.
* `fixtures/`: Fixtures for data sets (primarily used for testing).

## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
    https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code, see
    https://docs.databricks.com/dev-tools/vscode-ext.html.

(c) With command line tools, see
    https://docs.databricks.com/dev-tools/cli/databricks-cli.html.

If you're developing with an IDE, dependencies for this project should be installed using uv:

* Make sure you have the uv package manager installed.
  It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.

## Using this project from the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:

   ```
   $ databricks configure
   ```

2. To deploy a development copy of this project, type:

   ```
   $ databricks bundle deploy --target dev
   ```

   (Note that "dev" is the default target, so the `--target` parameter
   is optional here.)

   This deploys everything that's defined for this project.
   For example, the default template would deploy a pipeline called
   `[dev yourname] pydabs_etl` to your workspace.
   You can find that resource by opening your workspace and clicking on **Jobs & Pipelines**.

3. Similarly, to deploy a production copy, type:

   ```
   $ databricks bundle deploy --target prod
   ```

   Note that the default template includes a job that runs the pipeline every day
   (defined in `resources/sample_job.job.yml`). The schedule
   is paused when deploying in development mode (see
   https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

4. To run a job or pipeline, use the "run" command:

   ```
   $ databricks bundle run
   ```

5. Finally, to run tests locally, use `pytest`:

   ```
   $ uv run pytest
   ```

pydabs/databricks.yml

Lines changed: 53 additions & 0 deletions

```yaml
# This is a Databricks asset bundle definition for pydabs.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: pydabs
  uuid: 4062028b-2184-4acd-9c62-f2ec572f7843

python:
  venv_path: .venv
  # Functions called to load resources defined in Python. See resources/__init__.py
  resources:
    - "resources:load_resources"

include:
  - resources/*.yml
  - resources/*/*.yml

artifacts:
  python_artifact:
    type: whl
    build: uv build --wheel

# Variable declarations. These variables are assigned in the dev/prod targets below.
variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
    mode: development
    default: true
    workspace:
      #host: https://company.databricks.com
    variables:
      catalog: main
      schema: ${workspace.current_user.short_name}
  prod:
    mode: production
    workspace:
      #host: https://company.databricks.com
      # We explicitly deploy to /Workspace/Users/[email protected] to make sure we only have a single copy.
      root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: main
      schema: prod
    permissions:
      - user_name: [email protected]
        level: CAN_MANAGE
```

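The dev and prod targets above assign values to the `catalog` and `schema` variables, and other files in the bundle reference them as `${var.catalog}` and `${var.schema}`. As a purely illustrative sketch of that substitution (this is not Databricks CLI code; the function name and behavior are simplifications):

```python
import re


def resolve_vars(template: str, variables: dict[str, str]) -> str:
    """Illustrative stand-in for bundle variable interpolation:
    replaces each ${var.NAME} with the target's assigned value."""
    return re.sub(r"\$\{var\.(\w+)\}", lambda m: variables[m.group(1)], template)


# With the 'dev' target's assignments (schema comes from the current user's short name):
dev_vars = {"catalog": "main", "schema": "yourname"}
print(resolve_vars("${var.catalog}.${var.schema}", dev_vars))  # -> main.yourname
```

The real CLI also resolves references like `${workspace.file_path}` and `${bundle.target}`; the sketch covers only the `var.` namespace used by this template.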
pydabs/fixtures/.gitkeep

Lines changed: 9 additions & 0 deletions

# Test fixtures directory

Add JSON or CSV files here. In tests, use them with `load_fixture()`:

```
def test_using_fixture(load_fixture):
    data = load_fixture("my_data.json")
    assert len(data) >= 1
```

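The `load_fixture` fixture itself is not shown in this excerpt. A minimal sketch of the loading logic such a fixture could delegate to (hypothetical, not the template's actual code; the conftest wiring in the comment is likewise an assumption):

```python
# Hypothetical sketch (NOT part of this commit): loading logic a
# `load_fixture` pytest fixture could delegate to. In conftest.py it
# might be exposed roughly as:
#
#     @pytest.fixture
#     def load_fixture():
#         return lambda name: read_fixture_file(Path("fixtures") / name)
#
import csv
import json
from pathlib import Path


def read_fixture_file(path: Path):
    """Load one fixture file; JSON and CSV match the formats suggested above."""
    if path.suffix == ".json":
        return json.loads(path.read_text())
    if path.suffix == ".csv":
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported fixture type: {path.suffix}")
```

Dispatching on the file suffix keeps test code format-agnostic: `load_fixture("my_data.json")` and `load_fixture("my_data.csv")` both return a list of records.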
pydabs/pyproject.toml

Lines changed: 30 additions & 0 deletions

```toml
[project]
name = "pydabs"
version = "0.0.1"
authors = [{ name = "[email protected]" }]
requires-python = ">=3.10,<=3.13"
dependencies = [
    # Any dependencies for jobs and pipelines in this project can be added here
    # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
    #
    # LIMITATION: for pipelines, dependencies are cached during development;
    # add dependencies to the 'environment' section of your pipeline.yml file instead
]

[dependency-groups]
dev = [
    "pytest",
    "databricks-dlt",
    "databricks-connect>=15.4,<15.5",
    "databricks-bundles==0.277.0",
]

[project.scripts]
main = "pydabs.main:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 125
```

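The `[project.scripts]` table maps a `main` console command to `pydabs.main:main`. That module is among the 20 files in this commit but is not shown in this excerpt; the entry point only requires a callable named `main`, along these lines (hypothetical contents, not the template's actual file):

```python
# Hypothetical sketch of src/pydabs/main.py satisfying the entry point above;
# the actual file in this commit is not shown in this excerpt.
def main() -> None:
    print("Hello from pydabs!")


if __name__ == "__main__":
    main()
```

After `uv sync --dev`, such an entry point would be runnable as `uv run main`.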
pydabs/resources/__init__.py

Lines changed: 16 additions & 0 deletions

```python
from databricks.bundles.core import (
    Bundle,
    Resources,
    load_resources_from_current_package_module,
)


def load_resources(bundle: Bundle) -> Resources:
    """
    The 'load_resources' function is referenced in databricks.yml and is responsible for
    loading bundle resources defined in Python code. It is called by the Databricks CLI
    during bundle deployment. After deployment, this function is not used.
    """

    # the default implementation loads all Python files in the 'resources' directory
    return load_resources_from_current_package_module()
```
pydabs/resources/ (pipeline definition; file name not shown in this diff view)

Lines changed: 29 additions & 0 deletions

```python
from databricks.bundles.pipelines import Pipeline

"""
The main pipeline for pydabs
"""

pydabs_etl = Pipeline.from_dict(
    {
        "name": "pydabs_etl",
        "catalog": "${var.catalog}",
        "schema": "${var.schema}",
        "serverless": True,
        "root_path": "src/pydabs_etl",
        "libraries": [
            {
                "glob": {
                    "include": "src/pydabs_etl/transformations/**",
                },
            },
        ],
        "environment": {
            "dependencies": [
                # We include every dependency defined by pyproject.toml by defining an editable environment
                # that points to the folder where pyproject.toml is deployed.
                "--editable ${workspace.file_path}",
            ],
        },
    }
)
```
