Welcome to the repository for the Udemy course **CI/CD with Databricks Asset Bundles**. This repository serves as a supplementary resource, providing the project files used in the course.

The repository also contains a wiki with project code and YAML configuration snippets for specific lectures. The wiki is organised into sections and pages that align with the structure of the course: each page corresponds to a particular lecture, making it easy to follow along and reference the material as you progress.
`dab_project` is the name given to the project, which was generated using the `default-python` template.

Before you run or deploy this project, you'll need to follow along with the Udemy course to set up your environment (Databricks workspaces, service principals, a GitHub repository, etc.). You will also need to update the `databricks.yml` configuration file with your workspace URLs and service principal details.
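For orientation, here is a minimal sketch of what those target entries in `databricks.yml` can look like. The workspace hosts and the service principal application ID below are placeholders, not values from the course; substitute your own.

```yaml
# Sketch only -- hosts and the service principal ID are placeholders.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-dev-workspace>.cloud.databricks.com

  prod:
    mode: production
    workspace:
      host: https://<your-prod-workspace>.cloud.databricks.com
    run_as:
      service_principal_name: "<service-principal-application-id>"
```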
You'll also want to set up local Python environments for Databricks Connect and for local PySpark development. Follow the instructions for your platform below; a quick smoke test for each environment is sketched after the platform-specific steps.
**macOS / Linux**

- Create and activate the Databricks Connect environment (using Python 3.11):

  ```
  # at the project root
  python3.11 -m venv .venv_dbc
  source .venv_dbc/bin/activate
  ```

- Install the Databricks Connect dependencies:

  ```
  pip install -r requirements-dbc.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```

- Create and activate the local PySpark environment:

  ```
  python3.11 -m venv .venv_pyspark
  source .venv_pyspark/bin/activate
  ```

- Install the PySpark dependencies:

  ```
  pip install -r requirements-pyspark.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```
**Windows**

- Create and activate the Databricks Connect environment (using Python 3.11):

  ```
  # at the project root
  py -3.11 -m venv .venv_dbc
  .\.venv_dbc\Scripts\activate
  ```

- Install the Databricks Connect dependencies:

  ```
  pip install -r requirements-dbc.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```

- Create and activate the local PySpark environment:

  ```
  py -3.11 -m venv .venv_pyspark
  .\.venv_pyspark\Scripts\Activate.ps1
  ```

- Install the PySpark dependencies:

  ```
  pip install -r requirements-pyspark.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```
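Once an environment is installed, a quick smoke test confirms it actually works. The snippets below are minimal sketches: the Databricks Connect one assumes you have already authenticated to a workspace and configured compute for Connect, which the course covers.

```python
# Smoke test for the Databricks Connect environment (.venv_dbc).
# Assumes workspace authentication and compute are already configured.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
print(spark.range(5).collect())  # rows are computed on Databricks, not locally
```

```python
# Smoke test for the local PySpark environment (.venv_pyspark).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-smoke-test").getOrCreate()
print(spark.range(5).collect())  # runs entirely on the local machine
spark.stop()
```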
- Install the Databricks CLI:

  ```
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  ```

  or alternatively, on macOS, if you need an admin override:

  ```
  sudo curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
  ```
- Authenticate to your Databricks workspace, if you have not done so already:

  ```
  databricks configure
  ```
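  `databricks configure` sets up authentication with a personal access token. If your workspace uses OAuth instead, the CLI's browser-based login is an alternative (the host below is a placeholder):

  ```
  databricks auth login --host https://<your-workspace-url>
  ```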
- To deploy a development copy of this project, type:

  ```
  databricks bundle deploy --target dev
  ```

  (Note that "dev" is the default target, so the `--target` parameter is optional here.)

  This deploys everything that's defined for this project. For example, the default template would deploy a job called `[dev yourname] dab_project_job` to your workspace. You can find that job by opening your workspace and clicking on Workflows.
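  You can also sanity-check the bundle configuration before deploying with the CLI's `validate` command:

  ```
  databricks bundle validate --target dev
  ```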
- Similarly, to deploy a production copy, type:

  ```
  databricks bundle deploy --target prod
  ```

  Note that the default job from the template has a schedule that runs every day (defined in `resources/dab_project.job.yml`). The schedule is paused when deploying in development mode (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
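  For orientation, a job schedule in a bundle resource file generally looks like the sketch below; the cron expression and timezone are illustrative, not the template's actual values:

  ```yaml
  # resources/dab_project.job.yml (sketch -- your generated file will differ)
  resources:
    jobs:
      dab_project_job:
        name: dab_project_job
        schedule:
          quartz_cron_expression: "0 0 9 * * ?"  # daily at 09:00
          timezone_id: UTC
  ```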
- To run a job or pipeline, use the "run" command:

  ```
  databricks bundle run
  ```
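  `run` takes the resource key of the job or pipeline as an argument. For this project the job resource is keyed `dab_project_job`, so, for example:

  ```
  databricks bundle run dab_project_job
  ```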
- Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html.
- For documentation on the Databricks Asset Bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.