Welcome to the repository for the Udemy course **CI/CD with Databricks Asset Bundles**. This repository serves as a supplementary resource, providing the project files used in the course.

The repository also contains a wiki with project code and YAML configuration snippets for specific lectures. The wiki is organised into sections and pages that align with the structure of the course: each page corresponds to a particular lecture, making it easy to follow along and reference the material as you progress.
`dab_project` is the name given to the project, which was generated using the `default-python` template.

Before you run or deploy this project, you'll need to follow along with the Udemy course to set up your environment (Databricks workspaces, service principals, a GitHub repository, etc.). You will also need to update the `databricks.yml` configuration file with your workspace URLs and service principal details.
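For orientation, here is a minimal sketch of what those target entries in `databricks.yml` can look like. The workspace hosts and the service principal application ID below are placeholders, not values from the course; substitute your own.

```yaml
# Sketch only -- hosts and the service principal ID are placeholders.
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<your-dev-workspace>.cloud.databricks.com

  prod:
    mode: production
    workspace:
      host: https://<your-prod-workspace>.cloud.databricks.com
    run_as:
      service_principal_name: "<service-principal-application-id>"
```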
You'll also want to set up local Python environments for Databricks Connect and for local PySpark development. Follow the instructions for your platform below; a quick smoke test for each environment is sketched after the platform-specific steps.
**macOS / Linux**

- Create and activate the Databricks Connect environment (using Python 3.11):

  ```
  # at the project root
  python3.11 -m venv .venv_dbc
  source .venv_dbc/bin/activate
  ```

- Install the Databricks Connect dependencies:

  ```
  pip install -r requirements-dbc.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```

- Create and activate the local PySpark environment:

  ```
  python3.11 -m venv .venv_pyspark
  source .venv_pyspark/bin/activate
  ```

- Install the PySpark dependencies:

  ```
  pip install -r requirements-pyspark.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```
**Windows**

- Create and activate the Databricks Connect environment (using Python 3.11):

  ```
  # at the project root
  py -3.11 -m venv .venv_dbc
  .\.venv_dbc\Scripts\activate
  ```

- Install the Databricks Connect dependencies:

  ```
  pip install -r requirements-dbc.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```

- Create and activate the local PySpark environment:

  ```
  py -3.11 -m venv .venv_pyspark
  .\.venv_pyspark\Scripts\Activate.ps1
  ```

- Install the PySpark dependencies:

  ```
  pip install -r requirements-pyspark.txt
  ```

- Verify the installation, then deactivate the environment:

  ```
  pip list
  deactivate
  ```
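Once an environment is installed, a quick smoke test confirms it actually works. The snippets below are minimal sketches: the Databricks Connect one assumes you have already authenticated to a workspace and configured compute for Connect, which the course covers.

```python
# Smoke test for the Databricks Connect environment (.venv_dbc).
# Assumes workspace authentication and compute are already configured.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
print(spark.range(5).collect())  # rows are computed on Databricks, not locally
```

```python
# Smoke test for the local PySpark environment (.venv_pyspark).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-smoke-test").getOrCreate()
print(spark.range(5).collect())  # runs entirely on the local machine
spark.stop()
```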
- Install the Databricks CLI:

  ```
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  ```

  or alternatively, on macOS, if you need an admin override:

  ```
  sudo curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
  ```
- Authenticate to your Databricks workspace, if you have not done so already:

  ```
  databricks configure
  ```
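  `databricks configure` sets up authentication with a personal access token. If your workspace uses OAuth instead, the CLI's browser-based login is an alternative (the host below is a placeholder):

  ```
  databricks auth login --host https://<your-workspace-url>
  ```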
- To deploy a development copy of this project, type:

  ```
  databricks bundle deploy --target dev
  ```

  (Note that "dev" is the default target, so the `--target` parameter is optional here.)

  This deploys everything that's defined for this project. For example, the default template would deploy a job called `[dev yourname] dab_project_job` to your workspace. You can find that job by opening your workspace and clicking on Workflows.
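  You can also sanity-check the bundle configuration before deploying with the CLI's `validate` command:

  ```
  databricks bundle validate --target dev
  ```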
- Similarly, to deploy a production copy, type:

  ```
  databricks bundle deploy --target prod
  ```

  Note that the default job from the template has a schedule that runs every day (defined in `resources/dab_project.job.yml`). The schedule is paused when deploying in development mode (see https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
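  For orientation, a job schedule in a bundle resource file generally looks like the sketch below; the cron expression and timezone are illustrative, not the template's actual values:

  ```yaml
  # resources/dab_project.job.yml (sketch -- your generated file will differ)
  resources:
    jobs:
      dab_project_job:
        name: dab_project_job
        schedule:
          quartz_cron_expression: "0 0 9 * * ?"  # daily at 09:00
          timezone_id: UTC
  ```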
- To run a job or pipeline, use the "run" command:

  ```
  databricks bundle run
  ```
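  `run` takes the resource key of the job or pipeline as an argument. For this project the job resource is keyed `dab_project_job`, so, for example:

  ```
  databricks bundle run dab_project_job
  ```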
- Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html.
- For documentation on the Databricks Asset Bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.