GitHub template repository and Cookiecutter template for scaffolding new MLOps projects on the Databricks hub-and-spoke platform on Azure.
A fully structured Databricks MLOps project including:
- DLT ingestion pipeline — bronze/silver layers with Auto Loader
- Feature engineering — Databricks Feature Engineering API, Unity Catalog feature tables
- Model training — classification, time series (Prophet), and clustering archetypes
- Model evaluation — champion/challenger pattern using MLflow model aliases
- Batch inference — DLT pipeline writing Gold tables
- Model serving — optional real-time endpoint deployment via Databricks SDK
- Monitoring — Databricks Lakehouse Monitoring for data and model drift
- End-to-end orchestration — single orchestration job wiring all steps together
- CI/CD — GitHub Actions and Azure DevOps pipelines
- EU AI Act docs — model card, bias log, and risk classification templates
Before setting up a project, collect the following from the hub platform team:
| Item | Description |
|---|---|
| Workspace URLs (dev/staging/prod) | .azuredatabricks.net URLs |
| Unity Catalog catalog name | e.g. lob_a |
| Cluster policy ID | from Terraform output |
| Instance pool ID (dev) | from Terraform output |
| CI/CD identity | Azure AD service principal client ID |
| Secret store | Azure Key Vault kv-dsgrd-hub |
| Node type | e.g. Standard_DS3_v2 |
pip install cookiecutter
cookiecutter gh:your-org/dsgrd-databricks-lob-templateThe post-generation hook automatically places the correct CI/CD workflow files based on your cicd_platform choice. No manual cleanup needed.
- Click Use this template on GitHub to create your repo
- Replace all
{{cookiecutter.*}}placeholders with your values - Follow the CI/CD setup below
- Set the following GitHub repository variables:
AZURE_CLIENT_IDAZURE_TENANT_IDAZURE_SUBSCRIPTION_ID
Databricks tokens are fetched at runtime from Azure Key Vault (kv-dsgrd-hub) using the OIDC identity — no Databricks secrets needed in GitHub.
- Delete
.github/folder - In Azure DevOps, create a pipeline pointing to
.azure/pipelines/bundle-deploy.yml - Create a variable group named
databricks-lob-variablescontaining:AZURE_SERVICE_CONNECTION— name of your Azure service connection in Azure DevOps
| Variable | Description | Example |
|---|---|---|
project_slug |
Project name (lowercase, underscores) | lob_a_churn_prediction |
lob_name |
LOB identifier | lob-a |
lob_display_name |
LOB display name | LOB A |
catalog_name |
Unity Catalog catalog — get from hub team | lob_a |
schema_name |
Schema name | default |
workspace_url_dev |
Dev workspace URL | https://adb-xxx.azuredatabricks.net |
workspace_url_staging |
Staging workspace URL | |
workspace_url_prod |
Prod workspace URL | |
spark_version |
Databricks runtime | 15.4.x-scala2.12 |
node_type_id |
VM node type — get from hub team | Standard_DS3_v2 |
cluster_policy_id |
Cluster policy ID — get from hub team | |
instance_pool_id |
Dev instance pool ID — get from hub team | |
engineer_group |
Entra ID group or user email for your team | lob-a-engineers or user@client.com |
engineer_principal_type |
group or user |
group |
model_name |
MLflow registered model name | lob_a_churn_model |
mlflow_experiment_path |
MLflow experiment path | /Shared/lob_a.../experiments |
cicd_platform |
CI/CD platform | github_actions or azure_devops |
| Event | Action |
|---|---|
PR to main |
Validate bundle |
Push to main |
Validate + deploy to dev |
Push to release/* |
Validate → deploy staging → prod (prod requires environment approval gate) |
{{project_slug}}/
├── src/{{project_slug}}/
│ ├── features/
│ │ └── ingestion_pipeline.py # DLT bronze/silver ingestion
│ ├── training/
│ │ ├── classification/train.py
│ │ ├── time_series/train.py
│ │ └── clustering/train.py
│ ├── evaluation/
│ │ ├── classification/evaluate.py
│ │ ├── time_series/evaluate.py
│ │ └── clustering/evaluate.py
│ ├── inference/
│ │ └── batch_inference.py # DLT Gold table
│ ├── serving/
│ │ └── deploy_serving.py
│ └── monitoring/
│ └── setup_monitoring.py
├── resources/ # Databricks Asset Bundle resource YAMLs
├── databricks.yml # Bundle config (targets: dev, staging, prod)
├── .github/workflows/
│ └── bundle-deploy.yml # Azure OIDC + Key Vault
└── .azure/pipelines/
└── bundle-deploy.yml # Azure DevOps
- dsgrd-databricks-hub — Hub infrastructure repo (Terraform + shared bundles)