127 changes: 127 additions & 0 deletions contrib/azure_cicd_quickstart/.gitignore
@@ -0,0 +1,127 @@
# Terraform
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*
terraform.tfstate.d/

# Crash log files
crash.log

# Exclude all .tfvars files, which are likely to contain sensitive data
*.tfvars

# Ignore override files as they are usually used to override resources locally
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Keep .terraform.lock.hcl in version control so provider versions stay pinned
!.terraform.lock.hcl

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Logs
*.log

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Coverage directory used by tools like istanbul
coverage/
*.lcov

# nyc test coverage
.nyc_output

# Dependency directories
node_modules/

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variables file
.env
.env.test
.env.local
.env.production

# Temporary folders
tmp/
temp/
128 changes: 128 additions & 0 deletions contrib/azure_cicd_quickstart/README.md
@@ -0,0 +1,128 @@
# Azure DevOps CI/CD for Databricks Asset Bundles (DABs)

A complete solution for deploying Databricks Asset Bundles using Azure DevOps pipelines with managed identity authentication and multi-environment support.
The azure_cicd_quickstart project deploys the Azure resources needed for a safe CI/CD process with Databricks Asset Bundles. To learn more about when to use Terraform, APIs, or Databricks Asset Bundles, read https://medium.com/@alexott_en/terraform-vs-databricks-asset-bundles-6256aa70e387

## Quick Start

This solution automatically creates everything you need for DAB CI/CD in Azure DevOps:

- **Azure DevOps project and pipeline**
- **Multi-environment variable groups** (dev/test/prod)
- **Managed identities** with federated credentials
- **Service connections** for each environment
- **Automated pipeline configuration** - no manual setup required

### Prerequisites

- Azure CLI (logged in)
- Terraform >= 1.0
- Azure DevOps organization access
- Owner/Contributor permissions on target Azure subscriptions

### Setup

1. **Configure your environment**:
```bash
cd terraform/
cp terraform.tfvars.template terraform.tfvars
# Edit terraform.tfvars with your values
```
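The template's variable names are not shown here, so the keys below are illustrative assumptions; copy the real keys from `terraform.tfvars.template`. A filled-in `terraform.tfvars` might look like:

```hcl
# Illustrative values only — key names are assumptions based on the
# components this quickstart creates; check terraform.tfvars.template
# for the actual variables.
azure_devops_org_url = "https://dev.azure.com/my-org"
pipeline_name        = "dab-cicd"
dev_subscription_id  = "00000000-0000-0000-0000-000000000000"
test_subscription_id = "11111111-1111-1111-1111-111111111111"
prod_subscription_id = "22222222-2222-2222-2222-222222222222"
```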

2. **Deploy infrastructure**:
```bash
terraform init
terraform plan   # review the planned changes before applying
terraform apply
```

3. **Add the Managed Identities to the Databricks Workspaces**:
- View the Managed Identities in the Terraform outputs
- Add the Managed Identities to their respective workspaces

4. **Start using the pipeline**:
- Pipeline is automatically created and configured
- Add your DAB folders anywhere in the repository
- Create PRs to trigger validation
- Merge to main/test/dev to deploy

## What Gets Created

| Component | Description |
|-----------|-------------|
| **Azure DevOps Project** | Single project containing pipeline and repository |
| **Dynamic Pipeline** | Automatically detects changed DABs and deploys only what's needed |
| **Variable Groups** | Environment-specific configuration (dev/test/prod) |
| **Managed Identities** | Secure, password-less authentication for each environment |
| **Service Connections** | Azure subscription connections using workload identity |


The pipeline automatically:
1. **Detects changed DAB folders** using git diff
2. **Selects environment** based on branch (dev/test/main)
3. **Authenticates** using managed identity
4. **Deploys only changed bundles** for efficiency
5. **Provides detailed logging** and error handling
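The change-detection step above can be sketched in shell. The sample paths below are illustrative; the real pipeline would feed in `git diff --name-only` output instead:

```shell
#!/bin/sh
# Hedged sketch of the change-detection step. A real run would use
# something like:
#   changed=$(git diff --name-only origin/main...HEAD)
# Sample paths are hardcoded here for illustration.
changed="my-data-pipeline/src/job.py
my-data-pipeline/databricks.yml
another-bundle/notebooks/etl.py"

# Take the top-level folder of each changed path and de-duplicate,
# yielding the candidate bundle folders to validate and deploy.
bundles=$(printf '%s\n' "$changed" | cut -d/ -f1 | sort -u)
echo "$bundles"
```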

## Repository Structure

After deployment, your repository will look like:

```
your-repo/
├── azure-pipelines.yml # Auto-generated pipeline
├── my-data-pipeline/ # Your DAB folders
│ ├── databricks.yml # (anywhere in repo)
│ └── src/
├── another-bundle/
│ ├── databricks.yml
│ └── notebooks/
└── terraform/ # Infrastructure code
└── README.md # Detailed setup guide
```
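Each DAB folder is identified by its `databricks.yml`. As a minimal sketch (the workspace host below is a placeholder; the pipeline's variable groups would supply the real per-environment values):

```yaml
# Minimal bundle definition — the host is a placeholder value.
bundle:
  name: my-data-pipeline

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```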

## Branch-Based Deployments

| Branch | Environment | Variable Group | Databricks Workspace |
|--------|------------|----------------|----------------------|
| `dev` | Development | `{pipeline_name}-Dev-Variables` | Dev workspace |
| `test` | Testing | `{pipeline_name}-Test-Variables` | Test workspace |
| `main` | Production | `{pipeline_name}-Prod-Variables` | Prod workspace |
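The branch-to-environment mapping in the table can be sketched as a small shell `case`. In Azure DevOps the branch arrives via the predefined `Build.SourceBranch` variable as a `refs/heads/...` ref; it is hardcoded here for illustration:

```shell
#!/bin/sh
# Sketch of the branch-to-environment selection described above.
# In the pipeline this would come from $(Build.SourceBranch).
branch="refs/heads/test"

case "${branch#refs/heads/}" in
  main) env="prod" ;;
  test) env="test" ;;
  dev)  env="dev" ;;
  *)    env="" ;;   # other branches: validation only, no deploy
esac
echo "environment: $env"
```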

## Detailed Documentation

For complete setup instructions, troubleshooting, and advanced configuration:

**[See Terraform README](terraform/README.md)** for detailed deployment guide

## Troubleshooting

### Common Issues

- **"No matching federated identity found"** → Check organization GUID in terraform.tfvars
- **"Resource group not found"** → Ensure resource group exists before running terraform
- **"Pipeline not triggering"** → Verify DAB folders have `databricks.yml` files

### Getting Help

1. Check the [detailed troubleshooting guide](terraform/README.md#troubleshooting)
2. Verify all prerequisite permissions are in place
3. Review Azure DevOps pipeline logs for specific error messages

## Architecture

This solution follows enterprise DevOps patterns:

- **Single DevOps Project**: Centralized pipeline management
- **Environment Isolation**: Separate subscriptions/workspaces per environment
- **Managed Identity**: Secure, password-less authentication
- **Conditional Deployment**: Only changed DABs are deployed
- **Branch Protection**: Production deployments only from main branch

## Next Steps

After successful deployment:

1. **Test the pipeline** - Create a test DAB and commit changes
2. **Set up branch policies** - Protect main branch, require PR reviews
3. **Add your DABs** - Place Databricks Asset Bundles anywhere in the repo
4. **Monitor deployments** - Use Azure DevOps pipeline history and logs
5. **Scale up** - Add more environments or customize the pipeline