Open
Labels: enhancement (New feature or request)
Summary
Add `instance_pool_id` and `driver_instance_pool_id` support to the cluster configuration in the pipeline config. This lets users run on Databricks instance pools instead of specifying `node_type_id` directly.
Motivation
When a cluster uses a Databricks instance pool, the pool defines the node type, so `node_type_id` should be optional. Instance pools speed up cluster startup and save cost by drawing on pre-allocated idle instances. The Databricks SDK's `PipelineCluster` already supports both fields as optional strings.
Changes
- Jinja2 template (`pipeline_resource.yml.j2`): make `node_type_id` conditional, and add conditional blocks for `instance_pool_id` and `driver_instance_pool_id`.
- Init template (`pipeline_config_env.yaml.tmpl`): add documentation and examples for instance pool configuration.
- No loader changes needed: LHP's config loader already passes unknown cluster fields through (forward-compatible design).
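The conditional blocks described for `pipeline_resource.yml.j2` can be sketched in plain Python. This is a minimal illustration that assumes each cluster entry reaches the template as a dict. `render_cluster` and `OPTIONAL_STRING_FIELDS` are hypothetical names, not LHP code, and the real template's field order, quoting, and supported keys may differ. Each optional field is emitted only when it is set, which is what lets a pool-based cluster omit `node_type_id`.

```python
from typing import Any

# Hypothetical: optional string fields emitted only when present in the config,
# mirroring the conditional blocks described for pipeline_resource.yml.j2.
OPTIONAL_STRING_FIELDS = ("node_type_id", "instance_pool_id", "driver_instance_pool_id")


def render_cluster(cluster: dict[str, Any]) -> list[str]:
    """Render one cluster entry as YAML lines, skipping unset optional fields.

    Hypothetical sketch of the template logic, not LHP's implementation.
    """
    lines = [f"- label: {cluster.get('label', 'default')}"]
    for field in OPTIONAL_STRING_FIELDS:
        value = cluster.get(field)
        # Same truthiness as a Jinja2 `{% if cluster.<field> %}` guard:
        # missing, None, or empty values are skipped.
        if value:
            lines.append(f"  {field}: {value}")
    autoscale = cluster.get("autoscale")
    if autoscale:
        lines.append("  autoscale:")
        lines.append(f"    min_workers: {autoscale['min_workers']}")
        lines.append(f"    max_workers: {autoscale['max_workers']}")
    return lines


if __name__ == "__main__":
    # A pool-based cluster: no node_type_id line is rendered.
    pool_cluster = {
        "label": "default",
        "instance_pool_id": "pool-123",
        "autoscale": {"min_workers": 1, "max_workers": 5},
    }
    print("\n".join(render_cluster(pool_cluster)))
```

Because every optional field sits behind the same guard, a project can mix node-type clusters and pool-based clusters across pipelines without separate templates.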
Usage
```yaml
# Option A: Specify node types directly (existing behavior)
clusters:
  - label: default
    node_type_id: Standard_D16ds_v5

# Option B: Use instance pools (new)
clusters:
  - label: default
    instance_pool_id: <your-pool-id>
    driver_instance_pool_id: <your-driver-pool-id>  # Optional
    autoscale:
      min_workers: 1
      max_workers: 5

# Supports LHP token substitution
clusters:
  - label: default
    instance_pool_id: "{worker_pool_id}"
    driver_instance_pool_id: "{driver_pool_id}"
```

Tests
- 4 unit tests: pool rendering, mixed mode, driver pool, token substitution
- 2 E2E tests: full generation with instance pool, mixed pool + node_type across pipelines
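The Usage example's token form (`"{worker_pool_id}"`) lets each environment supply its own pool IDs. Below is a minimal sketch of that substitution. It assumes tokens arrive as a flat name-to-value mapping and that undefined tokens are left untouched. `substitute_tokens` is a hypothetical helper, and LHP's own substitution may behave differently, for example by raising on undefined tokens.

```python
import re
from typing import Any

# Matches `{token_name}` placeholders such as "{worker_pool_id}".
TOKEN_PATTERN = re.compile(r"\{(\w+)\}")


def substitute_tokens(node: Any, tokens: dict[str, str]) -> Any:
    """Recursively replace `{name}` placeholders in strings inside dicts/lists.

    Hypothetical helper: unknown tokens are left as-is (an assumption), and the
    input is not mutated; a new structure is returned.
    """
    if isinstance(node, str):
        return TOKEN_PATTERN.sub(lambda m: tokens.get(m.group(1), m.group(0)), node)
    if isinstance(node, dict):
        return {key: substitute_tokens(value, tokens) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute_tokens(item, tokens) for item in node]
    return node  # numbers, booleans, and None pass through unchanged


if __name__ == "__main__":
    clusters = [
        {
            "label": "default",
            "instance_pool_id": "{worker_pool_id}",
            "driver_instance_pool_id": "{driver_pool_id}",
        }
    ]
    env_tokens = {"worker_pool_id": "pool-dev-workers", "driver_pool_id": "pool-dev-driver"}
    print(substitute_tokens(clusters, env_tokens))
```

Returning a new structure rather than editing in place keeps the raw config reusable across environments.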