Support instance_pool_id in pipeline config cluster configuration #83

@Mmodarre

Summary

Add instance_pool_id and driver_instance_pool_id support to the pipeline config cluster configuration, allowing users to use Databricks instance pools instead of specifying node_type_id directly.

Motivation

When a cluster uses a Databricks instance pool, the pool defines the node type, so node_type_id should be optional. Instance pools provide faster cluster startup and cost savings through pre-allocated instances. The Databricks SDK's PipelineCluster already accepts instance_pool_id and driver_instance_pool_id as optional strings.

Changes

  • Jinja2 template (pipeline_resource.yml.j2): Make node_type_id conditional, add instance_pool_id and driver_instance_pool_id conditional blocks
  • Init template (pipeline_config_env.yaml.tmpl): Add documentation and examples for instance pool configuration
  • No loader changes needed: LHP's config loader already passes unknown cluster fields through (forward-compatible design)
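
The conditional rendering described above can be sketched in plain Python. This is only an illustration of the intended behavior, not the actual contents of pipeline_resource.yml.j2: the function name render_cluster and the pool ID "pool-123" are hypothetical, and the real template would express the same guards with Jinja2 `{% if %}` blocks.

```python
from typing import Any

# Optional string fields that are emitted only when the user sets them.
OPTIONAL_FIELDS = ("node_type_id", "instance_pool_id", "driver_instance_pool_id")


def render_cluster(cluster: dict[str, Any]) -> str:
    """Render one cluster entry as YAML text (hypothetical helper)."""
    lines = [f"- label: {cluster['label']}"]
    for key in OPTIONAL_FIELDS:
        value = cluster.get(key)
        if value:  # mirrors a Jinja2 `{% if cluster.<key> %}` guard
            lines.append(f"  {key}: {value}")
    autoscale = cluster.get("autoscale")
    if autoscale:
        lines.append("  autoscale:")
        for bound in ("min_workers", "max_workers"):
            lines.append(f"    {bound}: {autoscale[bound]}")
    return "\n".join(lines)


# A pool-based cluster: no node_type_id line should be emitted.
pool_cluster = {
    "label": "default",
    "instance_pool_id": "pool-123",  # placeholder ID, not a real pool
    "autoscale": {"min_workers": 1, "max_workers": 5},
}
rendered = render_cluster(pool_cluster)
print(rendered)
```

With this shape, a cluster that sets only node_type_id renders exactly as before, which is the backward-compatibility property the change relies on.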

Usage

# Option A: Specify node types directly (existing behavior)
clusters:
  - label: default
    node_type_id: Standard_D16ds_v5

# Option B: Use instance pools (new)
clusters:
  - label: default
    instance_pool_id: <your-pool-id>
    driver_instance_pool_id: <your-driver-pool-id>   # Optional
    autoscale:
      min_workers: 1
      max_workers: 5

# Supports LHP token substitution
clusters:
  - label: default
    instance_pool_id: "{worker_pool_id}"
    driver_instance_pool_id: "{driver_pool_id}"
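
To illustrate how the `{worker_pool_id}` and `{driver_pool_id}` tokens in the last example might resolve per environment, here is a minimal sketch. It assumes tokens are bare `{name}` placeholders; the function substitute_tokens and the pool IDs are hypothetical and do not represent LHP's actual substitution code.

```python
import re

# Assumed token syntax: a name made of word characters wrapped in braces.
TOKEN = re.compile(r"\{(\w+)\}")


def substitute_tokens(value: str, substitutions: dict[str, str]) -> str:
    """Replace every {name} token in value, failing loudly on unknown names."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in substitutions:
            raise KeyError(f"no substitution defined for token {name!r}")
        return substitutions[name]

    return TOKEN.sub(replace, value)


# Hypothetical per-environment values; real pool IDs come from the workspace.
dev_substitutions = {
    "worker_pool_id": "pool-dev-workers",
    "driver_pool_id": "pool-dev-driver",
}
cluster = {
    "label": "default",
    "instance_pool_id": "{worker_pool_id}",
    "driver_instance_pool_id": "{driver_pool_id}",
}
resolved = {key: substitute_tokens(val, dev_substitutions) for key, val in cluster.items()}
print(resolved)
```

Raising on an unknown token is a design choice in this sketch: an unresolved pool ID would otherwise reach Databricks as a literal string and fail much later.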

Tests

  • 4 unit tests: pool rendering, mixed mode, driver pool, token substitution
  • 2 E2E tests: full generation with instance pool, mixed pool + node_type across pipelines

Labels: enhancement
