A helper SDK for Databricks that provides Unity Catalog SQL UDTF registration utilities, Secret Manager integration, and Databricks-specific tooling for scalar UDTFs.
Latest Release: Version 0.2.1 adds SQL-native time series UDTF support with predicate pushdown hints, plus a SQL query analyzer that extracts pushdown hints from SQL queries.
Note: This package provides Databricks-specific utilities for Unity Catalog UDTF registration and Secret Manager integration.
cognite-databricks is a Databricks-specific helper SDK that extends pygen-spark with Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities. It focuses on serverless-compatible scalar UDTF execution for SQL Warehouses.
Package Purpose:
- Databricks-Specific Features: Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities
- Uses pygen-spark for Code Generation: All UDTF code generation (both Data Model and Time Series UDTFs) is done by pygen-spark using template-based generation
- Generic Components: Generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite.databricks for backward compatibility
- Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow
It provides high-level APIs for:
- UDTF Registration: Register persistent UDTFs in Unity Catalog via SQL
- Secret Manager Integration: Manage OAuth2 credentials securely
- SQL Usage: Use UDTFs directly in SQL after registration
- Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow

- Unity Catalog SQL Registration: Serverless-compatible UDTFs registered via CREATE FUNCTION statements
- One-Line Registration: Generate and register UDTFs in a single call
- Secret Manager Integration: Automatic credential management from TOML files
- Scalar-Only Execution: Compatible with SQL Warehouses and serverless execution
- Type Safety: Full type hints and IDE support
- Generic Components: Uses template-generated UDTFs and generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) from cognite-pygen-spark for generic Spark compatibility. These components are re-exported from cognite.databricks for backward compatibility, but the source is cognite.pygen_spark.
pip install cognite-databricks

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
# Load client from TOML file (same pattern as pygen)
client = load_cognite_client_from_toml("config.toml")
# Generate UDTFs for a Data Model
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
)
# Register UDTFs in Unity Catalog (SQL registration)
udtf_result = generator.register_udtfs(
    secret_scope="cdf_sp_pygen_power_windturbine",
    if_exists="replace",
)
print(f"Registered {udtf_result.total_count} UDTF(s)")

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import UDTFGenerator, SecretManagerHelper
from cognite.pygen import load_cognite_client_from_toml
# Load client from TOML file
client = load_cognite_client_from_toml("config.toml")
# Create generator
generator = UDTFGenerator(
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)
# Set up Secret Manager (one-time setup per data model)
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
secret_scope = f"cdf_{data_model_id.space}_{data_model_id.external_id.lower()}"
generator.secret_helper.set_cdf_credentials(
    scope_name=secret_scope,
    project="my-project",  # from config.toml
    cdf_cluster="api.cognitedata.com",  # from config.toml
    client_id="...",  # from config.toml
    client_secret="...",  # from config.toml
    tenant_id="...",  # from config.toml
)
# Register UDTFs for catalog-based use (scalar-only)
registered = generator.register_session_scoped_udtfs()

Session-scoped registration is the primary mode for scalar-only UDTFs. Register functions in the current Spark session before running SQL queries.
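The secret scope in the example above follows a `cdf_<space>_<lowercased external_id>` pattern, and the five fields passed to `set_cdf_credentials` are the same keys later read with `SECRET()` in SQL. The following is a minimal standard-library sketch of that convention. `secret_scope_name` and `missing_credential_keys` are hypothetical helpers written for illustration; they are not part of `cognite-databricks`.

```python
# Illustrative only: these helpers are NOT part of cognite-databricks.
# The key names mirror the arguments of set_cdf_credentials shown above.
REQUIRED_KEYS = ("project", "cdf_cluster", "client_id", "client_secret", "tenant_id")


def secret_scope_name(space: str, external_id: str) -> str:
    """Build the scope name the same way the example above does."""
    return f"cdf_{space}_{external_id.lower()}"


def missing_credential_keys(credentials: dict) -> list:
    """Return the required keys that are absent or empty, in a stable order."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


scope = secret_scope_name("sp_pygen_power", "WindTurbine")
print(scope)  # cdf_sp_pygen_power_windturbine

incomplete = {"project": "my-project", "cdf_cluster": "api.cognitedata.com"}
print(missing_credential_keys(incomplete))  # ['client_id', 'client_secret', 'tenant_id']
```

Checking for missing fields before writing to the secret scope is one way to catch an incomplete `config.toml` early, rather than at query time.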
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from cognite.client.data_classes.data_modeling.ids import DataModelId
# Load client and generate UDTFs
client = load_cognite_client_from_toml("config.toml")
data_model_id = DataModelId(space="sailboat", external_id="sailboat", version="v1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
    output_dir="/Workspace/Users/user@example.com/udtf",
)
# Install dependencies (run in separate cell first)
# %pip install cognite-sdk
# (Restart kernel after installation)
# Register all UDTFs for catalog-based use (includes time series UDTFs automatically)
registered = generator.register_session_scoped_udtfs()
# Returns: {"SmallBoat": "small_boat_udtf", "LargeBoat": "large_boat_udtf",
# "time_series_datapoints": "time_series_datapoints_udtf", ...}
# Use in SQL (always use SECRET() for credentials)
# SELECT * FROM small_boat_udtf(
# client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
# client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
# tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
# cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
# project => SECRET('cdf_sailboat_sailboat', 'project'),
# name => 'MyBoat',
# description => NULL
# ) LIMIT 10;

from cognite.databricks import generate_udtf_notebook, register_udtf_from_file
# Generate UDTFs
generator = generate_udtf_notebook(data_model_id, client, ...)
# Register a single UDTF from generated file
register_udtf_from_file(
    "/Workspace/Users/user@example.com/udtf/sailboat_sailboat_v1/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)

Session-scoped registration is the supported mode for scalar-only UDTFs. Functions are temporary and must be registered at the start of each notebook/job before running SQL queries.
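Every query against a registered UDTF repeats the same five `SECRET()` arguments, so it can be convenient to generate the SQL text once registration is done. The sketch below reproduces the SQL usage pattern shown earlier using only string formatting. `build_udtf_query` and `sql_literal` are hypothetical helpers, not library functions, and the sketch assumes the function name and filter names are trusted identifiers.

```python
# Illustrative only: build_udtf_query is a hypothetical helper, not a library API.
# The credential parameter names are taken from the SQL example in this README.
CREDENTIAL_PARAMS = ("client_id", "client_secret", "tenant_id", "cdf_cluster", "project")


def sql_literal(value):
    """Render a Python value as a SQL literal (None -> NULL, strings quoted)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the string stays a valid SQL literal.
    return "'" + str(value).replace("'", "''") + "'"


def build_udtf_query(function_name, secret_scope, filters=None, limit=None):
    """Return a SELECT that reads every credential through SECRET()."""
    # function_name and filter names are inserted as-is: pass trusted identifiers only.
    args = [
        f"    {name} => SECRET({sql_literal(secret_scope)}, {sql_literal(name)})"
        for name in CREDENTIAL_PARAMS
    ]
    for name, value in (filters or {}).items():
        args.append(f"    {name} => {sql_literal(value)}")
    query = f"SELECT * FROM {function_name}(\n" + ",\n".join(args) + "\n)"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query + ";"


print(build_udtf_query(
    "small_boat_udtf",
    "cdf_sailboat_sailboat",
    filters={"name": "MyBoat", "description": None},
    limit=10,
))
```

In a notebook, the resulting string could be passed to `spark.sql(...)` after the UDTFs have been registered for the session; credentials never appear in the query text because they are resolved by `SECRET()` at execution time.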
Note: This document uses PyPI package names for references:
- PyPI: cognite-pygen (repository: pygen; import: cognite.pygen)
- PyPI: cognite-pygen-spark (repository: pygen-spark; import: cognite.pygen_spark)

- Python 3.9+
- cognite-pygen-spark (PyPI package name; import: cognite.pygen_spark)
- cognite-sdk-python (dependency)
- databricks-sdk (dependency)
cognite-databricks/
├── cognite/
│   └── databricks/
│       ├── __init__.py          # Exports generate_udtf_notebook, UDTFGenerator, etc.
│       ├── udtf_registry.py     # UDTF registration helpers
│       ├── secret_manager.py    # Secret Manager helpers
│       ├── view_generator.py    # View generation and registration
│       ├── generator.py         # generate_udtf_notebook helper function
│       └── utils.py             # Utility functions
├── pyproject.toml
└── README.md
High-level function for notebook workflows, aligned with pygen.generate_sdk_notebook:
from cognite.databricks import generate_udtf_notebook
generator = generate_udtf_notebook(
    data_model_id,
    client,
    catalog="main",
    schema="cdf_models",
)

Main class for orchestrating UDTF generation and registration:
from databricks.sdk import WorkspaceClient
from cognite.databricks import UDTFGenerator
# WorkspaceClient() resolves Databricks authentication from the environment
workspace_client = WorkspaceClient()
generator = UDTFGenerator(
    workspace_client=workspace_client,
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)

Key Methods:
- register_session_scoped_udtfs(): Register UDTFs for catalog-based use (scalar-only)
- register_udtf_from_file(): Register a single generated UDTF file in the current session
Standalone function for registering a single UDTF from a generated Python file for catalog-based use. Useful when you only need to register one UDTF or want more control over the registration process:
from cognite.databricks import register_udtf_from_file
register_udtf_from_file(
    "/path/to/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)

Helper for managing OAuth2 credentials in Databricks Secret Manager:
from databricks.sdk import WorkspaceClient
from cognite.databricks import SecretManagerHelper
# WorkspaceClient() resolves Databricks authentication from the environment
workspace_client = WorkspaceClient()
secret_helper = SecretManagerHelper(workspace_client)
secret_helper.set_cdf_credentials(
    scope_name="cdf_sp_pygen_power_windturbine",
    project="my-project",
    cdf_cluster="api.cognitedata.com",
    client_id="...",
    client_secret="...",
    tenant_id="...",
)

git clone <repository-url>
cd cognite-databricks
pip install -e ".[dev]"

pytest tests/

- cognite-pygen-spark (PyPI: cognite-pygen-spark): Generic Spark UDTF code generation library that works with any Spark cluster. Provides template-based UDTF generation, type conversion utilities (TypeConverter), connection configuration (CDFConnectionConfig), and utility functions. cognite-databricks uses pygen-spark for all code generation.
- cognite-pygen (PyPI: cognite-pygen): Base code generation library for CDF Data Models
- cognite-sdk-python: Python SDK for CDF APIs
Generic components (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite-databricks for backward compatibility:
# Preferred: Import directly from pygen-spark (source)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name
# Backward compatible: Still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name

Note: These components are generic Spark utilities and work with any Spark cluster, not just Databricks. They were moved from cognite-databricks to pygen-spark to make them available for standalone Spark clusters.
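To give a feel for what a naming utility such as `to_udtf_function_name` does, the sketch below reproduces the view-to-function mapping shown in the session-scoped example (`SmallBoat` to `small_boat_udtf`). It is a naive re-implementation written for illustration under that single assumption; the actual function in `cognite.pygen_spark` may treat acronyms, digits, or other edge cases differently.

```python
import re


def sketch_udtf_function_name(view_name: str) -> str:
    """Naive CamelCase -> snake_case conversion with a `_udtf` suffix.

    Hypothetical stand-in for illustration; not the library implementation.
    """
    # Insert "_" before every capital letter except one at the very start.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", view_name).lower()
    return f"{snake}_udtf"


for name in ("SmallBoat", "LargeBoat", "time_series_datapoints"):
    print(name, "->", sketch_udtf_function_name(name))
# SmallBoat -> small_boat_udtf
# LargeBoat -> large_boat_udtf
# time_series_datapoints -> time_series_datapoints_udtf
```

For real projects, prefer importing `to_udtf_function_name` from `cognite.pygen_spark` so that generated and registered names stay consistent.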
For detailed documentation, see:
- Documentation Index: Complete guide for catalog-based scalar-only UDTF registration
- Unity Catalog UDTF Registration: Session-scoped workflow and SQL usage
- Technical Plan - CDF Databricks Integration (UDTF-Based): Architecture and design details
[License information]
[Contributing guidelines]