
Cognite Databricks Integration Documentation

Overview

cognite-databricks provides two approaches for registering and using User-Defined Table Functions (UDTFs) to query CDF data from Databricks:

  1. Session-Scoped Registration: Temporary registration for development and testing
  2. Catalog-Based Registration: Permanent registration in Unity Catalog for production

Choosing the Right Approach

Use Session-Scoped Registration When:

  • Developing and Testing: Quickly try out UDTFs before committing them to Unity Catalog
  • Prototyping: Experimenting with different configurations and queries
  • Temporary Analysis: Running ad-hoc queries without permanent registration
  • Learning: Getting familiar with UDTFs and CDF data access patterns

Key Characteristics:

  • UDTFs are registered within a single Spark session
  • No Unity Catalog access required
  • Credentials passed directly in SQL queries (or via Secret Manager)
  • Automatically cleaned up when session ends
  • Faster setup, ideal for development
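Because nothing is stored in Unity Catalog, connection details travel with the query itself. The sketch below illustrates what such a call could look like; the UDTF name and parameter names are hypothetical placeholders, not the actual generated API, and the real names come from the generated registration code:

```python
# Illustrative only: build a SQL call against a session-scoped UDTF.
# "cdf_assets_udtf", "cdf_project", and "token" are hypothetical placeholders.
def build_session_query(udtf_name: str, project: str, secret_key: str) -> str:
    # secret() is Databricks SQL's built-in for reading Secret Manager values,
    # so the raw credential never appears in the query text
    return (
        f"SELECT * FROM {udtf_name}("
        f"cdf_project => '{project}', "
        f"token => secret('cdf_scope', '{secret_key}'))"
    )

query = build_session_query("cdf_assets_udtf", "my-project", "cdf_token")
# spark.sql(query) would then run it within the current session
```

Since the registration only lives in the current Spark session, the query (and its credential reference) is all that needs to be shared to reproduce an analysis.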

Use Catalog-Based Registration When:

  • Production Deployments: Permanent registration governed by Unity Catalog
  • Data Discovery: Views are indexed and searchable in the Databricks UI
  • Access Control: Fine-grained permissions via GRANT/REVOKE
  • Enterprise Security: Credentials stored securely in Databricks Secret Manager
  • Team Collaboration: Shared, discoverable data assets across teams

Key Characteristics:

  • UDTFs and Views registered in Unity Catalog
  • Permanent, discoverable, and governable
  • Credentials managed via Secret Manager (no credentials in SQL)
  • Views provide simplified query interface
  • Requires Unity Catalog access and Secret Manager setup
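The "simplified query interface" point means a view wraps the UDTF call, so consumers query a plain table name and never touch function parameters or secrets. A minimal sketch of the kind of DDL the registration step could emit; all catalog, schema, and object names below are placeholders, not the package's actual output:

```python
# Illustrative sketch of a Unity Catalog view wrapping a UDTF call.
# "main", "cdf", "assets", and "assets_udtf" are placeholder names.
def build_view_ddl(catalog: str, schema: str, view: str,
                   udtf: str, scope: str) -> str:
    fq_view = f"{catalog}.{schema}.{view}"      # fully qualified view name
    fq_udtf = f"{catalog}.{schema}.{udtf}"      # fully qualified UDTF name
    return (
        f"CREATE OR REPLACE VIEW {fq_view} AS\n"
        f"SELECT * FROM {fq_udtf}(token => secret('{scope}', 'cdf_token'))"
    )

ddl = build_view_ddl("main", "cdf", "assets", "assets_udtf", "cdf_scope")
```

With the view in place, end users can simply run `SELECT * FROM main.cdf.assets`, and access is governed by the usual GRANT/REVOKE on the view.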

Quick Start

Session-Scoped (Development)

from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml

client = load_cognite_client_from_toml("config.toml")
data_model_id = ("my_space", "MyDataModel", "v1")  # (space, external_id, version) of your CDF data model
generator = generate_udtf_notebook(data_model_id, client)
generator.register_session_scoped_udtfs()

See the Session-Scoped Documentation for the complete guide.

Catalog-Based (Production)

from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from databricks.sdk import WorkspaceClient

client = load_cognite_client_from_toml("config.toml")
workspace_client = WorkspaceClient()
data_model_id = ("my_space", "MyDataModel", "v1")  # (space, external_id, version) of your CDF data model
generator = generate_udtf_notebook(data_model_id, client, workspace_client=workspace_client)
result = generator.register_udtfs_and_views(secret_scope="cdf_scope")

See the Catalog-Based Documentation for the complete guide.

Documentation Structure

Session-Scoped UDTF Registration

Catalog-Based UDTF Registration

Examples

All examples are available in the examples/ directory:

  • Session-Scoped Examples: examples/session_scoped/
  • Catalog-Based Examples: examples/catalog_based/

Package Architecture

cognite-databricks extends pygen-spark with Databricks-specific features:

  • Code Generation: Uses pygen-spark for template-based UDTF generation (both Data Model and Time Series UDTFs)
  • Generic Components: Generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite.databricks for backward compatibility
  • Databricks-Specific: Unity Catalog registration, Secret Manager integration, and Databricks-specific utilities
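To make the re-exported naming helper concrete, here is a hypothetical sketch of the kind of mapping to_udtf_function_name might perform. The real implementation (and its exact convention) lives in pygen-spark; this version simply assumes a snake_case convention and a fixed suffix:

```python
import re

# Hypothetical sketch only: the actual convention is defined in pygen-spark.
# Snake-cases a view external id and appends a "_udtf" suffix.
def to_udtf_function_name_sketch(view_external_id: str) -> str:
    # insert "_" between a lowercase/digit and the following uppercase letter
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", view_external_id).lower()
    return f"{snake}_udtf"

name = to_udtf_function_name_sketch("WorkOrder")  # -> "work_order_udtf"
```

A deterministic mapping like this is what lets generated UDTFs, catalog views, and SQL examples all refer to the same object without coordination.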

Import Paths for Generic Components:

# Preferred: Import directly from pygen-spark (source)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name

# Backward compatible: Still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name

Related Resources