cognite-databricks provides two approaches for registering and using User-Defined Table Functions (UDTFs) to query CDF data from Databricks:
- Session-Scoped Registration: Temporary registration for development and testing
- Catalog-Based Registration: Permanent registration in Unity Catalog for production
Use session-scoped registration when:
- ✅ Developing and Testing: Quickly test UDTFs before committing to Unity Catalog
- ✅ Prototyping: Experimenting with different configurations and queries
- ✅ Temporary Analysis: Running ad-hoc queries without permanent registration
- ✅ Learning: Getting familiar with UDTFs and CDF data access patterns
Key Characteristics:
- UDTFs are registered within a single Spark session
- No Unity Catalog access required
- Credentials passed directly in SQL queries (or via Secret Manager)
- Automatically cleaned up when session ends
- Faster setup, ideal for development
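For example, a session-scoped UDTF can be called with credentials supplied directly in the SQL statement. The UDTF name (`cdf_asset_udtf`) and parameter names below are illustrative assumptions, not the actual generated names:

```python
# Illustrative sketch only: the UDTF name and named arguments are assumptions
# about what a generated session-scoped UDTF might look like, not the real
# generated signature.
query = """
SELECT *
FROM cdf_asset_udtf(
  client_id => '{client_id}',
  client_secret => '{client_secret}'
)
LIMIT 100
""".format(client_id="<idp-client-id>", client_secret="<idp-client-secret>")

# In a Databricks notebook, run it with: spark.sql(query)
```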
Use catalog-based registration when:
- ✅ Production Deployments: Permanent registration with Unity Catalog governance
- ✅ Data Discovery: Views are indexed and searchable in the Databricks UI
- ✅ Access Control: Fine-grained permissions via GRANT/REVOKE
- ✅ Enterprise Security: Credentials stored securely in Databricks Secret Manager
- ✅ Team Collaboration: Shared, discoverable data assets across teams
Key Characteristics:
- UDTFs and Views registered in Unity Catalog
- Permanent, discoverable, and governable
- Credentials managed via Secret Manager (no credentials in SQL)
- Views provide simplified query interface
- Requires Unity Catalog access and Secret Manager setup
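Once registered, a view can be queried like any other Unity Catalog table, with no credentials in the SQL. The catalog, schema, view, and column names below are assumptions for illustration; the real names depend on your Unity Catalog setup and the generated views:

```python
# Illustrative sketch: catalog/schema/view and column names are assumptions,
# not the names this package actually generates.
catalog, schema, view = "main", "cdf_views", "assets"
query = f"SELECT externalId, name FROM {catalog}.{schema}.{view} LIMIT 10"
# In a Databricks notebook: display(spark.sql(query))
```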
```python
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml

client = load_cognite_client_from_toml("config.toml")
generator = generate_udtf_notebook(data_model_id, client)
generator.register_session_scoped_udtfs()
```

See the Session-Scoped Documentation for the complete guide.
```python
from cognite.databricks import generate_udtf_notebook, SecretManagerHelper
from databricks.sdk import WorkspaceClient

workspace_client = WorkspaceClient()
generator = generate_udtf_notebook(data_model_id, client, workspace_client=workspace_client)
result = generator.register_udtfs_and_views(secret_scope="cdf_scope")
```

See the Catalog-Based Documentation for the complete guide.
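Catalog-based registration expects the CDF credentials to already exist in a secret scope. A minimal sketch of populating one via the Databricks SDK's secrets API; the scope name and key names here are assumptions, not necessarily the keys `SecretManagerHelper` expects:

```python
# Sketch: store CDF credentials in a Databricks secret scope.
# Key names are assumptions for illustration. `workspace_client` is a
# databricks.sdk.WorkspaceClient (injected so the function is easy to test).

CDF_SECRETS = {
    "cdf-client-id": "<idp-client-id>",
    "cdf-client-secret": "<idp-client-secret>",
}

def setup_cdf_secrets(workspace_client, scope: str = "cdf_scope") -> list:
    """Create the scope, store each credential, and return the stored key names."""
    workspace_client.secrets.create_scope(scope=scope)
    for key, value in CDF_SECRETS.items():
        workspace_client.secrets.put_secret(scope=scope, key=key, string_value=value)
    return sorted(CDF_SECRETS)
```

In a real workspace you would pass `WorkspaceClient()` from `databricks.sdk`, as in the registration snippet above.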
- Quickstart
- Prerequisites
- Secret Manager
- Registration
- Views
- Querying
- Filtering
- Joining
- Time Series
- SQL-Native Time Series (Alpha)
- Governance
- Troubleshooting
All examples are available in the examples/ directory:
- Session-Scoped Examples: examples/session_scoped/
- Catalog-Based Examples: examples/catalog_based/
cognite-databricks extends pygen-spark with Databricks-specific features:
- Code Generation: Uses `pygen-spark` for template-based UDTF generation (both Data Model and Time Series UDTFs)
- Generic Components: Generic utilities (`TypeConverter`, `CDFConnectionConfig`, `to_udtf_function_name`) are provided by `pygen-spark` and re-exported from `cognite.databricks` for backward compatibility
- Databricks-Specific: Unity Catalog registration, Secret Manager integration, and Databricks-specific utilities
Import Paths for Generic Components:

```python
# Preferred: import directly from pygen-spark (the source package)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name

# Backward compatible: still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name
```

- README: Package overview and installation
- Technical Plan: Architecture and design details
- pygen-spark Documentation: Generic Spark UDTF code generation library (works with any Spark cluster)