Solution: Ensure you're running the code in a Databricks notebook with an active Spark session. If using a Python script, create a SparkSession:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("UDTF Registration").getOrCreate()
```

Solution: Install PySpark, or ensure you're running in a Databricks environment where PySpark is available:

```
%pip install pyspark
```

Possible Causes:
- Incorrect credentials: Verify that SECRET() values are correct
- No matching data: Check that filters match existing data in CDF
- Time series doesn't exist: Verify the time series external_id exists in CDF
Debug Steps:

```python
# Test credentials
from cognite.pygen import load_cognite_client_from_toml

client = load_cognite_client_from_toml("config.toml")

# Test the data model: list the views in the model's space
from cognite.client.data_classes.data_modeling.ids import DataModelId

data_model_id = DataModelId(space="sailboat", external_id="sailboat", version="v1")
views = client.data_modeling.views.list(space=data_model_id.space)
print(f"Found {len(views)} views")
```

Solution: Restart the Python kernel after installing packages with %pip:
- Run `%pip install cognite-sdk cognite-databricks`
- When prompted, click "Restart" to restart the kernel
- Re-run your registration code
Possible Causes:
- Function name mismatch: Verify the registered function name matches what you're calling in SQL
- Parameter mismatch: Check that all required parameters are provided
- Type errors: Ensure parameter types match the UDTF's expected types
Debug Steps:

```python
# Check registered functions
registered = generator.register_session_scoped_udtfs()
print("Registered functions:", registered)

# Verify the function name in your SQL matches the registered name.
# If registered as "small_boat_udtf", use:
# SELECT * FROM small_boat_udtf(
#     client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
#     client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
#     tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
#     cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
#     project => SECRET('cdf_sailboat_sailboat', 'project'),
#     name => NULL,
#     description => NULL
# ) LIMIT 10;
```

Cause: UDTFs are table functions, and the SQL syntax differs between a notebook %sql cell and a SQL Warehouse.
Solution: Use the table-function syntax for your environment.

Notebook %sql (cluster-backed):

```sql
SELECT *
FROM time_series_datapoints_detailed_udtf(
    client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
    client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
    tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
    cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
    project => SECRET('cdf_sailboat_sailboat', 'project'),
    instance_ids => 'space1:ts1,space1:ts2',
    start => current_timestamp() - INTERVAL 52 WEEKS,
    end => current_timestamp() - INTERVAL 51 WEEKS,
    aggregates => 'average',
    granularity => '2h'
) AS t;
```

SQL Warehouse (Databricks SQL):
```sql
SELECT *
FROM TABLE(
    time_series_datapoints_detailed_udtf(
        client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
        client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
        tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
        cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
        project => SECRET('cdf_sailboat_sailboat', 'project'),
        instance_ids => 'space1:ts1,space1:ts2',
        start => current_timestamp() - INTERVAL 52 WEEKS,
        end => current_timestamp() - INTERVAL 51 WEEKS,
        aggregates => 'average',
        granularity => '2h'
    )
);
```

Possible Causes:
- Time series doesn't exist: Verify the time series external_id exists
- No datapoints in range: Check that the time range (start/end) contains data
- Incorrect instance_id format: Ensure space and external_id are correct
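The instance_ids format can be sanity-checked before it ever reaches the UDTF. The small parser below is purely illustrative (the helper name is ours, not part of the SDK); it assumes the comma-separated `space:external_id` format used in the examples above:

```python
# Illustrative helper: validate an instance_ids string of the form
# 'space1:ts1,space1:ts2' and split it into (space, external_id) pairs.
def parse_instance_ids(instance_ids: str) -> list[tuple[str, str]]:
    pairs = []
    for item in instance_ids.split(","):
        item = item.strip()
        space, sep, external_id = item.partition(":")
        if not sep or not space or not external_id:
            raise ValueError(f"Bad instance_id {item!r}; expected 'space:external_id'")
        pairs.append((space, external_id))
    return pairs

print(parse_instance_ids("space1:ts1,space1:ts2"))
# → [('space1', 'ts1'), ('space1', 'ts2')]
```

If this raises, the string passed as `instance_ids` is malformed and the UDTF will have nothing to look up.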
Debug Steps:

```python
# Test time series existence
from cognite.client.data_classes.data_modeling.ids import NodeId

instance_id = NodeId(space="sailboat", external_id="vessel.speed")
ts = client.time_series.retrieve(external_id=instance_id.external_id)
print(f"Time series exists: {ts is not None}")

# Test datapoints retrieval
datapoints = client.time_series.data.retrieve(
    external_id=instance_id.external_id,
    start="1d-ago",
    end="now",
)
print(f"Found {len(datapoints)} datapoints")
```

If you encounter issues not covered here:
- Check the logs: Look for error messages in the notebook output or Databricks logs
- Verify credentials: Ensure CDF credentials are correct and have proper permissions
- Test with simple queries: Start with basic queries before adding complex filters or joins
- Review the Technical Plan: See the Technical Plan document for detailed architecture and implementation details
After successfully using session-scoped UDTFs, consider:
- Unity Catalog Registration: Register UDTFs and Views in Unity Catalog for production use (see Catalog-Based UDTF Registration)
- View Creation: Create SQL Views that wrap UDTFs for easier querying
- Governance: Set up Unity Catalog permissions for production deployments
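As a sketch of the view-creation idea, a view can wrap one of the UDTFs above so consumers don't repeat the credential plumbing on every query. The view name below is illustrative; the function and secret-scope names are the ones used in the earlier examples:

```sql
-- Illustrative: wrap a session-scoped UDTF in a temporary view.
CREATE OR REPLACE TEMPORARY VIEW small_boats AS
SELECT *
FROM small_boat_udtf(
    client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
    client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
    tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
    cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
    project => SECRET('cdf_sailboat_sailboat', 'project'),
    name => NULL,
    description => NULL
);

-- Consumers then query the view directly:
SELECT * FROM small_boats LIMIT 10;
```

A temporary view lives only for the session, matching the session-scoped UDTF registration; for production, create the view in Unity Catalog instead (see Catalog-Based UDTF Registration).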
For more information, see:
- Catalog-Based UDTF Registration
- Technical Plan: CDF Databricks Integration (UDTF-Based)