
lennartkats-db (Contributor) commented on Dec 19, 2025

This adds an example for defining a User-Defined Table Function (UDTF) in Unity Catalog.

Highlighted files:

lennartkats-db changed the title from "[DRAFT] Add UDTF example" to "Add UDTF example" on Dec 19, 2025
lennartkats-db changed the title from "Add UDTF example" to "Add a User-Defined Table Function example" on Dec 19, 2025
- Add k-means clustering UDTF example for Unity Catalog
- Focus documentation on UDTF pattern and SQL accessibility
- Include Python implementation and SQL usage examples
- Add CI/CD integration instructions
import csv
import os
...
except ImportError:
    raise ImportError(

It's not great for debugging to lose the original stack trace. Why not add the message but keep the stack trace?

import sys

try:
    ...
except ImportError:
    print(..., file=sys.stderr)
    raise
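Another way to keep the original stack trace while still adding a message is exception chaining. A minimal sketch, assuming a hypothetical helper `require()` that is not part of the PR: `raise ... from exc` attaches the original `ImportError` as `__cause__`, so Python prints both tracebacks.

```python
import importlib


def require(module_name: str, hint: str):
    """Hypothetical helper: import a module or fail with an extra hint.

    `raise ... from exc` keeps the original exception (with its traceback)
    attached as __cause__, so both are shown when the error is reported.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"{module_name} is required: {hint}") from exc
```

Compared with printing to stderr and re-raising, the hint travels with the exception object itself, which helps when logs only capture the exception.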

register_udtf_job:
  name: register_udtf_job
  schedule:
    quartz_cron_expression: '0 0 8 * * ?' # daily at 8am

Q: Why do you need to register the same UDTF daily?


Related question: how do you unregister it? Is it just a matter of removing this job, or is there some cleanup process?
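Removing the job would presumably only stop future re-registrations; a function that already exists in Unity Catalog can be dropped explicitly with `DROP FUNCTION IF EXISTS`. A minimal sketch, assuming the function lives at `catalog.schema.name` as passed to `register()`; `drop_function_sql` and `unregister` are hypothetical helpers, not part of the PR.

```python
def drop_function_sql(catalog: str, schema: str, name: str = "k_means") -> str:
    """Build a DROP FUNCTION statement for catalog.schema.name (hypothetical helper)."""

    def quote(part: str) -> str:
        # Backtick-quote each identifier part; embedded backticks are doubled.
        return "`" + part.replace("`", "``") + "`"

    return f"DROP FUNCTION IF EXISTS {quote(catalog)}.{quote(schema)}.{quote(name)}"


def unregister(spark, catalog: str, schema: str, name: str = "k_means") -> None:
    """Hypothetical counterpart to register(): drop the UDTF from Unity Catalog.

    `spark` is expected to be a SparkSession (anything with a .sql() method).
    IF EXISTS makes the call safe to repeat.
    """
    spark.sql(drop_function_sql(catalog, schema, name))
```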

if not self.rows:
    return

import numpy as np

Could you add a comment explaining why these imports are here rather than at the top level? Is it so that registration works without these libraries?
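A plain-Python sketch of the buffering pattern in the snippet above, with the kind of comment being asked for. It follows the general shape of a PySpark Python UDTF handler (`eval` per input row, `terminate` after the last row) but is not wired to PySpark's `udtf` decorator; the class name `BufferedUDTF` is hypothetical, and `statistics` stands in for a heavy dependency such as numpy or sklearn.

```python
class BufferedUDTF:
    """Hypothetical handler: buffers rows in eval(), emits results in terminate()."""

    def __init__(self):
        self.rows = []

    def eval(self, x: float):
        # Only buffer here; nothing is emitted per input row.
        self.rows.append(x)

    def terminate(self):
        if not self.rows:
            return

        # Deferred import: the module defining this class can be imported
        # (e.g. by a registration script) even where the dependency is
        # missing; it only has to be importable where terminate() runs.
        import statistics

        mean = statistics.fmean(self.rows)
        for x in self.rows:
            # One output row per buffered input: (value, deviation from mean).
            yield (x, x - mean)
```

Feeding `1.0`, `2.0`, `3.0` through `eval` and then iterating `terminate()` yields `(1.0, -1.0)`, `(2.0, 0.0)`, `(3.0, 1.0)`.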

def register(catalog: str, schema: str, name: str = "k_means"):
    """Register k_means UDTF in Unity Catalog"""
    spark = SparkSession.builder.getOrCreate()
    source = inspect.getsource(SklearnKMeans)

How does this function get access to its dependencies? Does it rely on sklearn being pre-provisioned?

