-
Notifications
You must be signed in to change notification settings - Fork 101
Add a User-Defined Table Function example #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
69e3edd to
12a7010
Compare
- Add k-means clustering UDTF example for Unity Catalog - Focus documentation on UDTF pattern and SQL accessibility - Include Python implementation and SQL usage examples - Add CI/CD integration instructions
ee6874e to
986dd79
Compare
| import csv | ||
| import os | ||
| except ImportError: | ||
| raise ImportError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not great for debugging to lose original stack trace. Why not add the message but keep the stacktrace?
try:
...
except:
print(..., file.sys.stderr)
raise
| register_udtf_job: | ||
| name: register_udtf_job | ||
| schedule: | ||
| quartz_cron_expression: '0 0 8 * * ?' # daily at 8am |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q - why do you need to register the same udtf daily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related q - how to unregister? is it just removing this job and there is some clean up process?
| if not self.rows: | ||
| return | ||
|
|
||
| import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment why imports are here and not at the top level? Is it so that registration works without these libraries?
| def register(catalog: str, schema: str, name: str = "k_means"): | ||
| """Register k_means UDTF in Unity Catalog""" | ||
| spark = SparkSession.builder.getOrCreate() | ||
| source = inspect.getsource(SklearnKMeans) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this function gets access to dependencies? Does it rely on sklearn being pre-provisioned?
This adds an example for defining a User-Defined Table Function (UDTF) in Unity Catalog.
Highlighted files:
src/kmeans_udtf.py: registers the UDTFtests/test_kmeans_udtf.py: unit test for registration and usage (run withuv run pytest)resources/register_udtf_job.yml: job that registers the UDTF