
Commit 818f440

Assessment: crawl UDFs as a task in parallel to tables instead of implicitly during grants (#2642)
## Changes

This PR updates the way UDFs are crawled/scanned during the `assessment` workflow:

- Prior to this PR, the UDFs were crawled/scanned implicitly by the `GrantsCrawler`: it requests a snapshot from the `UDFSCrawler`, but that crawler has not yet executed at this point in the workflow.
- With this PR, the UDFs are crawled/scanned as their own task, in parallel with tables, before grants crawling commences.

### Linked issues

Progresses #2574, where grants and UDFs need to be refreshable, but only once within a given workflow run.

### Functionality

- modified existing workflow: `assessment`

### Tests

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)
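The implicit crawl described above comes from a fetch-or-crawl snapshot pattern: asking for a snapshot triggers a crawl when nothing has been persisted yet. The sketch below illustrates that pattern under stated assumptions; `LazyCrawler` and `crawl_grants_step` are hypothetical, simplified stand-ins (an in-memory list replaces the inventory table) and are not the ucx crawler API.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


class LazyCrawler:
    """Hypothetical, simplified crawler: snapshot() returns persisted rows and
    crawls (then persists) only when nothing has been stored yet."""

    def __init__(self, crawl: Callable[[], Iterable[str]]) -> None:
        self._crawl = crawl
        self._inventory: list[str] | None = None  # stands in for the inventory table
        self.crawl_count = 0  # how many times a real crawl happened

    def snapshot(self) -> list[str]:
        if self._inventory is None:
            # Nothing persisted yet: crawl now, as a side effect of this request.
            self.crawl_count += 1
            self._inventory = list(self._crawl())
        return self._inventory


def crawl_grants_step(udfs: LazyCrawler) -> list[str]:
    # Hypothetical grants step: it needs the UDF inventory to inspect UDF grants.
    return [f"grant on {udf}" for udf in udfs.snapshot()]


def list_udfs() -> list[str]:
    # Hypothetical UDF listing; a real crawler would query the Hive Metastore.
    return ["hive_metastore.db.f1", "hive_metastore.db.f2"]


# Before: no dedicated task, so the grants step triggers the UDF crawl itself.
before = LazyCrawler(list_udfs)
crawl_grants_step(before)

# After: a dedicated crawl_udfs task fills the inventory first,
# and the grants step only reads the existing snapshot.
after = LazyCrawler(list_udfs)
after.snapshot()  # plays the role of the crawl_udfs task
grants = crawl_grants_step(after)
```

In both cases the crawl happens exactly once; the difference is which task pays for it. With a dedicated task the crawl becomes an explicit, separately scheduled step, which is what makes "refresh once per workflow run" easier to reason about.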
1 parent 0a1882e commit 818f440


1 file changed (+8 additions, -1 deletion)


src/databricks/labs/ucx/assessment/workflows.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -19,11 +19,18 @@ def crawl_tables(self, ctx: RuntimeContext):
         stored is then used in the subsequent tasks and workflows to, for example, find all Hive Metastore tables that
         cannot easily be migrated to Unity Catalog."""
 
+    @job_task
+    def crawl_udfs(self, ctx: RuntimeContext):
+        """Iterates over all UDFs in the Hive Metastore of the current workspace and persists their metadata in the
+        table named `$inventory_database.udfs`. This inventory is currently used when scanning securable objects for
+        issues with grants that cannot be migrated to Unit Catalog."""
+        ctx.udfs_crawler.snapshot()
+
     @job_task(job_cluster="tacl")
     def setup_tacl(self, ctx: RuntimeContext):
         """(Optimization) Starts `tacl` job cluster in parallel to crawling tables."""
 
-    @job_task(depends_on=[crawl_tables, setup_tacl], job_cluster="tacl")
+    @job_task(depends_on=[crawl_tables, crawl_udfs, setup_tacl], job_cluster="tacl")
     def crawl_grants(self, ctx: RuntimeContext):
         """Scans all securable objects for permissions that have been assigned: this include database-level permissions,
         as well permissions directly configured on objects in the (already gathered) table and UDF inventories. The
```
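The `depends_on` change determines which tasks can run side by side. The sketch below groups the tasks touched by this commit into stages using Python's standard `graphlib.TopologicalSorter`; it assumes the job runner starts a task as soon as all of its dependencies have finished, and it omits the other tasks of the `assessment` workflow. The task names mirror the diff, but the dictionary and `parallel_stages` helper are illustrative only.

```python
from graphlib import TopologicalSorter

# Dependencies of the tasks touched by this commit (other workflow tasks omitted).
# Keys are tasks; values are the tasks they depend on, as in `depends_on`.
dependencies = {
    "crawl_tables": set(),
    "crawl_udfs": set(),
    "setup_tacl": set(),
    "crawl_grants": {"crawl_tables", "crawl_udfs", "setup_tacl"},
}


def parallel_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages: each task's dependencies all sit in earlier
    stages, so the tasks within one stage may run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


for number, stage in enumerate(parallel_stages(dependencies), start=1):
    print(number, stage)
```

Under these assumptions, `crawl_udfs` lands in the first stage next to `crawl_tables` and `setup_tacl`, and `crawl_grants` waits for all three, matching the intent of the PR description.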
