v0.30.0
- Fixed codec error in md (#2234). In this release, we have addressed a codec error in the `md` file that caused issues on Windows machines due to the presence of curly quotes. This has been resolved by replacing curly quotes with straight quotes. The affected code pertains to the `.setJobGroup` pattern in the `SparkContext`, where `spark.addTag()` is used to attach a tag, and `getTags()` and `interruptTag(tag)` are used to act upon the presence or absence of a tag. These APIs are specific to Spark Connect (Shared Compute Mode) and will not work in `Assigned` access mode. Additionally, the release includes updates to the README.md file, providing solutions for various issues related to UCX installation and configuration. These changes aim to improve the user experience and ensure a smooth installation process for software engineers adopting the project. This release also enhances compatibility and reliability of the code for users across various operating systems. The changes were co-authored by Cor and address issue #2234.
- Group manager optimisation: during group enumeration only request the attributes that are needed (#2240). In this optimization update to the
`groups.py` file, the `_list_workspace_groups` function has been modified to reduce the number of attributes requested during group enumeration to the minimum set necessary. This is achieved by no longer requesting the `members` attribute during enumeration. For each group returned by `self._ws.groups.list`, the function now checks if the group is out of scope and, if not, retrieves the group with all its attributes using the `_get_group` function. Additionally, the new `scan_attributes` variable limits the attributes requested during the initial enumeration to "id", "displayName", and "meta". This optimization reduces the risk of timeouts caused by large attributes and improves the performance of group enumeration, particularly in cases where requesting members during enumeration runs into API issues.
- Group migration: additional logging (#2239). In this release, we have implemented logging improvements for group migration within the group manager. These enhancements include the addition of new informational and debug logs aimed at helping to understand potential issues during group migration. The affected functionality includes the existing workflow
`group-migration`. New logging statements have been added to numerous methods, such as `rename_groups`, `_rename_group`, `_wait_for_rename`, `_wait_for_renamed_groups`, `reflect_account_groups_on_workspace`, `delete_original_workspace_groups`, and `validate_group_membership`, as well as data retrieval methods including `_workspace_groups_in_workspace`, `_account_groups_in_workspace`, and `_account_groups_in_account`. These changes will provide increased visibility into the group migration process, including starting to rename/reflect groups, checking for renamed groups, and validating group membership.
- Group migration: improve robustness while deleting workspace groups (#2247). This pull request introduces changes to the group manager aimed at enhancing the reliability of deleting workspace groups, addressing an issue where deletion was being skipped for groups that had recently been renamed due to eventual consistency concerns. The changes involve double-checking the deletion of groups by ensuring they can no longer be directly retrieved from the API and are no longer present in the list of groups during enumeration. Additionally, logging has been improved, and the renaming of groups will be updated in a subsequent pull request. The
`remove-workspace-local-backup-groups` workflow and related tests have been modified, and new classes indicating incomplete deletion or rename operations have been implemented. These changes improve the robustness of deleting workspace groups, reducing the likelihood of issues arising post-deletion and enhancing overall system consistency.
- Improve error messages in case of connection errors (#2210). In this release, we've made significant improvements to error messages for connection errors in the
`databricks labs ucx (un)install` command, addressing part of issue #1323. The changes include the addition of a new import, `RequestsConnectionError` from the `requests` package, and updates to the error handling in the `run` method to provide clearer and more informative messages during connection problems. A new `except` block has been added to handle `TimeoutError` exceptions caused by `RequestsConnectionError`, logging a warning message with information on troubleshooting network connectivity issues. The `configure` method has also been updated with a docstring noting that connection errors are not handled within it. To ensure the improvements work as expected, we've added new manual and integration tests, including a test for a simulated workspace with no internet connection, and a new function to configure such a workspace. The test checks for the presence of a specific warning message in the log output. The changes also include new type annotations and imports. The target audience for this update includes software engineers adopting the project, who will benefit from clearer error messages and guidance when troubleshooting connection problems.
- Increase timeout for sequence of slow preliminary jobs (#2222). In this enhancement, the timeout duration for a series of slow preliminary jobs has been increased from 4 minutes to 6 minutes, addressing issue #2219. The modification is implemented in the
`test_running_real_remove_backup_groups_job` function in the `tests/integration/install/test_installation.py` file, where the `get_group` function's `retried` decorator timeout is updated from 4 minutes to 6 minutes. This change improves the system's handling of slow preliminary jobs by allowing more time for the API to delete a group and minimizing errors resulting from insufficient deletion time. The overall functionality and tests of the system remain unaffected.
- Init
`RuntimeContext` from debug notebook to simplify interactive debugging flows (#2253). In this release, we have implemented a change to simplify interactive debugging flows in UCX workflows. We have introduced a new feature that initializes the `RuntimeContext` object from a debug notebook. The `RuntimeContext` is a subclass of `GlobalContext` that manages all object dependencies. Previously, all UCX workflows used a `RuntimeContext` instance for any object lookup, which could be complex during debugging. This change pre-initializes the `RuntimeContext` object correctly, making it easier to perform interactive debugging. Additionally, we have replaced the use of `Installation.load_local` and `WorkspaceClient` with the newly initialized `RuntimeContext` object. This reduces the complexity of object lookup and simplifies the code for debugging purposes. Overall, this change will make it easier to debug UCX workflows by pre-initializing the `RuntimeContext` object with the necessary configurations.
- Lint child dependencies recursively (#2226). In this release, we've implemented significant changes to our linting process for enhanced context awareness, particularly in the context of parent-child file relationships. The
`DependencyGraph` class in the `graph.py` module has been updated with new methods, including `parent`, `root_dependencies`, `root_paths`, and `root_relative_names`, and an improved `_relative_names` method. These changes allow for more accurate linting of child dependencies. The `lint` function in the `files.py` module has also been modified to accept new parameters and utilize a recursive linting approach for child dependencies. The `databricks labs ucx lint-local-code` command has been updated to include a `paths` parameter and lint child dependencies recursively, improving the linting process by considering parent-child relationships and resulting in better contextual code analysis. The release contains integration tests to ensure the functionality of these changes, addressing issues #2155 and #2156.
- Removed deprecated
`install.sh` script (#2217). In this release, we have removed the deprecated `install.sh` script from the codebase, which was previously used to install and set up the environment for the project. This script would check for the presence of Python binaries, identify the latest version, create a virtual environment, and install project dependencies. Going forward, developers will need to utilize an alternative method for installing and setting up the project environment, as the use of this script is now obsolete. We recommend consulting the updated documentation for guidance on the new installation process.
- Tentatively fix failure when running assessment without a hive_metastore (#2252). In this update, we have enhanced the error handling of the
`LocalCheckoutContext` class in the `workspace_cli.py` file. Specifically, we have addressed the issue where a fatal failure occurred when running an assessment without a Hive metastore (#2252) by implementing a more graceful error handling mechanism. Now, when the metastore fails to load during the initialization of a `LinterContext` object, a warning message is logged instead, and the `MigrationIndex` is initialized with an empty list. This change is linked to the resolution of issue #2221. Additionally, we have imported the `MigrationIndex` class from the `hive_metastore.migration_status` module and added a logger to the module. However, please note that functional tests for this specific modification have not been conducted.
- Total Storage Credentials count widget for Assessment Dashboard (#2201). In this commit, a new widget has been added to the Assessment Dashboard that displays the current total number of storage credentials created in the workspace, up to a limit of 200. This change includes a new SQL query to retrieve the count of storage credentials from the
`inventory.external_locations` table and modifies the display of the widget with customized settings. Additionally, a new warning mechanism has been implemented to prevent migration from exceeding the UC storage credentials limit of 200. A new method, `get_roles_to_migrate`, has been added to `access.py` to retrieve the roles that need to be migrated. If the number of roles exceeds 200, a `RuntimeWarning` is raised. User documentation and manual testing have been updated to reflect these changes, but no unit or integration tests have been added yet. This feature is part of the implementation of issue #1600 and is co-authored by Serge Smertin.
- Updated dashboard install using latest
`lsql` release (#2246). In this release, the install function for the UCX dashboard has been updated in the `databricks/labs/ucx/install.py` file to use the latest `lsql` release. The `databricks labs install ucx` command has been modified to accommodate the updated `lsql` version and now includes new methods for upgrading dashboards from Redash to Lakeview, as well as creating and deleting dashboards in Lakeview, which also feature functionality to publish dashboards. The changes have been manually tested and verified on a staging environment. The query formatting in the dashboard has been improved, and the `--width` parameter is no longer necessary in certain instances. This update streamlines the dashboard installation process, enhances its functionality, and ensures its compatibility with the latest `lsql` release.
- Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 (#2248). In this update, we have adjusted the version requirements for the SQL transpiler library, sqlglot, in our pyproject.toml file. The requirement has been updated from ">=25.5.0, <25.7" to ">=25.5.0, <25.8", allowing us to utilize the latest features and bug fixes available in sqlglot version 25.7.0 while keeping the existing lower bound of 25.5.0. The changelog from sqlglot's repository has been included in this commit, detailing the new features and improvements introduced in version 25.7.0. A list of commits made since the previous version is also provided. The diff of this commit shows that the change only affects the version constraint for sqlglot and does not impact any other parts of the codebase. This update allows sqlglot 25.7.x releases to be installed while maintaining backward compatibility.
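
The two-phase enumeration described for #2240 above (a cheap scan with a minimal attribute set, followed by a full fetch of only the in-scope groups) can be illustrated with a minimal sketch. This is not the UCX implementation: `FakeGroupsAPI`, `Group`, `list_workspace_groups`, and the `is_in_scope` predicate are hypothetical stand-ins, and only the attribute names "id", "displayName", and "meta" are taken from the release note.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator

# Attributes requested during the cheap enumeration pass (from the release note).
SCAN_ATTRIBUTES = ("id", "displayName", "meta")


@dataclass
class Group:
    id: str
    display_name: str
    members: list[str] = field(default_factory=list)


class FakeGroupsAPI:
    """Hypothetical in-memory stand-in for a SCIM-style groups client."""

    def __init__(self, groups: list[Group]) -> None:
        self._groups = {g.id: g for g in groups}
        self.full_fetches = 0  # counts how many expensive lookups were made

    def list(self, attributes: str) -> Iterator[Group]:
        requested = set(attributes.split(","))
        for g in self._groups.values():
            # Members are only returned when explicitly requested.
            members = [*g.members] if "members" in requested else []
            yield Group(g.id, g.display_name, members)

    def get(self, group_id: str) -> Group:
        self.full_fetches += 1
        return self._groups[group_id]


def list_workspace_groups(
    api: FakeGroupsAPI, is_in_scope: Callable[[Group], bool]
) -> list[Group]:
    """Enumerate with minimal attributes, then fully fetch in-scope groups only."""
    results = []
    for stub in api.list(attributes=",".join(SCAN_ATTRIBUTES)):
        if not is_in_scope(stub):
            continue  # out-of-scope groups never trigger a full fetch
        results.append(api.get(stub.id))
    return results


api = FakeGroupsAPI(
    [Group("1", "analysts", ["alice", "bob"]), Group("2", "admins", ["carol"])]
)
groups = list_workspace_groups(api, lambda g: g.display_name != "admins")
```

In this sketch the large `members` attribute is transferred only for groups that pass the scope check, which is the property the release note credits with reducing timeouts.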
Dependency updates:
- Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 (#2248).
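
To make the effect of the new upper bound concrete, the sketch below checks a few version strings against the old and new ranges. The `parse` and `satisfies` helpers are hypothetical and compare only dotted numeric release segments; real installers apply the fuller PEP 440 rules (pre-releases, post-releases, and so on).

```python
from __future__ import annotations


def parse(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '25.8' into (25, 8, 0) so that short and long forms compare equally."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version: str, lower: str = "25.5.0", upper: str = "25.8") -> bool:
    """Check lower <= version < upper; defaults mirror the new sqlglot range."""
    return parse(lower) <= parse(version) < parse(upper)


# 25.7.0 is rejected by the old range (<25.7) and accepted by the new one (<25.8).
old_range_allows = satisfies("25.7.0", upper="25.7")
new_range_allows = satisfies("25.7.0")
```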
Contributors: @ericvergnaud, @JCZuurmond, @asnare, @nfx, @tarikcurto, @dependabot[bot], @pritishpai