v0.14.0
- Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace (#987). In this release, the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file have been updated to include a new parameter, `upgraded_from_ws`, in the SQL commands used to alter tables, views, or managed tables. This parameter stores the source workspace ID in the migrated tables, indicating the migration origin. A new utility method, `sql_alter_from`, has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter, along with a new class-level attribute, `UPGRADED_FROM_WS_PARAM`, that indicates the source workspace. Migrated tables now carry the `upgraded_from_workspace_id` property, which stores the source workspace ID. These changes resolve issue #899 and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation.
- Added a command to create account level groups if they do not exist (#763). This commit introduces a new feature that creates account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool. It crawls all workspaces in the account and creates account-level groups for workspace-local groups that are not yet present in the account. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests that verify the behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can manage account-level groups and keep them consistent across all workspaces in the account.
- Added assessment for the incompatible `RunSubmit` API usages (#849). In this release, the assessment has been extended to detect incompatible `RunSubmit` API usages. In the `clusters.py` file, the methods `check_spark_conf` and `check_cluster_failures` have been renamed to `_check_spark_conf` and `_check_cluster_failures`, marking them as private. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method when checking cluster configurations. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, deriving from the `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, checking compatibility and identifying failures. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which submission history of `RunSubmit` API calls is retained. Lastly, several new JSON files have been added for unit testing, covering `RunSubmit` API usages in scenarios such as dbt task runs, Git source-based job runs, JAR file runs, and more. These tests help identify potential compatibility issues with the `RunSubmit` API.
- Added group members difference to the output of `validate-groups-membership` CLI command (#995). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This makes discrepancies between the account and workspace levels easy to identify. The new `group_members_difference` value is calculated as the difference between the number of members in the workspace group and the number of members in the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The corresponding unit test file, `test_groups.py`, has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The table template in the `labs.yml` file has also been updated to include the new column for the group membership difference. The functionality of the other commands remains unchanged.
- Added handling for empty `directory_id` if managed identity encountered during the crawling of StoragePermissionMapping (#986). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue #339. The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables.
- Added migration for Azure Service Principals with secrets stored in Databricks Secret to UC Storage Credentials (#874). This release adds migration of Azure Service Principals, with their secrets stored in Databricks Secret, to UC Storage Credentials. The changes include: Addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credentials. Modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes. Introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials. Addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal. Creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests that require Databricks Storage Credentials with Azure Service Principal auth. Addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. Together, these changes improve the security and management of storage access using Azure Service Principals and extend test coverage.
- Added permission migration support for feature tables and the root permissions for models and feature tables (#997). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to populate a workspace access page with the necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for the models' root page, the feature tables' root page, and the feature store. New classes and methods have been implemented to handle permissions for these resources, using the `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions.
- Added support for serving endpoints (#990). The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. A new listing for serving endpoints has been added to the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, `test_endpoints`, has been added to verify that assessments now crawl permissions for serving endpoints. This test demonstrates the ability to migrate permissions from one group to another. The test suite has been updated to cover the new feature and to stay compatible with the updated `test_manager.py` file.
- Expanded end-user documentation with detailed descriptions for workflows and commands (#999). The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog, including an assessment workflow that generates a detailed compatibility report for workspace entities, a group migration workflow for upgrading all Databricks workspace assets, and utility commands for managing cross-workspace installations. The Assessment Report now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. Additional improvements include expanded workspace group migration to handle potential conflicts with locally scoped group names, enhanced documentation for external Hive Metastore integration, a new debugging notebook, and detailed descriptions of table upgrade considerations, data access permissions, external storage, and table crawler.
- Fixed `config.yml` upgrade from very old versions (#984). This release fixes the configuration upgrade process for `config.yml`. The previous `v1_migrate` class method has been replaced with a new implementation that specifically handles migration from version 1. The new method retrieves the `groups` field, extracts the `selected` value, and assigns it to the `include_group_names` key in the configuration. The `backup_group_prefix` value from the `groups` field is assigned to the `renamed_group_prefix` key, the `groups` field is removed, and the version number is updated to 2. These changes simplify the code and allow users to upgrade from version 1 of the configuration. New unit tests have also been added to the `test_config.py` file to verify backward compatibility. The two new tests, `test_v1_migrate_zeroconf` and `test_v1_migrate_some_conf`, use the `MockInstallation` class and load the configuration using `WorkspaceConfig`.
- Renamed columns in assessment SQL queries to use actual names, not aliases (#983). This update resolves an issue where aliases used for column references in SQL queries caused errors in certain setups. Specifically, for assessment SQL queries, the definition of the `is_delta` column now uses the actual `table_format` name instead of the alias `format`. Column references are therefore interpreted consistently across setups. This change does not introduce any new methods; it only modifies the existing SQL query for the `05_0_all_tables` assessment to use actual column names.
- Updated groups permissions validation to use Table ACL cluster (#979). In this update, the `validate_groups_permissions` task has been modified to run on the Table ACL cluster, as indicated by the inclusion of `job_cluster="tacl"`. This task is responsible for ensuring that all crawled permissions are accurately applied to the destination groups by calling the `permission_manager.apply_group_permissions` method during the migration state. Running the validation on the Table ACL cluster potentially improves performance or functionality. If you run this workflow, check how this change affects your permissions validation process and adjust your workflows accordingly.
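
The version 1 configuration upgrade described in #984 can be sketched roughly as follows. This is a minimal illustration of the key mapping stated in the entry, not the project's actual `v1_migrate` implementation: the function name `migrate_v1_config`, the plain-dict representation, and the choice to skip keys that are absent from `groups` are all assumptions.

```python
import copy


def migrate_v1_config(raw: dict) -> dict:
    """Hypothetical sketch of a version 1 -> 2 config upgrade (not the UCX code)."""
    upgraded = copy.deepcopy(raw)  # leave the caller's dict untouched

    # Remove the legacy `groups` field; fall back to an empty dict if it is missing.
    groups = upgraded.pop("groups", None) or {}

    # `groups.selected` becomes `include_group_names` (assumption: only when set).
    selected = groups.get("selected")
    if selected:
        upgraded["include_group_names"] = selected

    # `groups.backup_group_prefix` becomes `renamed_group_prefix` (same assumption).
    prefix = groups.get("backup_group_prefix")
    if prefix:
        upgraded["renamed_group_prefix"] = prefix

    upgraded["version"] = 2
    return upgraded


# Illustrative input; the values are made up for this example.
old_config = {
    "version": 1,
    "inventory_database": "ucx",
    "groups": {"selected": ["g1", "g2"], "backup_group_prefix": "backup_"},
}
new_config = migrate_v1_config(old_config)
print(new_config)
```

Working on a deep copy keeps the original mapping available for comparison or rollback, which is a design choice of this sketch rather than something the entry specifies.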
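
The sign convention of the `group_members_difference` column from #995 can be illustrated with a small sketch. The `GroupMembershipRow` dataclass and its field names are hypothetical and exist only for this example; only the rule itself (workspace member count minus account member count) comes from the entry above.

```python
from dataclasses import dataclass


@dataclass
class GroupMembershipRow:
    """Hypothetical row type; not part of the UCX code base."""

    group_name: str
    workspace_members: int  # number of members in the workspace group
    account_members: int    # number of members in the account group

    @property
    def group_members_difference(self) -> int:
        # Positive: more members in the workspace group.
        # Negative: more members in the account group.
        return self.workspace_members - self.account_members


rows = [
    GroupMembershipRow("analysts", workspace_members=5, account_members=3),
    GroupMembershipRow("engineers", workspace_members=2, account_members=4),
]
for row in rows:
    print(row.group_name, row.group_members_difference)
```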
Contributors: @nfx, @william-conti, @mwojtyczka, @FastLee, @qziyuan, @nkvuong, @larsgeorge-db