v0.17.0
- Added AWS IAM role support to
`databricks labs ucx create-uber-principal` command (#993). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the Instance Profiles configured in the workspace.
- Added Dashboard widget to show the list of cluster policies along with DBR version (#1013). In this code revision, the
`assessment` module of the `databricks/labs/ucx` package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the `_crawl`, `_assess_policies`, `_try_fetch`, and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in `fixtures.py` has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations.
- Added
`migration_status` table to capture a snapshot of migrated tables (#1041). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. A new `MigrationStatus` dataclass holds the source and destination schema, table, and updated timestamp. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. This change also adds new test functions in the test file for the Hive metastore, which cover various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables.
- Added a check for existing inventory database to avoid losing existing data, inject installation objects in tests and try fetching existing installation before setting global as default (#1043). In this release, we have added a new method,
`_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for a more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument `inventory_schema_suffix` has been added to the `factory` method for customization of the inventory schema name. We have also added a new method `check_inventory_database_exists` to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace.
- Added a check for external metastore in SQL warehouse configuration (#1046). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method,
`_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the data_access_config is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair` and `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse.
- Added integration tests for AWS - create locations (#1026). In this release, we have added comprehensive integration tests for AWS resources and their management in the
`tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (AwsIamRole, add_uc_role, add_uc_role_policy, and validate_connection) and the regular expression for matching S3 resource ARN has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system.
- Bump Databricks SDK to v0.22.0 (#1059). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the
`databricks-labs-lsql` package to ~0.2.2. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function.
- Changed default UCX installation folder to
`/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users to utilise the same installation (#854). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values, `global` and `user`, providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to `/Applications/ucx` from `/Users/<me>/.ucx` to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to `user`. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios.
- Fix: Recover lost fix for
`webbrowser.open` mock (#1052). A fix has been implemented to address an issue related to the mock for `webbrowser.open` in the tests `test_repair_run` and `test_get_existing_installation_global`. This change prevents the `webbrowser.open` function from being called during these tests, which helps improve test stability and consistency. No new methods have been added, and the existing functionality of these tests has only been modified to include the `webbrowser.open` mock. This modification aims to enhance the reliability and predictability of these specific tests, ensuring accurate and consistent results.
- Improved table migrations logic (#1050). This change introduces improvements to table migrations logic by refactoring unit tests to load table mappings from JSON instead of inline structs, adding an
`escape_sql_identifier` function where missing, and preparing for ACLs migration. The `uc_grant_sql` method in `grants.py` has been updated to accept optional `object_type` and `object_key` parameters, and the hive-to-UC mapping has been expanded to include mappings for views. Additionally, new JSON files for external source table configuration have been added, and new functions have been introduced for loading fixture data from JSON files and creating mocked `WorkspaceClient` and `TableMapping` objects for testing. The changes improve the maintainability and security of the codebase, prepare it for future migration tasks, and ensure that the code is more adaptable and robust. The changes have been manually tested and verified on the staging environment.
- Moved
`SqlBackend` implementation to `databricks-labs-lsql` dependency (#1042). In this change, the `SqlBackend` implementation, including classes such as `StatementExecutionBackend` and `RuntimeBackend`, has been moved to a separate library, `databricks-labs-lsql`, which is managed at https://github.com/databrickslabs/lsql. This refactoring simplifies the current repository, promotes code reuse, and improves modularity by leveraging an external dependency. The modification includes adding a new line in the .gitignore file to exclude `*.out` files from version control.
- Prepare for a PyPI release (#1038). In preparation for a PyPI release, this change introduces a new GitHub Actions workflow that automates the package release process and ensures the integrity of the released packages by signing them with Sigstore. When a new git tag starting with
`v` is pushed, this workflow is triggered, building wheels using hatch, drafting a new GitHub release, publishing the package distributions to PyPI, and signing the artifacts with Sigstore. The `pyproject.toml` file is now used for metadata, replacing `setup.cfg` and `setup.py`, and is cached to improve build performance. In addition, the `pyproject.toml` file has been updated with recent metadata in preparation for the release, including updates to the package's authors, development status, classifiers, and dependencies.
- Prevent fragile
`mock.patch('databricks...')` in the test code (#1037). This change introduces a custom `pylint` checker to improve code flexibility and maintainability by preventing fragile `mock.patch` designs in test code. The new checker discourages the use of `MagicMock` and encourages the use of `create_autospec` to ensure that mocks have the same attributes and methods as the original class. This change has been implemented in multiple test files, including `test_cli.py`, `test_locations.py`, `test_mapping.py`, `test_table_migrate.py`, `test_table_move.py`, `test_workspace_access.py`, `test_redash.py`, `test_scim.py`, and `test_verification.py`, to improve the robustness and maintainability of the test code. Additionally, the commit removes the `verification.py` file, which contained a `VerificationManager` class for verifying applied permissions, scope ACLs, roles, and entitlements for various objects in a Databricks workspace.
- Removed
`mocker.patch("databricks...)` from `test_cli` (#1047). In this release, we have made significant updates to the library's handling of Azure and AWS workspaces. We have added new parameters `azure_resource_permissions` and `aws_permissions` to the `_execute_for_cloud` function in `cli.py`, which are passed to the `func_azure` and `func_aws` functions respectively. The `create_uber_principal` and `principal_prefix_access` commands have also been updated to include these new parameters. Additionally, the `_azure_setup_uber_principal` and `_aws_setup_uber_principal` functions have been updated to accept the new `azure_resource_permissions` and `aws_resource_permissions` parameters. The `_azure_principal_prefix_access` and `_aws_principal_prefix_access` functions have also been updated similarly. We have also introduced a new `aws_resources` parameter in the `migrate_credentials` command, which is used to migrate Azure Service Principals in ADLS Gen2 locations to UC storage credentials. In terms of testing, we have replaced the `mocker.patch` calls with the creation of `AzureResourcePermissions` and `AWSResourcePermissions` objects, improving the code's readability and maintainability. Overall, these changes significantly enhance the library's functionality and maintainability in handling Azure and AWS workspaces.
- Require Hatch v1.9.4 on build machines (#1049). In this release, we have updated the Hatch package version to 1.9.4 on build machines, addressing issue #1049. The changes include updating the toolchain dependencies and setup in the
`.codegen.json` file, which simplifies the setup process and now relies on a pre-existing Hatch environment and Python 3. The acceptance workflow has also been updated to use the latest version of Hatch and the `databrickslabs/sandbox/acceptance` GitHub action version `v0.1.4`. Hatch is a Python package manager that simplifies package development and management, and this update provides new features and bug fixes that can help improve the reliability and performance of the acceptance workflow. This change requires version 1.9.4 of the Hatch package on build machines, and it will affect the build process for the project but will not have any impact on the functionality of the project itself. As a software engineer adopting this project, it's important to note this change to ensure that the build process runs smoothly and takes advantage of any new features or improvements in Hatch 1.9.4.
- Set acceptance tests to timeout after 45 minutes (#1036). As part of issue #1036, the acceptance tests in this open-source library now have a 45-minute timeout configured, improving the reliability and stability of the testing environment. This change has been implemented in the
`.github/workflows/acceptance.yml` file by adding the `timeout` parameter to the step where the `databrickslabs/sandbox/acceptance` action is called. This ensures that the acceptance tests will not run indefinitely and prevents any potential issues caused by long-running tests. By adopting this project, software engineers can now benefit from a more stable and reliable testing environment, with acceptance tests that are guaranteed to complete within a maximum of 45 minutes.
- Updated databricks-labs-blueprint requirement from ~0.4.1 to ~0.4.3 (#1058). In this release, the version requirement for the
`databricks-labs-blueprint` library has been updated from ~0.4.1 to ~0.4.3 in the pyproject.toml file. This change is necessary to support issues #1056 and #1057. The code has been manually tested and is ready for further testing to ensure the compatibility and smooth functioning of the software. It is essential to thoroughly test the latest version of the `databricks-labs-blueprint` library with the existing codebase before deploying it to production. This includes running a comprehensive suite of tests such as unit tests, integration tests, and verification on the staging environment. This modification allows the software to use the latest version of the library, improving its functionality and overall performance.
- Use
MockPrompts.extend()functionality in test_install to supply multiple prompts (#1057). This diff introduces theMockPrompts.extend()functionality in thetest_installmodule to enable the supplying of multiple prompts for testing purposes. A newbase_promptsdictionary with default prompts has been added and is extended with additional prompts for specific test cases. This allows for the testing of various scenarios, such as when UCX is already installed on the workspace and the user is prompted to choose between global or user installation. Additionally, newforce_user_environandforce_global_envdictionaries have been added to simulate different installation environments. The functionality of theWorkspaceInstallerclass and mocking ofwebbrowser.openare also utilized in the test cases. These changes aim to ensure the proper functioning of the configuration process for different installation scenarios.
Contributors: @nkvuong, @nfx, @pritishpai, @FastLee, @aminmovahed-db, @mwojtyczka, @qziyuan, @prajin-29, @william-conti
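
Several entries in this release bump dependencies using compatible-release pins such as `databricks-labs-blueprint~=0.4.3`. As a rough illustration of what such a pin accepts under PEP 440 (at least the pinned version, with the same leading components), here is a simplified sketch. The helper names `parse_version` and `satisfies_compatible_release` are hypothetical, and the code handles only plain dotted numeric versions, not pre-releases or other PEP 440 suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a plain dotted version such as '0.4.3' into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Approximate the PEP 440 check for `~=pinned` on numeric versions."""
    pin = parse_version(pinned)
    if len(pin) < 2:
        raise ValueError("~= requires at least two version components")
    cand = parse_version(candidate)
    # Pad the candidate with zeros so that '1' compares equal to '1.0'.
    cand = cand + (0,) * (len(pin) - len(cand))
    prefix = pin[:-1]  # every component except the last must match
    return cand >= pin and cand[: len(prefix)] == prefix


accepted = satisfies_compatible_release("0.4.9", "0.4.3")
rejected_newer_minor = satisfies_compatible_release("0.5.0", "0.4.3")
rejected_older_patch = satisfies_compatible_release("0.4.1", "0.4.3")
```

In practice the `packaging` library implements the full specification; this sketch only conveys the intent of the pins listed above.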