v0.15.0
- Added AWS S3 support for `migrate-locations` command (#1009). The `migrate-locations` command now supports AWS S3. It identifies missing S3 prefixes and creates the corresponding roles and policies through the new methods `_identify_missing_paths`, `_get_existing_credentials_dict`, and `create_external_locations`. The `AwsIamRole`, `ExternalLocationInfo`, and `StorageCredentialInfo` classes are used to handle the AWS-related objects. Two new tests, `test_create_external_locations` and `test_create_external_locations_skip_existing`, verify the new AWS functionality. The new test function `test_migrate_locations_aws` checks the AWS-specific implementation of the `migrate-locations` command, and `test_missing_aws_cli` verifies that the correct error message is displayed when the AWS CLI is not found on the system path. Together, these changes extend `migrate-locations` to workspaces that store data in AWS S3.
- Added `databricks labs ucx create-uber-principal` command to create Azure Service Principal for migration (#976). The new CLI command, `databricks labs ucx create-uber-principal`, creates an Azure Service Principal (SPN) and grants it STORAGE BLOB READER access on all the storage accounts used by the tables in the workspace. The SPN information is then stored in the UCX cluster policy. A new class, `AzureApiClient`, isolates Azure API calls, and unit and integration tests verify the functionality. This automates creating the Service Principal needed to migrate Azure workspaces.
- Added `migrate-locations` command (#1016). This release adds a new CLI command, `migrate-locations` (implemented by the `migrate_locations` function), to create Unity Catalog (UC) external locations. The command takes location candidates from the `guess_external_locations` assessment task and checks that the corresponding UC storage credentials exist before creating the locations. As introduced in this change, the command supported only Azure, with AWS and GCP support planned; AWS support is added by #1009 above. The `migrate_locations` function is registered as a CLI command with the `ucx.command` decorator. The pull request also adds unit tests for the command, which checks the environment (Azure, AWS, or GCP) before running the migration and logs a message when the platform is not yet supported. No existing workflows, commands, or tables were changed.
- Added handling for widget delete on upgrade platform bug (#1011). The `_install_dashboard` method in `dashboards.py` now handles a platform bug that occurs when dashboard widgets are deleted during an upgrade. Previously, the method deleted each widget with `self._ws.dashboard_widgets.delete(widget.id)`, which raised a `TypeError`. The method now wraps this call in a try/except block that catches the `TypeError` and logs a warning; the underlying issue is tracked as bug ES-1061370. The rest of the method is unchanged: it creates a dashboard with the given name, role, and parent folder ID if no widgets are present. This error handling around the SDK response to widget deletion makes the upgrade process more robust.
- Create UC external locations in Azure based on migrated storage credentials (#992). The `locations.py` file in the `databricks.labs.ucx.azure` package now includes a new class, `ExternalLocationsMigration`, which creates UC external locations in Azure based on migrated storage credentials. The class takes a `WorkspaceClient`, `HiveMetastoreLocations`, `AzureResourcePermissions`, and `AzureResources`. Its `run()` method lists the external locations missing from UC, extracts their location URLs, and, for each missing location found in the mapping, creates a UC external location with the mapped storage credential name. The class also includes helper methods that generate the credential name mappings. In addition, `resources.py` in the same package gains a new method, `managed_identity_client_id`, which retrieves the client ID of the managed identity associated with a given access connector. Tests for `ExternalLocationsMigration` and the Azure external locations functionality are in the new file `test_locations.py`, `test_resources.py` adds tests for `managed_identity_client_id`, and a new `mappings.json` file supports the tests for Azure external location mappings based on migrated storage credentials.
- Deprecate legacy installer (#1014). The legacy UCX installer, implemented as a bash script, is now deprecated. The script prints a warning that informs users of the deprecation and directs them to the UCX installation instructions. Its functionality is otherwise unchanged: it still installs Python dependencies and builds Python bindings. The script will eventually be replaced by the `databricks labs install ucx` command. Users should switch to the new installation method as soon as possible to avoid future issues with the legacy installer.
- Prompt user if Terraform utilised for deploying infrastructure (#1004). The `WorkspaceConfig` class in `config.py` has a new boolean attribute, `is_terraform_used`, which indicates whether Terraform was used to deploy certain entities in the workspace. This addresses issue #393. The `WorkspaceInstaller` configuration has been updated to use the new attribute, which makes it visible whether Terraform was used for infrastructure deployment. A new prompt in the `warehouse_type` function asks whether Terraform is used for infrastructure deployment and sets `is_terraform_used` to `True` if it is.
- Updated CONTRIBUTING.md (#1005). CONTRIBUTING.md has been significantly updated with clearer instructions on how to contribute to the project. The command that printed the Python path has been removed; contributors are instead advised to configure their IDE to use the Python interpreter from the virtual environment. A new step recommends following a consistent style guide and formatting the code before every commit, and contributors are now encouraged to run tests before committing to minimize issues during review. The steps for forking the ucx repo and creating a PR now link to the official documentation. Finally, the guide now explains how to handle dependency errors that may occur after `git pull`.
- Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001). `pyproject.toml` now requires databricks-labs-blueprint `~=0.3.0` instead of `~=0.2.4`. The new version brings an automated upgrade framework, a brute-force approach to handling `SerdeError`, and improvements for running nightly integration tests with service principals. The test that detects existing installations has also been made more reliable: a new test function checks that existing installations are detected correctly and retries for up to 15 seconds if they are not.
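The missing-prefix detection in #1009 can be pictured as a set difference between the paths found by the assessment and the URLs already covered by UC external locations. This is a minimal sketch, not the `_identify_missing_paths` implementation: `find_missing_paths` is a hypothetical name, and it assumes a path counts as covered when it sits under an existing location URL.

```python
def find_missing_paths(candidate_paths: list[str], existing_urls: list[str]) -> list[str]:
    """Hypothetical helper: return candidate S3 paths that no existing
    UC external location URL covers (prefix match is an assumption)."""
    # Compare with a trailing slash so "s3://b/raw" does not cover "s3://b/rawdata".
    existing = [url.rstrip("/") + "/" for url in existing_urls]
    missing = []
    for path in candidate_paths:
        normalized = path.rstrip("/") + "/"
        if not any(normalized.startswith(url) for url in existing):
            missing.append(path)
    return missing


print(find_missing_paths(
    ["s3://bucket-a/raw", "s3://bucket-a/raw/events", "s3://bucket-b/curated"],
    ["s3://bucket-a/raw/"],
))  # prints: ['s3://bucket-b/curated']
```

Normalizing trailing slashes before the prefix test is what keeps sibling prefixes such as `raw` and `rawdata` apart.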
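Granting the #976 Service Principal access "on all the storage accounts used by the tables" first requires deriving account names from table locations. A minimal sketch, assuming ADLS Gen2 locations of the form `abfss://<container>@<account>.dfs.core.windows.net/...`; `storage_accounts_from_locations` is a hypothetical helper and not part of UCX or `AzureApiClient`.

```python
from urllib.parse import urlparse


def storage_accounts_from_locations(locations: list[str]) -> set[str]:
    """Hypothetical helper: collect ADLS Gen2 storage account names
    from abfss:// table locations; other schemes are skipped."""
    accounts = set()
    for location in locations:
        parsed = urlparse(location)
        if parsed.scheme not in ("abfss", "abfs"):
            continue  # e.g. s3:// or dbfs:/ locations are out of scope here
        # netloc looks like "<container>@<account>.dfs.core.windows.net"
        host = parsed.netloc.split("@", 1)[-1]
        accounts.add(host.split(".", 1)[0])
    return accounts
```

Returning a set deduplicates accounts shared by many tables, so each account would receive a single role assignment.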
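The platform check described for #1016 amounts to dispatching on the workspace's cloud before doing any work. This sketch assumes a config object exposing boolean `is_azure`, `is_aws`, and `is_gcp` flags; `CloudConfig`, `migrate_locations_sketch`, and the log message are hypothetical, and the AWS branch reflects the support added by #1009.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class CloudConfig:
    # Hypothetical stand-in for the workspace's cloud flags.
    is_azure: bool = False
    is_aws: bool = False
    is_gcp: bool = False


def migrate_locations_sketch(config: CloudConfig) -> str:
    """Return which migration path would run; log and skip unsupported clouds."""
    if config.is_azure:
        return "azure"
    if config.is_aws:
        return "aws"  # AWS path, per #1009
    logger.info("migrate-locations is not yet supported on this platform")
    return "skipped"
```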
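The #1011 fix follows a common pattern: attempt each delete, and on the known `TypeError` log a warning and continue instead of aborting the upgrade. This sketch uses a hypothetical `widgets_api` object with a `delete(widget_id)` method in place of `self._ws.dashboard_widgets`, and `FakeWidgetsApi` simulates the failure.

```python
import logging

logger = logging.getLogger(__name__)


def delete_widgets(widgets_api, widget_ids: list[str]) -> list[str]:
    """Delete each widget, tolerating the TypeError raised by the platform bug.
    Returns the IDs whose deletion raised TypeError."""
    failed = []
    for widget_id in widget_ids:
        try:
            widgets_api.delete(widget_id)
        except TypeError:
            # Known platform bug (tracked as ES-1061370): log and keep going.
            logger.warning("Error deleting widget %s, see ES-1061370", widget_id)
            failed.append(widget_id)
    return failed


class FakeWidgetsApi:
    """Hypothetical stand-in that raises TypeError like the buggy SDK response."""

    def __init__(self):
        self.deleted = []

    def delete(self, widget_id):
        if widget_id == "w2":
            raise TypeError("unexpected response")
        self.deleted.append(widget_id)


api = FakeWidgetsApi()
print(delete_widgets(api, ["w1", "w2", "w3"]), api.deleted)  # prints: ['w2'] ['w1', 'w3']
```

Catching only `TypeError` keeps other API errors visible, so unrelated failures still stop the installation.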
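The core of `ExternalLocationsMigration.run()` in #992 is a lookup: each missing location URL is matched to a migrated storage credential, and only matched locations are created. This sketch assumes the mapping is keyed by storage account name and uses hypothetical names (`plan_external_locations` and the `loc_<n>` naming scheme). It returns the planned `(name, url, credential)` tuples instead of calling the UC API.

```python
from urllib.parse import urlparse


def plan_external_locations(
    missing_urls: list[str], credential_by_account: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Hypothetical planner: pair each missing abfss:// URL with the storage
    credential mapped to its storage account; unmapped URLs are skipped."""
    planned = []
    for url in missing_urls:
        # netloc looks like "<container>@<account>.dfs.core.windows.net"
        host = urlparse(url).netloc.split("@", 1)[-1]
        account = host.split(".", 1)[0]
        credential = credential_by_account.get(account)
        if credential is None:
            continue  # no migrated credential for this account: leave it alone
        name = f"loc_{len(planned) + 1}"  # illustrative naming scheme only
        planned.append((name, url, credential))
    return planned
```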
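#1004 adds a boolean `is_terraform_used` to `WorkspaceConfig` that is set from a yes/no prompt. A minimal sketch, assuming a trimmed-down config dataclass (`WorkspaceConfigSketch`) and a `confirm` callable standing in for the installer's prompt helper. Both are hypothetical, as is the question text; the real `WorkspaceConfig` has many more fields.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkspaceConfigSketch:
    # Hypothetical subset of the installer's workspace configuration.
    inventory_database: str
    is_terraform_used: bool = False  # stays False unless the user confirms


def configure(confirm: Callable[[str], bool], inventory_database: str) -> WorkspaceConfigSketch:
    """Ask whether Terraform deploys the workspace infrastructure and record the answer."""
    is_terraform_used = confirm("Do you use Terraform to deploy your infrastructure?")
    return WorkspaceConfigSketch(inventory_database, is_terraform_used=is_terraform_used)


config = configure(lambda question: True, "ucx")
```

Injecting `confirm` as a callable lets tests answer the prompt without user interaction.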
Dependency updates:
- Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001).
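The `~=` in `~=0.3.0` is the PEP 440 compatible-release operator: `~=0.3.0` means `>=0.3.0, ==0.3.*`, so 0.3.x patch releases are accepted but 0.4.0 is not. The pure-Python sketch below is a hypothetical helper that handles only plain dotted numeric versions (no pre-releases or local versions); in practice the `packaging` library implements the full rules.

```python
def is_compatible(version: str, spec: str) -> bool:
    """Check `version` against a PEP 440 compatible-release spec like "~=0.3.0".
    Only plain dotted numeric versions are supported in this sketch."""
    base = [int(part) for part in spec.removeprefix("~=").split(".")]
    if len(base) < 2:
        raise ValueError("~= requires at least two release segments")
    candidate = [int(part) for part in version.split(".")]
    # Pad with zeros so "0.3" compares like "0.3.0".
    width = max(len(base), len(candidate))
    padded_base = base + [0] * (width - len(base))
    padded_candidate = candidate + [0] * (width - len(candidate))
    # Every segment of the spec except the last must match exactly...
    prefix_ok = padded_candidate[: len(base) - 1] == base[:-1]
    # ...and the version must be at least the spec's version.
    return prefix_ok and padded_candidate >= padded_base
```

Under this rule the old pin `~=0.2.4` could never resolve to 0.3.x, which is why the requirement itself had to be bumped.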
Contributors: @nfx, @qziyuan, @pritishpai, @FastLee, @dependabot[bot], @william-conti, @prajin-29, @HariGS-DB