Releases: databrickslabs/ucx
v0.18.0
- Added Legacy Table ACL grants migration (#1054). This commit introduces a legacy table ACL grants migration to the
`migrate-tables` workflow, resolving issue #340 and paving the way for follow-up PRs #887 and #907. A new `GrantsCrawler` class is added for crawling grants, along with a `GroupManager` class to manage groups during migration. The `TablesMigrate` class is updated to accept an instance of `GrantsCrawler` and `GroupManager` in its constructor. The migration process has been thoroughly tested with unit tests, integration tests, and manual testing on a staging environment. The changes include the addition of a new Enum class `AclMigrationWhat` and updates to the `Table` dataclass, and affect the way tables are selected for migration based on rules. The logging and error handling have been improved in the `skip_schema` function. - Added
`databricks labs ucx cluster-remap` command to remap legacy cluster configurations to UC-compatible ones (#994). In this open-source library update, we have developed and added the `databricks labs ucx cluster-remap` command, which facilitates the remapping of legacy cluster configurations to UC-compatible ones. This new CLI command comes with user documentation to guide the cluster remapping process. Additionally, we have expanded the functionality of creating and managing UC external catalogs and schemas with the inclusion of the `create-catalogs-schemas` and `revert-cluster-remap` commands. This change does not modify existing commands or workflows and does not introduce new tables. The `databricks labs ucx cluster-remap` command allows users to re-map and revert the re-mapping of clusters from Unity Catalog (UC) using the CLI, ensuring compatibility and streamlining the migration process. The new command and associated functions have been manually tested for functionality. - Added
`migrate-tables` workflow (#1051). The `migrate-tables` workflow has been added, which allows for more fine-grained control over the resources allocated to the workspace. This workflow includes two new instance variables, `min_workers` and `max_workers`, in the `WorkspaceConfig` class, with default values of 1 and 10 respectively. A new `trigger` function has also been introduced, which initializes a configuration, SQL backend, and `WorkspaceClient` based on the provided configuration file. The `run_task` function has been added, which looks up the specified task, logs relevant information, and runs the task's function with the provided arguments. The `Task` class's `fn` attribute now includes an `Installation` object as a parameter. Additionally, a new `migrate-tables` workflow has been added for migrating tables from the Hive Metastore to the Unity Catalog, along with new classes and methods for table mapping, migration status refreshing, and migrating tables. The `migrate_dbfs_root_delta_tables` and `migrate_external_tables_sync` methods perform migrations for Delta tables located in the DBFS root and synchronize external tables, respectively. These functions use the workspace client to access the catalogs and ensure proper migration. Integration tests have also been added for these new methods to ensure their correct operation. - Added handling for
`SYNC` command failures (#1073). This pull request introduces changes to improve handling of `SYNC` command failures during external table migrations in the Hive metastore. Previously, the `SYNC` command's result was not checked, and failures were not logged. Now, the `_migrate_external_table` method in `table_migrate.py` fetches the result of the `SYNC` command execution, logs a warning message for failures, and returns `False` if the command fails. A new integration test has been added to simulate a failed `SYNC` command due to a non-existent catalog and schema, ensuring the migration tool handles such failures. A new test case has also been added to verify the handling of `SYNC` command failures during external table migrations, using a mock backend to simulate failures and checking for appropriate log messages. These changes enhance the reliability and robustness of the migration process, providing clearer error diagnosis and handling for potential `SYNC` command failures. - Added initial version of
`databricks labs ucx migrate-local-code` command (#1067). A new `databricks labs ucx migrate-local-code` command has been added to facilitate migration of local code to a Databricks environment, specifically targeting Python and SQL files. This initial version is experimental and aims to help users and administrators manage code migration, maintain consistency across workspaces, and enhance compatibility with the Unity Catalog, a component of Databricks' data and AI offerings. The command introduces a new `Files` class for applying migrations to code files, considering their language. It also updates the `.gitignore` file and the `pyproject.toml` file to ensure appropriate version control management. Additionally, new classes and methods have been added to support code analysis, transformation, and linting for various programming languages. These improvements will aid in streamlining the migration process and ensuring compatibility with Databricks' environment. - Added instance pool to cluster policy (#1078). A new field,
`instance_pool_id`, has been added to the cluster policy configuration in `policy.py`, allowing users to specify the ID of an instance pool to be applied to all workflow clusters in the policy. This ID can be manually set or automatically retrieved by the system. A new private method, `_get_instance_pool_id()`, has been added to handle the retrieval of the instance pool ID. Additionally, a new test for table migration jobs has been added to `test_installation.py` to ensure the migration job is correctly configured with the specified parallelism, minimum and maximum number of workers, and instance pool ID. A new test case for creating a cluster policy with an instance pool has also been added to `tests/unit/installer/test_policy.py` to ensure the instance pool is added to the cluster policy during creation. These changes provide users with more control over instance pools and cluster policies, and improve the overall functionality of the library. - Fixed
`ucx move` logic for `MANAGED` & `EXTERNAL` tables (#1062). The `ucx move` command has been updated to allow for the movement of UC tables/views after the table upgrade process, providing flexibility in managing catalog structure. The command now supports moving multiple tables simultaneously, dropping managed tables/views upon confirmation, and deep-cloning managed tables while dropping and recreating external tables. A refactoring of the `TableMove` class has improved code organization and readability, and the associated unit tests have been updated to reflect these changes. This feature is targeted towards developers and administrators seeking to adjust their catalog structure after table upgrades, with the added ability to manage exceptional conditions gracefully. - Fixed integration testing with random product names (#1074). In the recent update, the
`trigger` function in the `tasks.py` module of the `ucx` framework has undergone modification to incorporate a new argument, `install_folder`, within the `Installation` object. This object is now generated locally within the `trigger` function and subsequently passed to the `run_task` function. The `install_folder` is determined by obtaining the parent directory of the `config_path` variable, transforming it into a POSIX-style path, and eliminating the leading `/Workspace` prefix. This enhancement guarantees that the `run_task` function acquires the correct installation folder for the `ucx` framework, thereby improving the overall functionality and precision of the framework. Furthermore, the `Installation.current` method has been supplanted with the newly formed `Installation` object, which now encompasses the `install_folder` argument. - Refactor installer to separate workflows methods from the installer class (#1055). In this release, the installer in the
`cli.py` file has been refactored to improve modularity and maintainability. The installation and workflow functionalities have been separated by importing a new class called `WorkflowsInstallation` from `databricks.labs.ucx.installer.workflows`. The `WorkspaceInstallation` class is no longer used in various functions, and the new `WorkflowsInstallation` class is used instead. Additionally, a new mixin class called `InstallationMixin` has been introduced, which includes methods for uninstalling UCX, removing jobs, and validating installation steps. The `WorkflowsInstallation` class now inherits from this mixin class. A new file, `workflows.py`, has been added to the `databricks/labs/ucx/installer` directory, which contains methods for managing Databricks jobs. The new `WorkflowsInstallation` class is responsible for deploying workflows, uploading wheels to DBFS or WSFS, and creating debug notebooks. The refactoring also includes the addition of new methods for handling specific workflows, such as `run_workflow`, `validate_step`, and `repair_run`, which are now contained in the `WorkflowsInstallation` class. The `test_install.py` file in the `tests/unit` directory has also been updated to include new imports and test functions to accommodate these changes. - S...
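As a rough illustration of the `SYNC` result checking described in #1073 above, the sketch below fetches and inspects the command's result rather than ignoring it. The function name, the `MockBackend` helper, and the shape of the result rows are assumptions made for this example, not UCX's actual interfaces.

```python
import logging

logger = logging.getLogger("ucx.sketch")

def migrate_external_table_sync(backend, src_table: str, dst_table: str) -> bool:
    # Fetch and inspect the SYNC result instead of ignoring it; any
    # non-SUCCESS status is logged as a warning and reported as a failure.
    rows = backend.fetch(f"SYNC TABLE {dst_table} FROM {src_table};")
    for status_code, description in rows:
        if status_code != "SUCCESS":
            logger.warning(f"SYNC command failed to migrate {src_table}: {description}")
            return False
    return True

class MockBackend:
    # Hypothetical stand-in simulating a SYNC failure caused by a
    # non-existent target catalog, as in the integration test described above.
    def fetch(self, sql: str):
        return [("ERROR", "Catalog 'main_nonexistent' does not exist")]
```

With this mock backend, the migration reports `False` and emits a warning instead of silently treating the table as migrated.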
v0.17.0
- Added AWS IAM role support to
`databricks labs ucx create-uber-principal` command (#993). The `databricks labs ucx create-uber-principal` command now supports AWS Identity and Access Management (IAM) roles for external table migration. This new feature introduces a CLI command to create an `uber-IAM` profile, which checks for the UCX migration cluster policy and updates or adds the migration policy to provide access to the relevant table locations. If no IAM instance profile or role is specified in the cluster policy, a new one is created and the new migration policy is added. This change includes new methods and functions to handle AWS IAM roles, instance profiles, and related trust policies. Additionally, new unit and integration tests have been added and verified on the staging environment. The implementation also identifies all S3 buckets used by the instance profiles configured in the workspace. - Added Dashboard widget to show the list of cluster policies along with DBR version (#1013). In this code revision, the
`assessment` module of the `databricks/labs/ucx` package has been updated to include a new `PoliciesCrawler` class, which fetches, assesses, and snapshots cluster policies. This class extends `CrawlerBase` and `CheckClusterMixin` and introduces the `_crawl`, `_assess_policies`, `_try_fetch`, and `snapshot` methods. The `PolicyInfo` dataclass has been added to hold policy information, with a structure similar to the `ClusterInfo` dataclass. The `ClusterInfo` dataclass has been updated to include `spark_version` and `policy_id` attributes. A new table for policies has been added, and cluster policies along with the DBR version are loaded into this table. Relevant user documentation, tests, and a Dashboard widget have been added to support this feature. The `create` function in `fixtures.py` has been updated to enable a Delta preview feature in Spark configurations, and a new SQL file has been included for querying cluster policies. Additionally, a new `crawl_cluster_policies` method has been added to scan and store cluster policies with matching configurations. - Added
`migration_status` table to capture a snapshot of migrated tables (#1041). A `migration_status` table has been added to track the status of migrated tables in the database, enabling improved management and tracking of migrations. The new `MigrationStatus` class, a dataclass that holds the source and destination schema, table, and updated timestamp, is added. The `TablesMigrate` class now has a new `_migration_status_refresher` attribute that is an instance of the new `MigrationStatusRefresher` class. This class crawls the `migration_status` table and returns a snapshot of the migration status, which is used to refresh the migration status and check if the table is upgraded. Additionally, the `_init_seen_tables` method is updated to get the seen tables from the `_migration_status_refresher` instead of fetching from the table properties. The `MigrationStatusRefresher` class fetches the migration status table and returns a snapshot of the migration status. This change also adds new test functions in the test file for the Hive metastore, which cover various scenarios such as migrating managed tables with and without caching, migrating external tables, and reverting migrated tables. - Added a check for existing inventory database to avoid losing existing data, inject installation objects in tests, and try fetching existing installation before setting global as default (#1043). In this release, we have added a new method,
`_check_inventory_database_exists`, to the `WorkspaceInstallation` class, which checks if an inventory database with a given name already exists in the Workspace. This prevents accidental overwriting of existing data and improves the robustness of handling inventory databases. The `validate_and_run` method has been updated to call `app.current_installation(workspace_client)`, allowing for more flexible handling of installations. The `Installation` class import has been updated to include `SerdeError`, and the test suite has been updated to inject installation objects and check for existing installations before setting the global installation as default. A new argument, `inventory_schema_suffix`, has been added to the `factory` method for customization of the inventory schema name. We have also added a new method, `check_inventory_database_exists`, to the `WorkspaceInstaller` class, which checks if an inventory database already exists for a given installation type and raises an `AlreadyExists` error if it does. The behavior of the `download` method in the `WorkspaceClient` class has been mocked, and the `get_status` method has been updated to return `NotFound` in certain tests. These changes aim to improve the robustness, flexibility, and safety of the installation process in the Workspace. - Added a check for external metastore in SQL warehouse configuration (#1046). In this release, we have added new functionality to the Unity Catalog (UCX) installation process to enable checking for and connecting to an external Hive metastore configuration. A new method,
`_get_warehouse_config_with_external_hive_metastore`, has been introduced to retrieve the workspace warehouse config and identify if it is set up for an external Hive metastore. If so, and the user confirms the prompt, UCX will be configured to connect to the external metastore. Additionally, new methods `_extract_external_hive_metastore_sql_conf` and `test_cluster_policy_definition_<cloud_provider>_hms_warehouse()` have been added to handle the external metastore configuration for Azure, AWS, and GCP, and to handle the case when the `data_access_config` is empty. These changes provide more flexibility and ease of use when installing UCX with external Hive metastore configurations. The new imports `EndpointConfPair` and `GetWorkspaceWarehouseConfigResponse` from the `databricks.sdk.service.sql` package are used to handle the endpoint configuration of the SQL warehouse. - Added integration tests for AWS - create locations (#1026). In this release, we have added comprehensive integration tests for AWS resources and their management in the
`tests/unit/assessment/test_aws.py` file. The `AWSResources` class has been updated with new methods (`AwsIamRole`, `add_uc_role`, `add_uc_role_policy`, and `validate_connection`), and the regular expression for matching S3 resource ARNs has been modified. The `create_external_locations` method now allows for creating external locations without validating them, and the `_identify_missing_external_locations` function has been enhanced to match roles with a wildcard pattern. The new tests include validating the integration of AWS services with the system, testing the CLI's behavior when it is missing, and introducing new configuration scenarios with the addition of a Key Management Service (KMS) key during the creation of IAM roles and policies. These changes improve the robustness and reliability of AWS resource integration and handling in our system. - Bump Databricks SDK to v0.22.0 (#1059). In this release, we are bumping the Databricks SDK version to 0.22.0 and upgrading the
`databricks-labs-lsql` package to `~=0.2.2`. The new dependencies for this release include `databricks-sdk==0.22.0`, `databricks-labs-lsql~=0.2.2`, `databricks-labs-blueprint~=0.4.3`, and `PyYAML>=6.0.0,<7.0.0`. In the `fixtures.py` file, we have added `PermissionLevel.CAN_QUERY` to the `CAN_VIEW` and `CAN_MANAGE` permissions in the `_path` function, allowing users to query the endpoint. Additionally, we have updated the `test_endpoints` function in the `test_generic.py` file as part of the integration tests for workspace access. This change updates the permission level for creating a serving endpoint from `CAN_MANAGE` to `CAN_QUERY`, meaning that the assigned group can now only query the endpoint. We have also included the `test_feature_tables` function in the commit, which tests the behavior of feature tables in the Databricks workspace. This change only affects the `test_endpoints` function and its assert statements, and does not impact the functionality of the `test_feature_tables` function. - Changed default UCX installation folder to
`/Applications/ucx` from `/Users/<me>/.ucx` to allow multiple users to utilise the same installation (#854). In this release, we've added a new advanced feature that allows users to force the installation of UCX over an existing installation using the `UCX_FORCE_INSTALL` environment variable. This variable can take two values, `global` and `user`, providing more control and flexibility in installing UCX. The default UCX installation folder has been changed to `/Applications/ucx` from `/Users/<me>/.ucx` to enable multiple users to utilize the same installation. A table detailing the expected install location, `install_folder`, and mode for each combination of global and user values has been added to the README file. We've also added user prompts to confirm the installation if UCX is already installed and the `UCX_FORCE_INSTALL` variable is set to `user`. This feature is useful when users want to install UCX in a specific location or force the installation over an existing installation. However, it is recommended to use this feature with caution, as it can potentially break existing installations if not used correctly. Additionally, several changes to the implementation of the UCX installation process have been made, as well as new tests to ensure that the installation process works correctly in various scenarios. - Fix: Rec...
v0.16.0
- Added AWS IAM roles support to
`databricks labs ucx migrate-credentials` command (#973). This commit adds AWS Identity and Access Management (IAM) roles support to the `databricks labs ucx migrate-credentials` command, resolving issue #862 and relating to pull request #874. It includes the addition of a `load` function to `AWSResourcePermissions` to return identified instance profiles, and the creation of an `IamRoleMigration` class under `aws/credentials.py` to migrate identified AWS instance profiles. Additionally, user documentation and a new CLI command, `databricks labs ucx migrate-credentials`, have been added, and the changes have been thoroughly tested with manual, unit, and integration tests. The functionality additions include new methods such as `add_uc_role_policy` and `update_uc_trust_role`, among others, designed to facilitate the migration process for AWS IAM roles. - Added
`create-catalogs-schemas` command to prepare destination catalogs and schemas before table migration (#1028). The Databricks Labs Unity Catalog (UCX) tool has been updated with a new `create-catalogs-schemas` command to facilitate the creation of destination catalogs and schemas prior to table migration. This command should be executed after the `create-table-mapping` command and is designed to prepare the workspace for migrating tables to UC. Additionally, a new `CatalogSchema` class has been added to the `hive_metastore` package to manage the creation of catalogs and schemas in the Hive metastore. This new functionality simplifies the process of preparing the destination Hive metastore for table migration, reducing the likelihood of user errors and ensuring that the metastore is properly configured. Unit tests have been added to the `tests/unit/hive_metastore` directory to verify the behavior of the `CatalogSchema` class and the new `create-catalogs-schemas` command. This command is intended for use in contexts where GCP is not supported. - Added automated upgrade option to set up cluster policy (#1024). This commit introduces an automated upgrade option for setting up a cluster policy for older versions of UCX, separating the cluster creation policy from `install.py` into `installer/policy.py` and adding an upgrade script for older UCX versions. A new class,
`ClusterPolicyInstaller`, is added to the `policy.py` file in the `installer` package to manage the creation and update of a Databricks cluster policy for Unity Catalog Migration. This class handles creating a new cluster policy with specific configurations, extracting external Hive Metastore configurations, and updating job policies. Additionally, the commit includes refactoring, removal of library references, and a new script, `v0.15.0_added_cluster_policy.py`, which contains the upgrade function. The changes are tested through manual and automated testing with unit tests and integration tests. This feature is intended for software engineers working with the project. - Added crawling for init scripts on local files to assessment workflow (#960). This commit introduces the ability to crawl init scripts stored on local files and S3 as part of the assessment workflow, resolving issue #9
- Added database filter for the
`assessment` workflow (#989). In this release, we have added a new configuration option, `include_databases`, to the assessment workflow, which allows users to specify a list of databases to include for migration rather than crawling all the databases in the Hive Metastore. This feature is implemented in the `TablesCrawler`, `UdfsCrawler`, and `GrantsCrawler` classes and the associated functions such as `_all_databases`, `getIncludeDatabases`, and `_select_databases`. These changes aim to improve efficiency and reduce unnecessary crawling, and are accompanied by modifications to existing functionality, as well as the addition of unit and integration tests. The changes have been manually tested and verified on a staging environment. - Estimate migration effort based on assessment database (#1008). In this release, new functionality has been added to estimate the migration effort for each asset in the assessment database. The estimation is presented in days and is displayed on a new estimates dashboard with a summary widget for a global estimate per object type, along with assumptions and scope for each object type. A new
`query` parameter has been added to the `SimpleQuery` class to support this feature. Additional changes include the update of the `_install_viz` and `_install_query` methods, the inclusion of the `data_source_id` in the query metadata, and the addition of tests to ensure the proper functioning of the new feature. A new fixture, `mock_installation_with_jobs`, has been added to support testing of the assessment estimates dashboard. - Explicitly write to
`hive_metastore` from `crawl_tables` task (#1021). In this release, we have improved the clarity and specificity of our handling of the `hive_metastore` in the `crawl_tables` task. Previously, the `df.write.saveAsTable` method was used without explicitly specifying the `hive_metastore` database, which could result in ambiguity. To address this issue, we have updated the `saveAsTable` method to include the `hive_metastore` database, ensuring that tables are written to the correct location in the Hive metastore. These changes are confined to the `src/databricks/labs/ucx/hive_metastore/tables.scala` file and affect the `crawl_tables` task. While no new methods have been added, the existing `saveAsTable` method has been modified to enhance the accuracy and predictability of our interaction with the Hive metastore. - Improved documentation for
`databricks labs ucx move` command (#1025). The `databricks labs ucx move` command has been updated with improvements to its documentation, providing enhanced clarity and ease of use for developers and administrators. This command facilitates the movement of UC table(s) from one schema to another, either in the same or a different catalog, during the table upgrade process. A significant enhancement is the preservation of the source table's permissions when moving to a new schema or catalog: the original table's access controls are maintained, simplifying the management of table permissions and streamlining the migration process. These improvements aim to facilitate a more efficient table migration experience, ensuring that developers and administrators can effectively manage their UC tables while maintaining the desired level of access control and security. - Updated databricks-sdk requirement from ~=0.20.0 to ~=0.21.0 (#1030). In this update, the
`databricks-sdk` package requirement has been updated to version `~=0.21.0` from `~=0.20.0`. This new version addresses several bugs and provides enhancements, including a fix for the `get_workspace_client` method in GCP, the use of the `all-apis` scope with the external browser, and an attempt to initialize all Databricks globals. Moreover, the API's settings nesting approach has changed, which may cause compatibility issues with previous versions. Several new services and dataclasses have been added to the API, and documentation and examples have been updated accordingly. There are no updates to the `databricks-labs-blueprint` and `PyYAML` dependencies in this commit.
Contributors: @nfx, @HariGS-DB, @william-conti, @dependabot[bot], @prajin-29, @FastLee, @qziyuan, @nkvuong, @mohanbaabu1996
v0.15.0
- Added AWS S3 support for
`migrate-locations` command (#1009). In this release, the open-source library has been enhanced with AWS S3 support for the `migrate-locations` command, enabling efficient and secure management of S3 data. The new functionality includes the identification of missing S3 prefixes and the creation of corresponding roles and policies through the addition of the methods `_identify_missing_paths`, `_get_existing_credentials_dict`, and `create_external_locations`. The library now also includes the new classes `AwsIamRole`, `ExternalLocationInfo`, and `StorageCredentialInfo` for better handling of AWS-related functionality. Additionally, two new tests, `test_create_external_locations` and `test_create_external_locations_skip_existing`, have been added to ensure the correct behavior of the new AWS-related functionality. The new test function `test_migrate_locations_aws` checks the AWS-specific implementation of the `migrate-locations` command, while `test_missing_aws_cli` verifies that the correct error message is displayed when the AWS CLI is not found in the system path. These changes enhance the library's capabilities, improving data security, privacy, and overall performance for users working with AWS S3. - Added
`databricks labs ucx create-uber-principal` command to create an Azure Service Principal for migration (#976). The new CLI command, `databricks labs ucx create-uber-principal`, has been introduced to create an Azure Service Principal (SPN) and grant it STORAGE BLOB READER access on all the storage accounts used by the tables in the workspace. The SPN information is then stored in the UCX cluster policy. A new class, `AzureApiClient`, has been added to isolate Azure API calls, and unit and integration tests have been included to verify the functionality. This development enhances migration capabilities for Azure workspaces, providing a more streamlined and automated way to create and manage Service Principals, and improves the functionality and usability of the UCX tool. The changes are well-documented and follow the project's coding standards. - Added
`migrate-locations` command (#1016). In this release, we've added a new CLI command, `migrate_locations`, to create Unity Catalog (UC) external locations. This command extracts candidates for location creation from the `guess_external_locations` assessment task and checks if corresponding UC Storage Credentials exist before creating the locations. Currently, the command only supports Azure, with plans to add support for AWS and GCP in the future. The `migrate_locations` function is marked with the `ucx.command` decorator and is available as a command-line interface (CLI) command. The pull request also includes unit tests for this new command, which check the environment (Azure, AWS, or GCP) before executing the migration and log a message if the environment is AWS or GCP, indicating that the migration is not yet supported on those platforms. No changes have been made to existing workflows, commands, or tables. - Added handling for widget delete on upgrade platform bug (#1011). In this release, the
`_install_dashboard` method in `dashboards.py` has been updated to handle a platform bug that occurred during the deletion of dashboard widgets during an upgrade process (issue #1011). Previously, the method attempted to delete each widget using the `self._ws.dashboard_widgets.delete(widget.id)` call, which resulted in a `TypeError` when attempting to delete a widget. The updated method now includes a try/except block that catches this `TypeError` and logs a warning message, while also tracking the issue under bug ES-1061370. The rest of the method remains unchanged, creating a dashboard with the given name, role, and parent folder ID if no widgets are present. This enhancement improves the robustness of the `_install_dashboard` method by adding error handling for the SDK API response when deleting dashboard widgets, ensuring a smoother upgrade process. - Create UC external locations in Azure based on migrated storage credentials (#992). The
`locations.py` file in the `databricks.labs.ucx.azure` package has been updated to include a new class, `ExternalLocationsMigration`, which creates UC external locations in Azure based on migrated storage credentials. This class takes various arguments, including `WorkspaceClient`, `HiveMetastoreLocations`, `AzureResourcePermissions`, and `AzureResources`. It has a `run()` method that lists any missing external locations in UC, extracts their location URLs, and attempts to create a UC external location with a mapped storage credential name if the missing external location is in the mapping. The class also includes helper methods for generating credential name mappings. Additionally, the `resources.py` file in the same package has been modified to include a new method, `managed_identity_client_id`, which retrieves the client ID of a managed identity associated with a given access connector. Test functions for the `ExternalLocationsMigration` class and Azure external locations functionality have been added in the new file `test_locations.py`. The `test_resources.py` file has been updated to include tests for the `managed_identity_client_id` method. A new `mappings.json` file has also been added for tests related to Azure external location mappings based on migrated storage credentials. - Deprecate legacy installer (#1014). In this release, we have deprecated the legacy installer for the UCX project, which was previously implemented as a bash script. A warning message has been added to inform users about the deprecation and direct them to the UCX installation instructions. The functionality of the script remains unchanged, and it still performs tasks such as installing Python dependencies and building Python bindings. The script will eventually be replaced with the
`databricks labs install ucx` command. This change is part of issue #1014 and is intended to streamline the installation process and improve the overall user experience. We recommend that users update their installation process to the new recommended method as soon as possible to avoid any issues with the legacy installer in the future. - Prompt user if Terraform utilised for deploying infrastructure (#1004). In this update, the
`config.py` file has been modified to include a new attribute, `is_terraform_used`, in the `WorkspaceConfig` class. This boolean flag indicates whether Terraform has been used for deploying certain entities in the workspace. Issue #393 has been addressed with this change. The `WorkspaceInstaller` configuration has also been updated to take advantage of this new attribute, allowing developers to determine if Terraform was used for infrastructure deployment, thereby increasing visibility into the deployment process. Additionally, a new prompt has been added to the `warehouse_type` function to ascertain if Terraform is being utilised for infrastructure deployment, setting the `is_terraform_used` variable to `True` if it is. This improvement is intended for software engineers adopting this open-source library. - Updated CONTRIBUTING.md (#1005). In this contribution to the open-source library, the CONTRIBUTING.md file has been significantly updated with clearer instructions on how to effectively contribute to the project. The previous command to print the Python path has been removed, as the IDE is now advised to be configured to use the Python interpreter from the virtual environment. A new step has been added, recommending the use of a consistent styleguide and formatting of the code before every commit. Moreover, it is now encouraged to run tests before committing to minimize potential issues during the review process. The steps on how to fork the ucx repo and create a PR have been updated with links to official documentation. Lastly, the commit now includes information on handling dependency errors that may occur after
git pull. - Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001). In this pull request update, the requirements file, pyproject.toml, has been modified to upgrade the databricks-labs-blueprint package from version ~0.2.4 to ~0.3.0. This update integrates the latest features and bug fixes of the package, including an automated upgrade framework, a brute-forcing approach for handling SerdeError, and enhancements for running nightly integration tests with service principals. These improvements increase the testability and functionality of the software, ensuring its stable operation with service principals during nightly integration tests. Furthermore, the reliability of the test for detecting existing installations has been reinforced by adding a new test function that checks for the correct detection of existing installations and retries the test for up to 15 seconds if they are not.
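The new `is_terraform_used` flag described above is just a boolean on the workspace configuration; a minimal sketch under that assumption (the surrounding fields and the `deployment_note` helper are illustrative placeholders, not the real `WorkspaceConfig`):

```python
from dataclasses import dataclass

# Hypothetical, simplified configuration carrying the new flag.
# Only `is_terraform_used` comes from the release notes above.
@dataclass
class WorkspaceConfig:
    inventory_database: str
    is_terraform_used: bool = False  # set to True when the installer prompt is answered "yes"

def deployment_note(cfg: WorkspaceConfig) -> str:
    # Tooling can branch on the flag to warn that Terraform-managed
    # entities should be changed via Terraform, not directly.
    if cfg.is_terraform_used:
        return "infrastructure is Terraform-managed; review changes in Terraform"
    return "infrastructure is managed directly"

cfg = WorkspaceConfig(inventory_database="ucx", is_terraform_used=True)
print(deployment_note(cfg))
```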
Dependency updates:
- Updated databricks-labs-blueprint requirement from ~=0.2.4 to ~=0.3.0 (#1001).
Contributors: @nfx, @qziyuan, @pritishpai, @FastLee, @dependabot[bot], @william-conti, @prajin-29, @HariGS-DB
v0.14.0
- Added `upgraded_from_workspace_id` property to migrated tables to indicate the source workspace (#987). In this release, updates have been made to the `_migrate_external_table`, `_migrate_dbfs_root_table`, and `_migrate_view` methods in the `table_migrate.py` file to include a new parameter `upgraded_from_ws` in the SQL commands used to alter tables, views, or managed tables. This parameter is used to store the source workspace ID in the migrated tables, indicating the migration origin. A new utility method `sql_alter_from` has been added to the `Table` class in `tables.py` to generate the SQL command with the new parameter. Additionally, a new class-level attribute `UPGRADED_FROM_WS_PARAM` has been added to the `Table` class in `tables.py` to indicate the source workspace. A new property `upgraded_from_workspace_id` has been added to migrated tables to store the source workspace ID. These changes resolve issue #899 and are tested through manual testing, unit tests, and integration tests. No new CLI commands, workflows, or tables have been added or modified, and there are no changes to user documentation.
- Added a command to create account-level groups if they do not exist (#763). This commit introduces a new feature that enables the creation of account-level groups if they do not already exist in the account. A new command, `create-account-groups`, has been added to the `databricks labs ucx` tool, which crawls all workspaces in the account and creates account-level groups if a corresponding workspace-local group is not found. The feature supports various scenarios, including creating account-level groups that exist in some workspaces but not in others, and creating multiple account-level groups with the same name but different members. Several new methods have been added to the `account.py` file to support the new feature, and the `test_account.py` file has been updated with new tests to ensure the correct behavior of the `create_account_level_groups` method. Additionally, the `cli.py` file has been updated to include the new `create-account-groups` command. With these changes, users can easily manage account-level groups and ensure that they are consistent across all workspaces in the account, improving the overall user experience.
- Added assessment for incompatible `RunSubmit` API usages (#849). In this release, the assessment functionality for incompatible `RunSubmit` API usages has been significantly enhanced through various changes. The `clusters.py` file has seen improvements in clarity and consistency with the renaming of private methods `check_spark_conf` to `_check_spark_conf` and `check_cluster_failures` to `_check_cluster_failures`. The `_assess_clusters` method has been updated to call the renamed `_check_cluster_failures` method for thorough checks of cluster configurations, resulting in better assessment functionality. A new `SubmitRunsCrawler` class has been added to the `databricks.labs.ucx.assessment.jobs` module, implementing the `CrawlerBase`, `JobsMixin`, and `CheckClusterMixin` classes. This class crawls and assesses job runs based on their submitted runs, ensuring compatibility and identifying failure issues. Additionally, a new configuration attribute, `num_days_submit_runs_history`, has been introduced in the `WorkspaceConfig` class of the `config.py` module, controlling the number of days for which the submission history of `RunSubmit` API calls is retained. Lastly, various new JSON files have been added for unit testing, assessing `RunSubmit` API usages related to different scenarios like dbt task runs, Git source-based job runs, JAR file runs, and more. These tests will aid in identifying and addressing potential compatibility issues with the `RunSubmit` API.
- Added group members difference to the output of the `validate-groups-membership` CLI command (#995). The `validate-groups-membership` command has been updated to include a comparison of group memberships at both the account and workspace levels. This enhancement is implemented through the `validate_group_membership` function, which has been updated to calculate the difference in members between the two levels and display it in a new `group_members_difference` column. This allows for a more detailed analysis of group memberships and easily identifies any discrepancies between the account and workspace levels. The corresponding unit test file, `test_groups.py`, has been updated to include a new test case that verifies the calculation of the `group_members_difference` value. The functionality of the other commands remains unchanged. The new `group_members_difference` value is calculated as the difference in the number of members in the workspace group and the account group, with a positive value indicating more members in the workspace group and a negative value indicating more members in the account group. The table template in the labs.yml file has also been updated to include the new column for the group membership difference.
- Added handling for empty `directory_id` if a managed identity is encountered during the crawling of `StoragePermissionMapping` (#986). This PR adds a `type` field to the `StoragePermissionMapping` and `Principal` dataclasses to differentiate between service principals and managed identities, allowing `None` for the `directory_id` field if the principal is not a service principal. During the migration to UC storage credentials, managed identities are currently ignored. These changes improve handling of managed identities during the crawling of `StoragePermissionMapping`, prevent errors when creating storage credentials with managed identities, and address issue #339. The changes are tested through unit tests, manual testing, and integration tests, and only affect the `StoragePermissionMapping` class and related methods, without introducing new commands, workflows, or tables.
- Added migration for Azure Service Principals with secrets stored in Databricks Secrets to UC Storage Credentials (#874). In this release, we have made significant updates to migrate Azure Service Principals with their secrets stored in a Databricks Secret to UC Storage Credentials, enhancing security and management of storage access. The changes include: addition of a new `migrate_credentials` command in the `labs.yml` file to migrate credentials for storage access to UC storage credentials; modification of `secrets.py` to handle the case where a secret has been removed from the backend and to log warning messages for secrets with invalid Base64 bytes; introduction of the `StorageCredentialManager` and `ServicePrincipalMigration` classes in `credentials.py` to manage Azure Service Principals and their associated client secrets, and to migrate them to UC Storage Credentials; addition of a new `directory_id` attribute in the `Principal` class and its associated dataclass in `resources.py` to store the directory ID for creating UC storage credentials using a service principal; creation of a new pytest fixture, `make_storage_credential_spn`, in `fixtures.py` to simplify writing tests requiring Databricks Storage Credentials with Azure Service Principal auth; and addition of a new test file for the Azure integration of the project, including new classes, methods, and test cases for testing the migration of Azure Service Principals to UC Storage Credentials. These improvements will ensure better security and management of storage access using Azure Service Principals, while providing more efficient and robust testing capabilities.
- Added permission migration support for feature tables and the root permissions for models and feature tables (#997). This commit introduces support for migration of permissions related to feature tables and sets root permissions for models and feature tables. New functions such as `feature_store_listing`, `feature_tables_root_page`, `models_root_page`, and `tokens_and_passwords` have been added to facilitate population of a workspace access page with the necessary permissions information. The `factory` function in `manager.py` has been updated to include new listings for the models' root page, the feature tables' root page, and the feature store for enhanced management and access control of models and feature tables. New classes and methods have been implemented to handle permissions for these resources, utilizing the `GenericPermissionsSupport`, `AccessControlRequest`, and `MigratedGroup` classes. Additionally, new test methods have been included to verify feature tables listing functionality and root page listing functionality for feature tables and registered models. The test manager method has been updated to include `feature-tables` in the list of items to be checked for permissions, ensuring comprehensive testing of permission functionality related to these new feature tables.
- Added support for serving endpoints (#990). In this release, we have made significant enhancements to support serving endpoints in our open-source library. The `fixtures.py` file in the `databricks.labs.ucx.mixins` module has been updated with new classes and functions to create and manage serving endpoints, accompanied by integration tests to verify their functionality. We have added a new listing for serving endpoints in the assessment's permissions crawling, using the `ws.serving_endpoints.list` function and the `serving-endpoints` category. A new integration test, `test_endpoints`, has been added to verify that assessments now crawl perm...
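The `group_members_difference` value described above is just the signed member-count delta between the two levels; a minimal sketch under that reading (function name hypothetical, not the actual UCX implementation):

```python
def group_members_difference(workspace_members: list[str], account_members: list[str]) -> int:
    """Signed delta between workspace-level and account-level group membership.

    Positive: the workspace group has more members than the account group.
    Negative: the account group has more members than the workspace group.
    """
    return len(workspace_members) - len(account_members)

# The workspace group has one extra member compared to the account group.
print(group_members_difference(["alice", "bob", "carol"], ["alice", "bob"]))  # → 1
```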
v0.13.2
- Fixed `AnalysisException` in `crawl_tables` task by ignoring the database that is not found (#970).
- Fixed `Unknown: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException` in `crawl_grants` task by ignoring the database that is not found (#967).
- Fixed ruff config for ruff==2.0 (#969).
- Made groups integration tests less flaky (#965).
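Both crawler fixes above follow the same pattern: when a database disappears between listing and crawling, the task logs a warning and skips it instead of failing. A hedged sketch of that pattern (names, the stand-in exception, and the fake backend are illustrative, not the actual UCX code):

```python
import logging

logger = logging.getLogger("crawler")

class DatabaseNotFound(Exception):
    """Stand-in for AnalysisException / NoSuchObjectException raised by the metastore."""

def list_tables(database: str) -> list[str]:
    # Illustrative backend call: pretend one database vanished after it was listed.
    if database == "gone":
        raise DatabaseNotFound(f"Database '{database}' not found")
    return [f"{database}.t1", f"{database}.t2"]

def crawl_tables(databases: list[str]) -> list[str]:
    tables: list[str] = []
    for db in databases:
        try:
            tables.extend(list_tables(db))
        except DatabaseNotFound:
            # Skip databases removed since the listing was taken, instead of failing the task.
            logger.warning("Schema %s no longer exists, skipping", db)
    return tables

print(crawl_tables(["sales", "gone"]))  # the missing database is skipped, not fatal
```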
v0.13.1
- Added secret detection logic to Azure service principal crawler (#950).
- Create storage credentials based on instance profiles and existing roles (#869).
- Enforced `protected-access` pylint rule (#956).
- Enforced `pylint` on unit and integration test code (#953).
- Enforced `invalid-name` pylint rule (#957).
- Fixed AzureResourcePermissions.load to call Installation.load (#962).
- Fixed installer script to reuse an existing UCX Cluster policy if present (#964).
- More `pylint` tuning (#958).
- Refactored `workspace_client_mock` to combine fixtures stored in separate JSON files (#955).
Dependency updates:
- Updated databricks-sdk requirement from ~=0.19.0 to ~=0.20.0 (#961).
Contributors: @nfx, @dependabot[bot], @qziyuan, @HariGS-DB, @FastLee, @nkvuong
v0.13.0
- Added CLI command `databricks labs ucx principal-prefix-access` (#949).
- Added a widget with all jobs to track migration progress (#940).
- Added legacy cluster types to the assessment result (#932).
- Cleanup of install documentation (#951, #947).
- Fixed `WorkspaceConfig` initialization for `DEBUG` notebook (#934).
- Fixed installer not opening config file during the installation (#945).
- Fixed groups in config file not considered for group migration job (#943).
- Fixed bug where `tenant_id` inside secret scope is not detected (#942).
Contributors: @HariGS-DB, @william-conti, @nkvuong, @nfx, @prajin-29, @FastLee, @larsgeorge-db
v0.12.0
- Added CLI command `databricks labs ucx save-uc-compatible-roles` (#863).
- Added dashboard widget with table count by storage and format (#852).
- Added verification of group permissions (#841).
- Checking pipeline cluster config and cluster policy in `crawl_pipelines` task (#864).
- Created cluster policy (ucx-policy) to be used by all UCX compute. This may require customers to reinstall UCX. (#853).
- Skip scanning objects that were removed on platform side since the last scan time, so that integration tests are less flaky (#922).
- Updated assessment documentation (#873).
Dependency updates:
- Updated databricks-sdk requirement from ~=0.18.0 to ~=0.19.0 (#930).
Contributors: @FastLee, @dipankarkush-db, @larsgeorge-db, @prajin-29, @HariGS-DB, @dependabot[bot], @andrascsillag-db, @nkvuong
v0.11.1
- Added "what" property for migration to scope down table migrations (#856).
- Added job count in the assessment dashboard (#858).
- Adopted `installation` package from `databricks-labs-blueprint` (#860).
- Debug logs to print only the first 96 bytes of SQL query by default, tunable by the `debug_truncate_bytes` SDK configuration property (#859).
- Extract command codes and unify the checks for `spark_conf`, `cluster_policy`, `init_scripts` (#855).
- Improved installation failure with actionable message (#840).
- Improved validating groups membership cli command (#816).
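The `debug_truncate_bytes` behaviour noted above can be pictured as byte-level truncation of the query before it reaches the debug log; a sketch under that assumption (helper name hypothetical, 96 is the stated default):

```python
def truncate_for_debug(query: str, debug_truncate_bytes: int = 96) -> str:
    """Return at most `debug_truncate_bytes` bytes of a SQL query for debug logs."""
    raw = query.encode("utf-8")
    if len(raw) <= debug_truncate_bytes:
        return query
    # Ignore decode errors in case the cut lands inside a multi-byte character.
    return raw[:debug_truncate_bytes].decode("utf-8", errors="ignore") + "... (truncated)"

long_query = "SELECT " + ", ".join(f"col{i}" for i in range(50)) + " FROM big_table"
print(truncate_for_debug(long_query))
```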
Dependency updates:
- Updated databricks-labs-blueprint requirement from ~=0.1.0 to ~=0.2.4 (#867).
Contributors: @prajin-29, @nfx, @FastLee, @dependabot[bot], @qziyuan, @mwojtyczka