v0.36.0
- Added
`upload` and `download` cli commands to `upload` and `download` a file to/from a collection of workspaces (#2508). In this release, the Databricks Labs UCX command-line interface has been updated with new `upload` and `download` commands. The `upload` command allows users to upload a file to a single workspace or a collection of workspaces, while the `download` command enables users to download a CSV file from a single workspace or a collection of workspaces. This makes it more efficient to upload or download the same file across multiple workspaces. Both commands display a warning or information message upon completion, and the file schema is validated before CSV files are uploaded. This feature includes new methods for uploading and downloading files for multiple workspaces, as well as new unit and integration tests. Users can refer to the contributing instructions to help improve the project.
- Added ability to run
`create-table-mapping` command as collection (#2602). This PR introduces the capability to run the `create-table-mapping` command as a collection in the `databricks labs ucx` CLI, providing increased flexibility and automation for workflows. A new optional boolean flag, `run-as-collection`, has been added to the `create-table-mapping` command, allowing users to indicate if they want to run it as a collection, with a default value of False. The updated `create_table_mapping` function now accepts additional arguments, enabling efficient creation of table mappings for multiple workspaces. Users are encouraged to test this feature in various scenarios and provide feedback for further improvements.
- Added comment on the source tables to capture that they have been deprecated (#2548). A new method,
`_sql_add_migrated_comment(self, table: Table, target_table_key: str)`, has been added to the `table_migrate.py` file to mark deprecated source tables with a comment indicating their deprecated status and directing users to the new table. This method is used in three existing methods within the same file to add comments to deprecated tables as part of the migration process. In addition, a new SQL query has been added to set a comment on the source table `hive_metastore.db1_src.managed_dbfs`, indicating that it is deprecated and directing users to the new table `ucx_default.db1_dst.managed_dbfs`. A unit test has also been updated to ensure that the migration process correctly adds the deprecation comment to the source table. This change is part of a larger effort to deprecate and migrate data from old tables to new tables and provides guidance for users to migrate to the new table.
- Added documentation for PrincipalACL migration and delete-missing-principal cmd (#2552). In this release, the UCX project has added a new command,
`delete-missing-principals`, applicable only for AWS, to delete IAM roles created by UCX. This command lists all IAM roles generated by the `principal-prefix-access` command and allows for the selection of multiple roles to delete. It checks if the selected roles are mapped to any storage credentials and seeks confirmation before deleting the role and its associated inline policy. Additionally, updates have been made to the `create-uber-principal` and `migrate-locations` commands to apply location ACLs from existing clusters and grant necessary permissions to users. The `create-catalogs-schemas` command has been updated to apply catalog and schema ACLs from existing clusters for both Azure and AWS. The `migrate-tables` command has also been updated to apply table and view ACLs from existing clusters for both Azure and AWS. The documentation of commands that require admin privileges in the UCX project has also been updated.
- Added linting for
`spark.sql(...)` calls (#2558). This commit introduces linting for `spark.sql(...)` calls to enhance code quality and consistency. The previous `SparkSqlPyLinter` only checked for table migration and did not apply other SQL linters such as the DirectFsAccess linters. This has been rectified by incorporating additional SQL linters for `spark.sql(...)` calls, improving the overall linting functionality of the system. The commit also introduces an abstract base class called `Fixer`, which enforces the inclusion of a `name` property for all derived classes. Additionally, minor improvements and changes have been made to the codebase. The commit resolves issue #2551 and updates the testing process in `test_functional.py` to test `spark-sql-directfs.py`, ensuring the proper functioning of the linted `spark.sql(...)` calls.
- Document: clarify that the
`assessment` job is not intended to be re-run (#2560). In this release, we have clarified the documented behavior of the `assessment` job for Databricks Labs UCX to address confusion around its re-run functionality. The `assessment` job should only be executed once, during the initial setup of UCX, and should not be re-run to refresh the inventory or findings. If a re-assessment is necessary, UCX will need to be reinstalled first. This change aligns the documentation with the actual functionality of the `assessment` job and does not affect the daily job that updates parts of the inventory. The `assessment` workflow is designed to detect incompatible entities and provide information for the migration process. It can be executed in parallel or sequentially, and its output is stored in Delta tables for further analysis and decision-making through the assessment report.
- Enabled
`migrate-credentials` command to run as collection (#2532). In this pull request, the `migrate-credentials` command in the UCX project's CLI has been updated with a new optional flag, `run_as_collection`, which allows the command to operate on multiple workspaces as a collection. This change introduces the `get_contexts` function and modifies the `delete_missing_principals` function to support the new functionality. The tests for the `migrate-credentials` command on Azure and AWS have been updated to accept an additional `acc_client` argument. Comprehensive tests and documentation have been added to ensure the reliability and robustness of the new functionality. Reviewers are encouraged to check the attached testing evidence and verify that the new functionality works as intended without introducing unintended side effects.
- Escape column names in target tables of the table migration (#2563). In this release, the
`escape_sql_identifier` function in the `utils.py` file has been enhanced with a new `maxsplit` parameter, providing more control over the maximum number of splits performed on the input string. This addresses issue #2544 and is part of the existing table-migration workflows. The `tables.py` file in the `databricks/labs/ucx/hive_metastore` directory has been updated to escape column names in target tables, preventing SQL injection attacks. Additionally, a new `ColumnInfo` class and several utility functions have been added to the `fixtures.py` file in the `databricks.labs.ucx` project for generating SQL schemas and column casting. The integration tests for migrating Hive Metastore tables have been updated with new tests to handle column names that require escaping. Lastly, the `test_manager.py` file in the `tests/unit/workspace_access` directory has been refactored by removing the `mock_backend` fixture and adding the `test_inventory_permission_manager_init` method to test the initialization of the `PermissionManager` class. These changes improve security, functionality, and test coverage for software engineers utilizing these libraries in their projects.
- Explain why metastore is checked to exist in group migration workflow in docstring (#2614). In the updated
`workflows.py` file, the docstring for the `verify_metastore_attached` method has been revised to explain the necessity of checking whether a metastore is attached to the workspace. The reason for this check is that account-level groups are only available when a metastore is attached, which is crucial for the group migration workflow to function properly. The method itself remains the same, only verifying the presence of a metastore attached to the workspace and causing the workflow to fail if no metastore is found. This modification enhances the clarity of the metastore check's importance in the context of the group migration workflow.
- Fixed infinite recursion when visiting a dependency graph (#2562). This change addresses an issue of infinite recursion that can occur when visiting a dependency graph, particularly when many files in a package import the package itself. The
`visit` method has been modified to visit each parent/child pair only once, preventing the recursion that can occur in such cases. The `dependencies` property has been added to the `DependencyGraph` class, and the `DependencyGraphVisitor` class has been introduced to handle visiting nodes and tracking visited pairs. These modifications improve the robustness of the library by preventing infinite recursion during dependency resolution. The change includes added unit tests to ensure correct behavior and addresses a blocker for a previous pull request. The functionality of the code remains unchanged.
- Fixed migrate acls CLI command (#2617). In this release, the
`migrate-acls` command in the ucx project's CLI has been updated to address issue #2617. The changes include the removal of ACL type parameters from the `migrate-acls` command, simplifying its usage and eliminating the need for explicit type specifications. The `legacy_table_acl` and `principal` parameters have been removed from the `migrate_acls` function, while the `hms_fed` parameter remains unchanged and retains its default value if not explicitly provided. These modifications streamline the ACL migration process in the ucx CLI, making it easier for users to manage access control lists.
- Fixes pip install statement in debug notebook (#2545). In this release, we have addressed an issue in the debug notebook where the pip install statement for the wheel was incorrectly surrounded by square brackets, causing the notebook run to fail. We have removed the superfluous square brackets and modified the
`remote_wheels` list to be joined into a string before being passed to the `DEBUG_NOTEBOOK` format. This change solely affects the debug notebook and does not involve any alterations to user documentation, CLI commands, workflows, or tables. Furthermore, no new methods have been added, and existing functionality remains unchanged. The change has been manually tested for accuracy, but it does not include any unit tests, integration tests, or staging environment verification.
- More escaping of SQL identifiers (#2530). This commit includes updates to SQL identifier escaping, addressing a missed SQL statement in one of the crawlers and adding support for less-known Spark/Databricks corner cases where backticks in identifier names need to be doubled when quoting. The
`escape_sql_identifier` function has been modified to consider this new case, and the changes affect the existing `migrate-data-reconciliation` workflow. Additionally, the `TableIdentifier` class has been updated to properly escape identifiers, handling the backticks-in-names scenario. These improvements ensure better handling of SQL identifiers, improving the overall functionality of the codebase. Unit tests have been updated to reflect these changes.
- Retry deploy workflow on
`InternalError` (#2525). In the `workflows.py` file, the `_deploy_workflow` function has been updated to include a retry mechanism using the `@retried` decorator, which handles `InternalError` exceptions during workflow creation. This enhancement improves the resilience of deploying workflows by automatically retrying in case of internal errors, thereby addressing issue #2522. This change is part of our ongoing efforts to ensure a robust and fault-tolerant deployment process. The retry mechanism is configured with a timeout of 2 minutes to prevent extended waiting in case of persistent issues, thus enhancing overall system efficiency and reliability.
- Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 (#2580). In this release, we have raised the upper bound on the databricks-labs-lsql package from below 0.10 to below 0.11, while keeping the minimum version requirement at 0.5. This update allows users to pick up the latest version of the package, which includes new features and bug fixes. For more detailed information on the changes included in this update, please refer to the package's changelog and release notes.
- Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 (#2549). In this pull request, we update the sqlglot requirement in the pyproject.toml file from >=25.5.0,<25.20 to >=25.5.0,<25.21. This change allows installation of the latest version of sqlglot, while ensuring that the version stays below 25.21. The update was made in response to a Dependabot pull request that flagged a new version of sqlglot. The PR includes details of the sqlglot changelog and commits, but reviewers can focus on the specific change made to our project. The sqlglot package is a SQL parser and transpiler that this project uses as a dependency. This update ensures that the project is using an up-to-date version of the package, which may include bug fixes, new features, or performance improvements.
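The identifier escaping described in the entries above (#2530, #2563) can be illustrated with a minimal sketch. The function name and the `maxsplit` parameter come from the changelog entries; the body below is an illustrative assumption, not the actual UCX implementation:

```python
def escape_sql_identifier(path: str, *, maxsplit: int = 2) -> str:
    """Split a dotted path and wrap each part in backticks.

    Backticks inside a part are doubled, the Spark/Databricks corner
    case called out in #2530. The maxsplit parameter caps how many
    splits are performed, so a trailing name containing dots is kept
    as a single quoted part.
    """
    parts = path.split(".", maxsplit)
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)

print(escape_sql_identifier("hive_metastore.db1_src.managed_dbfs"))
# `hive_metastore`.`db1_src`.`managed_dbfs`
```

Note how a backtick in a name is doubled rather than stripped: `escape_sql_identifier("cat.schema.we`ird")` yields `` `cat`.`schema`.`we``ird` ``.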
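The visited-pairs technique behind the infinite-recursion fix (#2562) can be sketched as follows. The class name matches the changelog entry, but this is a simplified, self-contained illustration rather than the UCX code:

```python
class DependencyGraphVisitor:
    """Walk a dependency graph, visiting each parent/child pair at most
    once, so cycles (e.g. files importing their own package) cannot
    cause infinite recursion."""

    def __init__(self) -> None:
        self._visited_pairs: set[tuple[str, str]] = set()
        self.order: list[tuple[str, str]] = []  # edges, in visit order

    def visit(self, graph: dict[str, list[str]], node: str) -> None:
        for child in graph.get(node, []):
            pair = (node, child)
            if pair in self._visited_pairs:
                continue  # this edge was already walked: stop recursing
            self._visited_pairs.add(pair)
            self.order.append(pair)
            self.visit(graph, child)

# A cyclic graph: package "pkg" and its modules import each other.
graph = {"pkg": ["a", "b"], "a": ["pkg"], "b": ["a"]}
visitor = DependencyGraphVisitor()
visitor.visit(graph, "pkg")  # terminates despite the pkg -> a -> pkg cycle
```

Deduplicating on edges rather than nodes means a shared dependency is still reachable from every parent, while any cycle is cut after one traversal.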
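The retry-on-`InternalError` behavior described for workflow deployment (#2525) follows a common decorator pattern. This is a hedged sketch of such a `retried` decorator, not the project's actual implementation (UCX uses the decorator provided by its SDK dependency); the `deploy_workflow` function and the 5-second timeout are illustrative stand-ins:

```python
import functools
import time


class InternalError(Exception):
    """Stand-in for the transient server-side error being retried."""


def retried(*, on: tuple[type[BaseException], ...], timeout: float):
    """Retry the wrapped call while it raises one of the `on` exception
    types, giving up once roughly `timeout` seconds have elapsed."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            sleep = 0.01
            while True:
                try:
                    return fn(*args, **kwargs)
                except on:
                    if time.monotonic() + sleep > deadline:
                        raise  # retry budget exhausted: surface the error
                    time.sleep(sleep)
                    sleep = min(sleep * 2, 1.0)  # exponential backoff
        return wrapper
    return decorator


attempts = 0


@retried(on=(InternalError,), timeout=5.0)  # the real workflow uses 2 minutes
def deploy_workflow() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise InternalError("transient failure")
    return "deployed"


print(deploy_workflow())  # "deployed", after two retried failures
```

Bounding the retries with a deadline, as the changelog entry notes, prevents a persistently failing deployment from blocking indefinitely.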
Dependency updates:
- Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 (#2549).
- Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 (#2580).
Contributors: @asnare, @JCZuurmond, @ericvergnaud, @pritishpai, @HariGS-DB, @dependabot[bot], @aminmovahed-db, @FastLee