v0.34.0
- Added a check for no-isolation shared clusters running MLR (#2484). This commit adds a check for no-isolation shared clusters running the Databricks Machine Learning Runtime (MLR) to the assessment workflow and cluster crawler, addressing issue #846. A new function, `is_mlr`, determines whether a Spark version corresponds to an MLR cluster; if a cluster has no isolation and uses MLR, an appropriate error message is appended to the assessment failure list. A new unit test verifies the behavior of MLR clusters without isolation, improving the assessment workflow's accuracy in identifying unsupported configurations. The change has also been verified manually; no user documentation or new CLI commands, workflows, or tables were added.
- Added a section in migration dashboard to list the failed tables, etc (#2406). This release introduces a new logging message format for failed table migrations in the
`TableMigrate` class, specifically the `_migrate_external_table`, `_migrate_external_table_hiveserde_in_place`, `_migrate_dbfs_root_table`, `_migrate_table_create_ctas`, `_migrate_table_in_mount`, and `_migrate_acl` methods in `table_migrate.py`. Log messages now use a `failed-to-migrate` prefix, making failure reasons easier to identify during table migrations and improving debugging. A new SQL file, `05_1_failed_table_migration.sql`, retrieves the list of failed table migrations by extracting messages with the `failed-to-migrate:` prefix from the `inventory.logs` table and returning the corresponding message text. This change resolves issue #1754 and has been manually tested with positive results in the staging environment.
- Added clean up activities when
`migrate-credentials` cmd fails intermittently (#2479). This pull request makes the `migrate-credentials` command for Azure more robust against intermittent failures during the creation of access connectors and storage credentials. It introduces new methods, `delete_storage_credential` and `delete_access_connectors`, which remove incomplete resources when errors occur. The `_migrate_service_principals` and `_create_storage_credentials_for_storage_accounts` methods now handle `PermissionDenied`, `NotFound`, and `BadRequest` exceptions, deleting any storage credentials and access connectors created before the failure. Error messages have also been updated to guide users in resolving issues before retrying the operation. The PR additionally simplifies the `sp_migration` fixture in `tests/unit/azure/test_credentials.py`, improving the deletion flow for access connectors and the testing of the `ServicePrincipalMigration` class. These changes address issue #2362, ensuring clean-up in case of intermittent failures and improving overall reliability.
- Added standalone migrate ACLs (#2284). A new
`migrate-acls` command has been introduced to migrate Access Control Lists (ACLs) from a legacy metastore to a Unity Catalog (UC) metastore. The command works with HMS federation and other table migration scenarios and accepts optional flags `target-catalog` and `hms-fed` to specify the target catalog and to migrate HMS-FED ACLs, respectively. The release also adds the new command and its details to the `commands` section of `labs.yml`, and introduces an `ACLMigrator` class in the `databricks.labs.ucx.contexts.application` module to handle standalone ACL migration for tables. A new test file, `test_migrate_acls.py`, contains unit tests for ACL migration in a Hive metastore, covering various scenarios and verifying proper query generation. These features streamline ACL migration and improve access control management.
- Appends `metastore_id` or `location_name` to roles for uniqueness (#2471). A new method,
`_generate_role_name`, has been added to the `Access` class in the `aws/access.py` file of the `databricks/labs/ucx` module to generate unique AWS role names with a consistent naming convention, and the `list_uc_roles` method now uses it to create role names. In response to issue #2336, the `create_missing_principals` change enforces role uniqueness on AWS by modifying the `ExternalLocation` table to include `metastore_id` or `location_name`. For proper cleanup, the `create_uber_principal` method now deletes the instance profile if creating the cluster policy fails with a `PermissionError`. Unit tests verify these changes, including the new role name generation and the updated `ExternalLocation` table. The `MetastoreAssignment` class is also imported in this diff, although its usage is not immediately clear.
- Cache workspace content (#2497). This release implements a caching mechanism for workspace content to improve load times and avoid rate limits. The
`WorkspaceCache` class handles caching of workspace content, with the `_CachedIO` and `_PathLruCache` classes managing IO-operation caching and LRU caching, respectively. The `_CachedPath` class, a subclass of `WorkspacePath`, handles caching of workspace paths; its `open` and `unlink` methods are overridden to cache results and to remove the corresponding cache entries. The `guess_encoding` function determines the encoding of downloaded content. Unit tests cover cache reuse, invalidation, and encoding detection. This feature speeds up file operations and makes the overall system more efficient.
- Changes the security mode for assessment cluster (#2472). In this release, the security mode of the
`main` assessment cluster has been updated from `LEGACY_SINGLE_USER` to `LEGACY_SINGLE_USER_STANDARD` in the `workflows.py` file. This change disables passthrough and addresses issue #1717. The new data security mode is set via the `data_security_mode` attribute of the `compute.ClusterSpec` object for the `main` job cluster. No new methods are introduced; only the cluster's security-mode configuration changes. Engineers adopting this project should be aware of the security implications and ensure appropriate data protection measures are in place. Manual testing has verified the update.
- Do not normalize cases when reformatting SQL queries in CI check (#2495). The CI workflow for pushing changes to the repository has been updated to improve the SQL query reformatting step. Previously, case normalization of SQL queries broke case-sensitive columns and blocked CI checks. This release fixes the issue by adding the
`--normalize-case false` flag to the `databricks labs lsql fmt` command, which disables case normalization. This allows the CI workflow to pass and keeps SQL queries correctly formatted regardless of case sensitivity. The change affects the `assessment/interactive` directory, specifically a cluster summary query for interactive assessments whose `ORDER BY` clause now uses the original case instead of a normalized one. No new methods have been added; existing functionality was modified solely to improve CI reliability and SQL query compatibility.
- Drop source table after successful table move, not before (#2430). This release fixes an issue where the source table was dropped before the new table was created, which could cause the creation process to fail and leave the source table unavailable. The problem has been resolved by modifying the
`_recreate_table` method of the `TableMove` class in the `hive_metastore` package to drop the source table only after the new table has been created, so the source table remains intact even if creation fails. The change ships with integration tests and does not modify user documentation, CLI commands, workflows, tables, or existing functionality. A new test function, `test_move_tables_table_properties_mismatch_preserves_original`, has been added to `test_table_move.py` to check that the original table is preserved when there is a mismatch in table properties during the move. The imports in that file have been updated accordingly: `databricks.sdk.errors.NotFound` was removed, and `pytest` and `databricks.sdk.errors.BadRequest` were added for the new test.
- Enabled
`principal-prefix-access` command to run as collection (#2450). This commit introduces several improvements to the `principal-prefix-access` command. A new `run-as-collection` flag allows the command to run as a collection across multiple AWS accounts. A new `get_workspace_context` function encapsulates common functionality and improves code reuse, and a `get_workspace_contexts` method retrieves a list of `WorkspaceContext` objects, making the command more efficient when handling collections of workspaces. The `install_on_account` method now uses `get_workspace_contexts`, and the command accepts an optional `acc_client` argument used to retrieve information about the assessment run. These changes make the codebase more efficient, flexible, and maintainable for users working with multiple AWS accounts and workspaces.
- Fixed Driver OOM error by increasing the min memory requirement for node from 16GB to 32 GB (#2473). A modification has been implemented in the
`policy.py` file in the `databricks/labs/ucx/installer` directory, raising the minimum memory requirement for the node type from 16GB to 32GB to prevent driver out-of-memory (OOM) errors during assessments. The `_definition` function in the policy class has been updated to apply the new memory requirement when selecting a suitable node type; the rest of the code is unchanged. This modification addresses issue #2398.
- Fixed issue when running create-missing-credential cmd tries to create the role again if already created (#2456). This release fixes an issue in the
`_identify_missing_paths` function in the `access.py` file of the `databricks/labs/ucx/aws` directory, where the `create-missing-credential` command attempted to create a role that had already been created. The cause was a path comparison using the `match` function, which has been replaced with `startswith` so the code checks whether the path starts with the resource path, resolving issue #2413. The `_identify_missing_paths` function loads UC-compatible roles and iterates through each external location: if a location matches one of the roles' resource paths, it moves on to the next location; otherwise, the location is added to the `missing_paths` set. The diff also adds a conditional check that returns an empty list when `missing_paths` is empty. Unit and integration tests have been added or updated to cover the change, though no manual verification on a staging environment is mentioned.
- Fixed issue with Interactive Dashboard not showing output (#2476). This release fixes a bug in the query behind the Interactive Dashboard, which prevented output from being displayed. Previously, the query joined on and selected `request_params.clusterid`, but the correct field name is `request_params.clusterId`.
The query now uses `request_params.clusterId` in both the `JOIN` and `SELECT` clauses, so the Interactive Dashboard displays the correct output. No new methods were added; only the dashboard query changed, and manual testing is recommended to confirm the output now renders correctly. Additionally, the `test_installation.py` integration test file has been changed to improve cluster performance by updating the
`min_memory_gb` argument from 16 GB to 32 GB in the `test_job_cluster_policy` function.
- Fixed support for table/schema scope for the revert table cli command (#2428). In this release, we have enhanced the
`revert table` CLI command to support table and schema scopes. The `revert_migrated_tables` function now accepts optional `schema` and `table` parameters of type `str` or `None`, which were previously required. Similarly, the `print_revert_report` function on the `tables_migrator` object within `WorkspaceContext` accepts the same optional parameters, and `revert_migrated_tables` passes them along when calling the `revert_migrated_tables` method of `tables_migrator` within `ctx`. A new `reverse_seen` dictionary is used by the `_get_tables_to_revert` and `print_revert_report` functions for more fine-grained control when reverting table migrations, and the `delete_managed` parameter determines whether managed tables should be deleted. Users can now revert a specific schema and table rather than all migrated tables in a workspace.
- Refactor view sequencing and return sequenced views if recursion is found (#2499). The view sequencing for table migration has been improved and now returns sequenced views when recursion is found, addressing issue #249
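The optional schema/table scoping described in the `revert table` entry above can be illustrated with a small sketch. The function signature and the `reverse_seen` bookkeeping below are assumptions modeled on the changelog wording, not the actual UCX implementation:

```python
from typing import Optional

# Hypothetical sketch of schema/table scoping for a revert operation.
# Names (`revert_migrated_tables`, `reverse_seen`) follow the changelog
# description; the real UCX signatures may differ.
def revert_migrated_tables(
    migrated: list[tuple[str, str]],       # (schema, table) pairs already migrated
    schema: Optional[str] = None,          # optional scope: restrict to one schema
    table: Optional[str] = None,           # optional scope: restrict to one table
) -> list[tuple[str, str]]:
    """Return the subset of migrated tables that should be reverted."""
    reverse_seen: dict[tuple[str, str], bool] = {}
    to_revert: list[tuple[str, str]] = []
    for s, t in migrated:
        if schema is not None and s != schema:
            continue  # outside the requested schema scope
        if table is not None and t != table:
            continue  # outside the requested table scope
        if (s, t) in reverse_seen:
            continue  # already scheduled for revert
        reverse_seen[(s, t)] = True
        to_revert.append((s, t))
    return to_revert
```

With both parameters left as `None`, every migrated table is selected, matching the previous revert-everything behavior.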
- Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 (#2489). In this release, we have updated the version requirements for the
`databricks-labs-lsql` package from `>=0.5,<0.9` to `>=0.5,<0.10`, enabling newer versions of the package while maintaining compatibility with existing systems. The `databricks-labs-lsql` package is used for creating dashboards and managing SQL queries in Databricks. The pull request includes the package's release notes, changelog, and commit list. All users of this package should review the release notes and update to pick up the latest features and improvements.
- Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 (#2417). In this pull request, the
`databricks-sdk` dependency has been updated from `~=0.29.0` to `>=0.29,<0.31` to allow the latest version of the package. This follows the release of `databricks-sdk` version `0.30.0`, which brings new features such as DataPlane support and partner support, along with bug fixes and internal changes. Several files have been updated to match the new version, including `access.py`, `fixtures.py`, `test_access.py`, and `test_workflows.py` (method calls, import statements, and test data), and `pyproject.toml` reflects the new dependency version. This pull request does not include any other changes.
- Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 (#2431). In this pull request, we are updating the
`sqlglot` dependency from `>=25.5.0,<25.12` to `>=25.5.0,<25.13`. This update allows the latest version of the `sqlglot` library, which includes several new features and bug fixes, notably support for `TryCast` generation and improvements to the `clickhouse` dialect. Note that a prior release included a breaking change treating `DATABASE` as `SCHEMA` in `exp.Create`, so the change should be tested thoroughly before merging, as breaking changes may affect existing functionality.
- Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 (#2453). In this pull request, we have updated the required version range of the
`sqlglot` package from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`. This change allows installing the latest version of the package, which includes several bug fixes and new features, such as improved transpilation of nullable/non-nullable data types and support for `TryCast` generation in ClickHouse. The `sqlglot` changelog lists the changes in each release, and the pull request includes the commits from the latest release. This update improves functionality and reliability by picking up the latest `sqlglot` features and fixes.
- Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 (#2480). In this release, we have updated the requirement range for the
`sqlglot` dependency from `>=25.5.0,<25.15` to `>=25.5.0,<25.17`. This change resolves issues #2452 and #2451 and picks up several bug fixes and new features in `sqlglot` version 25.16.1, including support for timezone in `exp.TimeStrToTime`, transpiling `from_iso8601_timestamp` from presto/trino to duckdb, and mapping `%e` to `%-d` in BigQuery, along with parser and optimizer changes and other fixes and refactors. This update introduces no major breaking changes and should not affect the project's functionality. The `sqlglot` library is used for parsing, analyzing, and rewriting SQL queries, and the new version range provides improved functionality and reliability.
- Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 (#2488). This pull request updates the `sqlglot` requirement to version 25.5.0 or greater, but less than 25.18, enabling the latest `sqlglot` while maintaining compatibility with the current implementation. The changelog and commits for the releases between v25.16.1 and v25.17.0 are provided for reference, detailing bug fixes, new features, and breaking changes. Reviewers should confirm the update aligns with the project's requirements before merging, to take advantage of the latest improvements and fixes in `sqlglot`.
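The `sqlglot` entries in this release all follow the same pattern: the lower bound stays fixed at `25.5.0` while the exclusive upper bound advances. A minimal sketch of how such a `>=lower,<upper` constraint behaves (this helper is purely illustrative; real projects should rely on the `packaging` library rather than hand-rolled parsing):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a simple dotted version like '25.16.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version: str, lower: str, upper_exclusive: str) -> bool:
    """Check a `>=lower,<upper` constraint such as '>=25.5.0,<25.19'.

    Tuple comparison gives the expected ordering: (25, 18, 0) < (25, 19)
    because the second components differ, while (25, 19, 0) >= (25, 19).
    """
    v = parse_version(version)
    return parse_version(lower) <= v < parse_version(upper_exclusive)
```

For example, `satisfies("25.16.1", "25.5.0", "25.17")` is `True`, while `25.17.0` only becomes installable once the upper bound is raised to `25.18`.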
- Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 (#2509). In this release, we have updated the required version of the
`sqlglot` package in the project's dependencies from `>=25.5.0,<25.18` to `>=25.5.0,<25.19`. This change was made automatically by Dependabot, a service that keeps dependencies up to date, in order to permit the latest version of the `sqlglot` package. The pull request lists the changes and commits in `sqlglot` between versions 25.5.0 and 25.18.0, which is helpful for understanding the potential impact of the update on the project.
- [chore] make
`GRANT` migration logic isolated to `MigrateGrants` component (#2492). In this release, the grant migration logic has been isolated into a separate `MigrateGrants` component, enhancing code modularity and maintainability. This new component, along with the `ACLMigrator`, is now responsible for handling grants and Access Control Lists (ACLs) migration. The `MigrateGrants` class takes grant loaders as input, applies grants to a Unity Catalog (UC) table based on a given source table, and is used in the `acl_migrator` method. The `ACLMigrator` class manages ACL migration for the migrated tables, taking instances of the necessary classes as arguments and setting ACLs for the migrated tables based on the migration status. These changes bring better separation of concerns, making the code easier to understand, test, and maintain.
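The loader-based design described in the `MigrateGrants` entry above can be sketched as follows. The `Grant` shape, the loader-callable signature, and the rendered `GRANT` statements are assumptions based on the changelog wording, not the actual UCX classes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Grant:
    """Hypothetical grant record: who may do what on the source table."""
    principal: str
    action: str  # e.g. "SELECT"

class MigrateGrants:
    """Illustrative sketch: apply grants from pluggable loaders to a UC table."""

    def __init__(self, grant_loaders: list[Callable[[str], list[Grant]]]):
        # Each loader maps a source table name to the grants found for it,
        # so different grant sources stay isolated behind one interface.
        self._grant_loaders = grant_loaders

    def apply(self, source_table: str, target_table: str) -> list[str]:
        """Collect grants for `source_table` and render statements for the UC target."""
        statements: list[str] = []
        seen: set[Grant] = set()
        for loader in self._grant_loaders:
            for grant in loader(source_table):
                if grant in seen:
                    continue  # the same grant may come from several loaders
                seen.add(grant)
                statements.append(
                    f"GRANT {grant.action} ON TABLE {target_table} TO `{grant.principal}`"
                )
        return statements
```

Keeping the loaders as plain callables is what makes the component easy to test in isolation: a unit test can pass a stub loader instead of crawling a real metastore.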
Dependency updates:
- Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 (#2417).
- Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 (#2431).
- Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 (#2453).
- Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 (#2480).
- Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 (#2489).
- Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 (#2488).
- Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 (#2509).
Contributors: @dependabot[bot], @JCZuurmond, @HariGS-DB, @nfx, @ericvergnaud, @FastLee, @pritishpai, @gubyb, @aminmovahed-db