v0.34.0
- Added a check for no-isolation shared clusters running MLR (#2484). This commit adds a check for no-isolation shared clusters running the Databricks Machine Learning Runtime (MLR) to the assessment workflow and cluster crawler, addressing issue #846. A new function, `is_mlr`, determines whether a Spark version corresponds to an MLR cluster; if a cluster has no isolation and uses MLR, an appropriate error message is appended to the assessment failure list. A new unit test verifies the behavior of MLR clusters without isolation, improving the assessment workflow's accuracy in identifying unsupported configurations. The change has also been verified manually; no user documentation or new CLI commands, workflows, or tables were added.
- Added a section in migration dashboard to list the failed tables, etc (#2406). This release introduces a new logging message format for failed table migrations in the
`TableMigrate` class, specifically the `_migrate_external_table`, `_migrate_external_table_hiveserde_in_place`, `_migrate_dbfs_root_table`, `_migrate_table_create_ctas`, `_migrate_table_in_mount`, and `_migrate_acl` methods in `table_migrate.py`. Log messages now use a `failed-to-migrate` prefix, making failure reasons easier to identify during table migrations and improving debugging. A new SQL file, `05_1_failed_table_migration.sql`, retrieves the list of failed table migrations by extracting messages with the `failed-to-migrate:` prefix from the `inventory.logs` table and returning the corresponding message text. This change resolves issue #1754 and has been manually tested with positive results in the staging environment.
- Added clean up activities when
`migrate-credentials` cmd fails intermittently (#2479). This pull request makes the `migrate-credentials` command for Azure more robust against intermittent failures during the creation of access connectors and storage credentials. It introduces new methods, `delete_storage_credential` and `delete_access_connectors`, which remove incomplete resources when errors occur. The `_migrate_service_principals` and `_create_storage_credentials_for_storage_accounts` methods now handle `PermissionDenied`, `NotFound`, and `BadRequest` exceptions, deleting any storage credentials and access connectors created before the failure. Error messages have also been updated to guide users in resolving issues before retrying the operation. The PR additionally simplifies the `sp_migration` fixture in `tests/unit/azure/test_credentials.py`, improving the deletion flow for access connectors and the testing of the `ServicePrincipalMigration` class. These changes address issue #2362, ensuring clean-up in case of intermittent failures and improving overall reliability.
- Added standalone migrate ACLs (#2284). A new
`migrate-acls` command has been introduced to migrate Access Control Lists (ACLs) from a legacy metastore to a Unity Catalog (UC) metastore. The command works with HMS federation and other table migration scenarios and accepts optional flags `target-catalog` and `hms-fed` to specify the target catalog and to migrate HMS-FED ACLs, respectively. The release also adds the new command and its details to the `commands` section of `labs.yml`, and introduces an `ACLMigrator` class in the `databricks.labs.ucx.contexts.application` module to handle standalone ACL migration for tables. A new test file, `test_migrate_acls.py`, contains unit tests for ACL migration in a Hive metastore, covering various scenarios and verifying proper query generation. These features streamline ACL migration and improve access control management.
- Appends `metastore_id` or `location_name` to roles for uniqueness (#2471). A new method,
`_generate_role_name`, has been added to the `Access` class in the `aws/access.py` file of the `databricks/labs/ucx` module to generate unique AWS role names with a consistent naming convention, and the `list_uc_roles` method now uses it to create role names. In response to issue #2336, the `create_missing_principals` change enforces role uniqueness on AWS by modifying the `ExternalLocation` table to include `metastore_id` or `location_name`. For proper cleanup, the `create_uber_principal` method now deletes the instance profile if creating the cluster policy fails with a `PermissionError`. Unit tests verify these changes, including the new role name generation and the updated `ExternalLocation` table. The `MetastoreAssignment` class is also imported in this diff, although its usage is not immediately clear.
- Cache workspace content (#2497). This release implements a caching mechanism for workspace content to improve load times and avoid rate limits. The
`WorkspaceCache` class handles caching of workspace content, with the `_CachedIO` and `_PathLruCache` classes managing IO-operation caching and LRU caching, respectively. The `_CachedPath` class, a subclass of `WorkspacePath`, handles caching of workspace paths; its `open` and `unlink` methods are overridden to cache results and to remove the corresponding cache entries. The `guess_encoding` function determines the encoding of downloaded content. Unit tests cover cache reuse, invalidation, and encoding detection. This feature speeds up file operations and makes the overall system more efficient.
- Changes the security mode for assessment cluster (#2472). In this release, the security mode of the
`main` assessment cluster has been updated from `LEGACY_SINGLE_USER` to `LEGACY_SINGLE_USER_STANDARD` in the `workflows.py` file. This change disables passthrough and addresses issue #1717. The new data security mode is set via the `data_security_mode` attribute of the `compute.ClusterSpec` object for the `main` job cluster. No new methods are introduced; only the cluster's security-mode configuration changes. Engineers adopting this project should be aware of the security implications and ensure appropriate data protection measures are in place. Manual testing has verified the update.
- Do not normalize cases when reformatting SQL queries in CI check (#2495). The CI workflow for pushing changes to the repository has been updated to improve the SQL query reformatting step. Previously, case normalization of SQL queries broke case-sensitive columns and blocked CI checks. This release fixes the issue by adding the
`--normalize-case false` flag to the `databricks labs lsql fmt` command, which disables case normalization. This allows the CI workflow to pass and keeps SQL queries correctly formatted regardless of case sensitivity. The change affects the `assessment/interactive` directory, specifically a cluster summary query for interactive assessments whose `ORDER BY` clause now uses the original case instead of a normalized one. No new methods have been added; existing functionality was modified solely to improve CI reliability and SQL query compatibility.
- Drop source table after successful table move, not before (#2430). This release fixes an issue where the source table was dropped before the new table was created, which could cause the creation process to fail and leave the source table unavailable. The problem has been resolved by modifying the
`_recreate_table` method of the `TableMove` class in the `hive_metastore` package to drop the source table only after the new table has been created, so the source table remains intact even if creation fails. The change ships with integration tests and does not modify user documentation, CLI commands, workflows, tables, or existing functionality. A new test function, `test_move_tables_table_properties_mismatch_preserves_original`, has been added to `test_table_move.py` to check that the original table is preserved when there is a mismatch in table properties during the move. The imports in that file have been updated accordingly: `databricks.sdk.errors.NotFound` was removed, and `pytest` and `databricks.sdk.errors.BadRequest` were added for the new test.
- Enabled
`principal-prefix-access` command to run as collection (#2450). This commit introduces several improvements to the `principal-prefix-access` command. A new `run-as-collection` flag allows the command to run as a collection across multiple AWS accounts. A new `get_workspace_context` function encapsulates common functionality and improves code reuse, and a `get_workspace_contexts` method retrieves a list of `WorkspaceContext` objects, making the command more efficient when handling collections of workspaces. The `install_on_account` method now uses `get_workspace_contexts`, and the command accepts an optional `acc_client` argument used to retrieve information about the assessment run. These changes make the codebase more efficient, flexible, and maintainable for users working with multiple AWS accounts and workspaces.
- Fixed Driver OOM error by increasing the min memory requirement for node from 16GB to 32 GB (#2473). A modification has been implemented in the
`policy.py` file in the `databricks/labs/ucx/installer` directory, raising the minimum memory requirement for the node type from 16GB to 32GB to prevent driver out-of-memory (OOM) errors during assessments. The `_definition` function in the policy class has been updated to apply the new memory requirement when selecting a suitable node type; the rest of the code is unchanged. This modification addresses issue #2398.
- Fixed issue when running create-missing-credential cmd tries to create the role again if already created (#2456). This release fixes an issue in the
`_identify_missing_paths` function in the `access.py` file of the `databricks/labs/ucx/aws` directory, where the `create-missing-credential` command attempted to create a role that had already been created. The cause was a path comparison using the `match` function, which has been replaced with `startswith` so the code checks whether the path starts with the resource path, resolving issue #2413. The `_identify_missing_paths` function loads UC-compatible roles and iterates through each external location: if a location matches one of the roles' resource paths, it moves on to the next location; otherwise, the location is added to the `missing_paths` set. The diff also adds a conditional check that returns an empty list when `missing_paths` is empty. Unit and integration tests have been added or updated to cover the change, though no manual verification on a staging environment is mentioned.
- Fixed issue with Interactive Dashboard not showing output (#2476). This release fixes a bug in the query behind the Interactive Dashboard, which prevented output from being displayed. Previously, the query joined on and selected `request_params.clusterid`, but the correct field name is `request_params.clusterId`.
The query now uses `request_params.clusterId` in both the `JOIN` and `SELECT` clauses, so the Interactive Dashboard displays the correct output. No new methods were added; only the dashboard query changed, and manual testing is recommended to confirm the output now renders correctly. Additionally, the `test_installation.py` integration test file has been changed to improve cluster performance by updating the
`min_memory_gb` argument from 16 GB to 32 GB in the `test_job_cluster_policy` function.
- Fixed support for table/schema scope for the revert table cli command (#2428). In this release, we have enhanced the
`revert table` CLI command to support table and schema scopes. The `revert_migrated_tables` function now accepts optional `schema` and `table` parameters of type `str` or `None`, which were previously required. Similarly, the `print_revert_report` function on the `tables_migrator` object within `WorkspaceContext` accepts the same optional parameters, and `revert_migrated_tables` passes them along when calling the `revert_migrated_tables` method of `tables_migrator` within `ctx`. A new `reverse_seen` dictionary is used by the `_get_tables_to_revert` and `print_revert_report` functions for more fine-grained control when reverting table migrations, and the `delete_managed` parameter determines whether managed tables should be deleted. Users can now revert a specific schema and table rather than all migrated tables in a workspace.
- Refactor view sequencing and return sequenced views if recursion is found (#2499). The view sequencing for table migration has been improved and now returns sequenced views when recursion is found, addressing issue #249
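The optional schema/table scoping described in the `revert table` entry above can be illustrated with a small sketch. The function signature and the `reverse_seen` bookkeeping below are assumptions modeled on the changelog wording, not the actual UCX implementation:

```python
from typing import Optional

# Hypothetical sketch of schema/table scoping for a revert operation.
# Names (`revert_migrated_tables`, `reverse_seen`) follow the changelog
# description; the real UCX signatures may differ.
def revert_migrated_tables(
    migrated: list[tuple[str, str]],       # (schema, table) pairs already migrated
    schema: Optional[str] = None,          # optional scope: restrict to one schema
    table: Optional[str] = None,           # optional scope: restrict to one table
) -> list[tuple[str, str]]:
    """Return the subset of migrated tables that should be reverted."""
    reverse_seen: dict[tuple[str, str], bool] = {}
    to_revert: list[tuple[str, str]] = []
    for s, t in migrated:
        if schema is not None and s != schema:
            continue  # outside the requested schema scope
        if table is not None and t != table:
            continue  # outside the requested table scope
        if (s, t) in reverse_seen:
            continue  # already scheduled for revert
        reverse_seen[(s, t)] = True
        to_revert.append((s, t))
    return to_revert
```

With both parameters left as `None`, every migrated table is selected, matching the previous revert-everything behavior.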
- Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 (#2489). In this release, we have updated the version requirements for the
`databricks-labs-lsql` package from `>=0.5,<0.9` to `>=0.5,<0.10`, enabling newer versions of the package while maintaining compatibility with existing systems. The `databricks-labs-lsql` package is used for creating dashboards and managing SQL queries in Databricks. The pull request includes the package's release notes, changelog, and commit list. All users of this package should review the release notes and update to pick up the latest features and improvements.
- Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 (#2417). In this pull request, the
`databricks-sdk` dependency has been updated from `~=0.29.0` to `>=0.29,<0.31` to allow the latest version of the package. This follows the release of `databricks-sdk` version `0.30.0`, which brings new features such as DataPlane support and partner support, along with bug fixes and internal changes. Several files have been updated to match the new version, including `access.py`, `fixtures.py`, `test_access.py`, and `test_workflows.py` (method calls, import statements, and test data), and `pyproject.toml` reflects the new dependency version. This pull request does not include any other changes.
- Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 (#2431). In this pull request, we are updating the
`sqlglot` dependency from `>=25.5.0,<25.12` to `>=25.5.0,<25.13`. This update allows the latest version of the `sqlglot` library, which includes several new features and bug fixes, notably support for `TryCast` generation and improvements to the `clickhouse` dialect. Note that a prior release included a breaking change treating `DATABASE` as `SCHEMA` in `exp.Create`, so the change should be tested thoroughly before merging, as breaking changes may affect existing functionality.
- Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 (#2453). In this pull request, we have updated the required version range of the
`sqlglot` package from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`. This change allows installing the latest version of the package, which includes several bug fixes and new features, such as improved transpilation of nullable/non-nullable data types and support for `TryCast` generation in ClickHouse. The `sqlglot` changelog lists the changes in each release, and the pull request includes the commits from the latest release. This update improves functionality and reliability by picking up the latest `sqlglot` features and fixes.
- Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 (#2480). In this release, we have updated the requirement range for the
`sqlglot` dependency from `>=25.5.0,<25.15` to `>=25.5.0,<25.17`. This change resolves issues #2452 and #2451 and picks up several bug fixes and new features in `sqlglot` version 25.16.1, including support for timezone in `exp.TimeStrToTime`, transpiling `from_iso8601_timestamp` from presto/trino to duckdb, and mapping `%e` to `%-d` in BigQuery, along with parser and optimizer changes and other fixes and refactors. This update introduces no major breaking changes and should not affect the project's functionality. The `sqlglot` library is used for parsing, analyzing, and rewriting SQL queries, and the new version range provides improved functionality and reliability.
- Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 (#2488). This pull request updates the `sqlglot` requirement to version 25.5.0 or greater, but less than 25.18, enabling the latest `sqlglot` while maintaining compatibility with the current implementation. The changelog and commits for the releases between v25.16.1 and v25.17.0 are provided for reference, detailing bug fixes, new features, and breaking changes. Reviewers should confirm the update aligns with the project's requirements before merging, to take advantage of the latest improvements and fixes in `sqlglot`.
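The `sqlglot` entries in this release all follow the same pattern: the lower bound stays fixed at `25.5.0` while the exclusive upper bound advances. A minimal sketch of how such a `>=lower,<upper` constraint behaves (this helper is purely illustrative; real projects should rely on the `packaging` library rather than hand-rolled parsing):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a simple dotted version like '25.16.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version: str, lower: str, upper_exclusive: str) -> bool:
    """Check a `>=lower,<upper` constraint such as '>=25.5.0,<25.19'.

    Tuple comparison gives the expected ordering: (25, 18, 0) < (25, 19)
    because the second components differ, while (25, 19, 0) >= (25, 19).
    """
    v = parse_version(version)
    return parse_version(lower) <= v < parse_version(upper_exclusive)
```

For example, `satisfies("25.16.1", "25.5.0", "25.17")` is `True`, while `25.17.0` only becomes installable once the upper bound is raised to `25.18`.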
- Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 (#2509). In this release, we have updated the required version of the
`sqlglot` package in the project's dependencies from `>=25.5.0,<25.18` to `>=25.5.0,<25.19`. This change was made automatically by Dependabot, a service that keeps dependencies up to date, in order to permit the latest version of the `sqlglot` package. The pull request lists the changes and commits in `sqlglot` between versions 25.5.0 and 25.18.0, which is helpful for understanding the potential impact of the update on the project.
- [chore] make
`GRANT` migration logic isolated to `MigrateGrants` component (#2492). In this release, the grant migration logic has been isolated into a separate `MigrateGrants` component, enhancing code modularity and maintainability. This new component, along with the `ACLMigrator`, is now responsible for handling grants and Access Control Lists (ACLs) migration. The `MigrateGrants` class takes grant loaders as input, applies grants to a Unity Catalog (UC) table based on a given source table, and is used in the `acl_migrator` method. The `ACLMigrator` class manages ACL migration for the migrated tables, taking instances of the necessary classes as arguments and setting ACLs for the migrated tables based on the migration status. These changes bring better separation of concerns, making the code easier to understand, test, and maintain.
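The loader-based design described in the `MigrateGrants` entry above can be sketched as follows. The `Grant` shape, the loader-callable signature, and the rendered `GRANT` statements are assumptions based on the changelog wording, not the actual UCX classes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Grant:
    """Hypothetical grant record: who may do what on the source table."""
    principal: str
    action: str  # e.g. "SELECT"

class MigrateGrants:
    """Illustrative sketch: apply grants from pluggable loaders to a UC table."""

    def __init__(self, grant_loaders: list[Callable[[str], list[Grant]]]):
        # Each loader maps a source table name to the grants found for it,
        # so different grant sources stay isolated behind one interface.
        self._grant_loaders = grant_loaders

    def apply(self, source_table: str, target_table: str) -> list[str]:
        """Collect grants for `source_table` and render statements for the UC target."""
        statements: list[str] = []
        seen: set[Grant] = set()
        for loader in self._grant_loaders:
            for grant in loader(source_table):
                if grant in seen:
                    continue  # the same grant may come from several loaders
                seen.add(grant)
                statements.append(
                    f"GRANT {grant.action} ON TABLE {target_table} TO `{grant.principal}`"
                )
        return statements
```

Keeping the loaders as plain callables is what makes the component easy to test in isolation: a unit test can pass a stub loader instead of crawling a real metastore.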
Dependency updates:
- Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31 (#2417).
- Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13 (#2431).
- Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15 (#2453).
- Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17 (#2480).
- Updated databricks-labs-lsql requirement from <0.9,>=0.5 to >=0.5,<0.10 (#2489).
- Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18 (#2488).
- Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19 (#2509).
Contributors: @dependabot[bot], @JCZuurmond, @HariGS-DB, @nfx, @ericvergnaud, @FastLee, @pritishpai, @gubyb, @aminmovahed-db