Commit 622ed83
Release v0.34.0 (#2515)
* Added a check for No isolation shared clusters and MLR
([#2484](#2484)). This
commit introduces a check for `No isolation shared clusters` utilizing
MLR as part of the assessment workflow and cluster crawler, addressing
issue [#846](#846). A new
function, `is_mlr`, has been implemented to determine if the Spark
version corresponds to an MLR cluster. If the cluster has no isolation
and uses MLR, the assessment failure list is appended with an
appropriate error message. The change is verified by unit tests,
including a new test covering MLR clusters without isolation, and by
manual verification, improving the assessment workflow's accuracy in
identifying unsupported configurations. No user documentation or new
CLI commands, workflows, or tables were added.
* Added a section in migration dashboard to list the failed tables, etc
([#2406](#2406)). In this
release, we have introduced a new logging message format for failed
table migrations in the `TableMigrate` class, specifically impacting the
`_migrate_external_table`, `_migrate_external_table_hiveserde_in_place`,
`_migrate_dbfs_root_table`, `_migrate_table_create_ctas`,
`_migrate_table_in_mount`, and `_migrate_acl` methods within the
`table_migrate.py` file. This update employs the `failed-to-migrate`
prefix in log messages for improved failure reason identification during
table migrations, enhancing debugging capabilities. As part of this
release, we have also developed a new SQL file,
`05_1_failed_table_migration.sql`, which retrieves a list of failed
table migrations by extracting messages with the 'failed-to-migrate:'
prefix from the inventory.logs table and returning the corresponding
message text. While this release does not include new methods or user
documentation, it resolves issue
[#1754](#1754) and has been
manually tested with positive results in the staging environment,
demonstrating its functionality.
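The logging convention can be sketched like this; the prefix matches the release notes, but the exact message wording and logger name used in `table_migrate.py` are assumptions. The dashboard SQL then simply filters `inventory.logs` rows whose message carries this prefix.

```python
import logging

logger = logging.getLogger("databricks.labs.ucx")


def log_migration_failure(table_full_name: str, error: Exception) -> str:
    """Log a table-migration failure with the dashboard-recognized prefix (sketch)."""
    message = f"failed-to-migrate: {table_full_name}: {error}"
    logger.warning(message)
    return message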
* Added clean up activities when `migrate-credentials` cmd fails
intermittently
([#2479](#2479)). This pull
request enhances the robustness of the `migrate-credentials` command for
Azure in the event of intermittent failures during the creation of
access connectors and storage credentials. It introduces new methods,
`delete_storage_credential` and `delete_access_connectors`, which are
responsible for removing incomplete resources when errors occur. The
`_migrate_service_principals` and
`_create_storage_credentials_for_storage_accounts` methods now handle
`PermissionDenied`, `NotFound`, and `BadRequest` exceptions, deleting
created storage credentials and access connectors if exceptions occur.
Additionally, error messages have been updated to guide users in
resolving issues before attempting the operation again. The PR also
modifies the `sp_migration` fixture in the
`tests/unit/azure/test_credentials.py` file, simplifying the deletion
process for access connectors and improving the testing of the
`ServicePrincipalMigration` class. These changes address issue
[#2362](#2362), ensuring
clean-up activities in case of intermittent failures and improving the
overall reliability of the system.
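The clean-up pattern can be sketched as a try/except that rolls back the partial resource before surfacing the failure. The function and exception names here are illustrative stand-ins, not ucx's actual API (the real exceptions come from `databricks.sdk.errors`).

```python
class PermissionDenied(Exception):
    """Stand-in for databricks.sdk.errors.PermissionDenied (illustrative)."""


def migrate_credential(create_credential, delete_credential, name: str) -> bool:
    """Create a storage credential, removing the partial resource on failure (sketch)."""
    try:
        create_credential(name)
        return True
    except PermissionDenied:
        # Roll back the incomplete resource so a retry starts clean.
        delete_credential(name)
        return False
```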
* Added standalone migrate ACLs
([#2284](#2284)). A new
`migrate-acls` command has been introduced to facilitate the migration
of Access Control Lists (ACLs) from a legacy metastore to a Unity
Catalog (UC) metastore. The command, designed to work with HMS
federation and other table migration scenarios, can be executed with
optional flags `target-catalog` and `hms-fed` to specify the target
catalog and migrate HMS-FED ACLs, respectively. The release also
includes modifications to the `labs.yml` file, adding the new command
and its details to the `commands` section. In addition, a new
`ACLMigrator` class has been added to the
`databricks.labs.ucx.contexts.application` module to handle ACL
migration for tables in a standalone manner. A new test file,
`test_migrate_acls.py`, contains unit tests for ACL migration in a Hive
metastore, covering various scenarios and ensuring proper query
generation. These features streamline and improve the functionality of
ACL migration, offering better access control management for users.
* Appends metastore_id or location_name to roles for uniqueness
([#2471](#2471)). A new
method, `_generate_role_name`, has been added to the `Access` class in
the `aws/access.py` file of the `databricks/labs/ucx` module to generate
unique names for AWS roles using a consistent naming convention. The
`list_uc_roles` method has been updated to utilize this new method for
creating role names. In response to issue
[#2336](#2336), the
`create_missing_principals` change enforces role uniqueness on AWS by
modifying the `ExternalLocation` table to include `metastore_id` or
`location_name` for uniqueness. To ensure proper cleanup, the
`create_uber_principal` method has been updated to delete the instance
profile if creating the cluster policy fails due to a `PermissionError`.
Unit tests have been added to verify these changes, including tests for
the new role name generation method and the updated `ExternalLocation`
table. The `MetastoreAssignment` class is also imported in this diff,
although its usage is not immediately clear. These changes aim to
improve the creation of unique AWS roles for Databricks Labs UCX and
enforce role uniqueness on AWS.
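The uniqueness idea can be sketched as appending the distinguishing part (`metastore_id` or `location_name`) to a base prefix, sanitized to the characters AWS allows in role names. The exact naming convention of `_generate_role_name` is an assumption here.

```python
import re


def generate_role_name(prefix: str, unique_part: str) -> str:
    """Build a role name unique per metastore or location (illustrative convention)."""
    # AWS role names permit alphanumerics plus +=,.@_- ; replace anything else.
    sanitized = re.sub(r"[^a-zA-Z0-9+=,.@_-]", "-", unique_part)
    return f"{prefix}-{sanitized}"
```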
* Cache workspace content
([#2497](#2497)). In this
release, we have implemented a caching mechanism for workspace content
to improve load times and bypass rate limits. The `WorkspaceCache` class
handles caching of workspace content, with the `_CachedIO` and
`_PathLruCache` classes managing IO operation caching and LRU caching,
respectively. The `_CachedPath` class, a subclass of `WorkspacePath`,
handles caching of workspace paths. The `open` and `unlink` methods of
`_CachedPath` have been overridden to cache results and remove
corresponding cache entries. The `guess_encoding` function is used to
determine the encoding of downloaded content. Unit tests have been added
to ensure the proper functioning of the caching mechanism, including
tests for cache reuse, invalidation, and encoding determination. This
feature aims to enhance the performance of file operations, making the
overall system more efficient for users.
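A minimal sketch of the LRU piece, assuming the shape described above; the real `_PathLruCache` in ucx is more involved and integrates with `WorkspacePath`.

```python
from collections import OrderedDict


class PathLruCache:
    """Cache workspace downloads by path, evicting the least recently used entry."""

    def __init__(self, max_entries: int):
        self._max = max_entries
        self._data: OrderedDict[str, bytes] = OrderedDict()

    def get_or_load(self, path: str, loader) -> bytes:
        if path in self._data:
            self._data.move_to_end(path)  # mark as most recently used
            return self._data[path]
        content = loader(path)
        self._data[path] = content
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
        return content

    def invalidate(self, path: str) -> None:
        # Called e.g. on unlink() so stale content is never served.
        self._data.pop(path, None)
```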
* Changes the security mode for assessment cluster
([#2472](#2472)). In this
release, the security mode of the `main` cluster assessment has been
updated from LEGACY_SINGLE_USER to LEGACY_SINGLE_USER_STANDARD in the
workflows.py file. This change disables passthrough and addresses issue
[#1717](#1717). The new data
security mode is defined in the compute.ClusterSpec object for the
`main` job cluster by modifying the data_security_mode attribute. While
no new methods have been introduced, existing functionality related to
the cluster's security mode has been modified. Software engineers
adopting this project should be aware of the security implications of
this change, ensuring the appropriate data protection measures are in
place. Manual testing has been conducted to verify the functionality of
this update.
* Do not normalize cases when reformatting SQL queries in CI check
([#2495](#2495)). In this
release, the CI workflow for pushing changes to the repository has been
updated to improve the behavior of the SQL query reformatting step.
Previously, case normalization of SQL queries was causing issues with
case-sensitive columns, resulting in blocked CI checks. This release
addresses the issue by adding the `--normalize-case false` flag to the
`databricks labs lsql fmt` command, which disables case normalization.
This modification allows the CI workflow to pass and ensures correct SQL
query formatting, regardless of case sensitivity. The change impacts the
assessment/interactive directory, specifically a cluster summary query
for interactive assessments. This query involves a change in the ORDER
BY clause, replacing a normalized case with the original case. Despite
these changes, no new methods have been added, and existing
functionality has been modified solely to improve CI efficiency and SQL
query compatibility.
* Drop source table after successful table move not before
([#2430](#2430)). In this
release, we have addressed an issue where the source table was being
dropped before a new table was created, which could cause the creation
process to fail and leave the source table unavailable. This problem has
been resolved by modifying the `_recreate_table` method of the
`TableMove` class in the `hive_metastore` package to drop the source
table after the new table creation. The updated implementation ensures
that the source table remains intact during the creation process, even
in case of any issues. This change comes with integration tests and does
not involve any modifications to user documentation, CLI commands,
workflows, tables, or existing functionality. Additionally, a new test
function `test_move_tables_table_properties_mismatch_preserves_original`
has been added to `test_table_move.py`, which checks if the original
table is preserved when there is a mismatch in table properties during
the move operation. The changes also include adding the `pytest` library
and the `BadRequest` exception from the `databricks.sdk.errors` package
for the new test function. The imports section has been updated
accordingly with the removal of `databricks.sdk.errors.NotFound` and the
addition of `pytest` and `databricks.sdk.errors.BadRequest`.
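The order-of-operations fix can be reduced to: create the target first, and drop the source only after the creation succeeds. Function names here are illustrative, not ucx's actual `TableMove` API.

```python
def recreate_table(create_target, drop_source) -> bool:
    """Move a table by creating the target before dropping the source (sketch)."""
    try:
        create_target()
    except Exception:
        return False  # creation failed: source table is still intact
    drop_source()  # only reached after the new table exists
    return True
```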
* Enabled `principal-prefix-access` command to run as collection
([#2450](#2450)). This
commit introduces several improvements to the `principal-prefix-access`
command in our open-source library. A new flag `run-as-collection` has
been added, allowing the command to run as a collection across multiple
AWS accounts. A new `get_workspace_context` function has also been
implemented, which encapsulates common functionalities and enhances code
reusability. Additionally, the `get_workspace_contexts` method has been
developed to retrieve a list of `WorkspaceContext` objects, making the
command more efficient when handling collections of workspaces.
Furthermore, the `install_on_account` method has been updated to use the
new `get_workspace_contexts` method. The `principal-prefix-access`
command has been enhanced to accept an optional `acc_client` argument,
which is used to retrieve information about the assessment run. These
changes improve the functionality and organization of the codebase,
making it more efficient, flexible, and easier to maintain for users
working with multiple AWS accounts and workspaces.
* Fixed Driver OOM error by increasing the min memory requirement for
node from 16GB to 32GB
([#2473](#2473)). A
modification has been implemented in the `policy.py` file located in the
`databricks/labs/ucx/installer` directory, which enhances the minimum
memory requirement for the node type from 16GB to 32GB. This adjustment
is intended to prevent driver out-of-memory (OOM) errors during
assessments. The `_definition` function in the `policy` class has been
updated to incorporate the new memory requirement, which will be
employed for selecting a suitable node type. The rest of the code
remains unchanged. This modification addresses issue
[#2398](#2398). While the
code has been tested, specific testing details are not provided in the
commit message.
* Fixed issue when running create-missing-credential cmd tries to create
the role again if already created
([#2456](#2456)). In this
release, we have implemented a fix to address an issue in the
`_identify_missing_paths` function within the `access.py` file of the
`databricks/labs/ucx/aws` directory, where the
`create-missing-credential` command was attempting to create a role
again even if it had already been created. This issue was due to a
mismatch in path comparison using the `match` function, which has now
been updated to use the `startswith` function instead. This change
ensures that the code checks if the path starts with the resource path,
thereby resolving issue
[#2413](#2413)). The
`_identify_missing_paths` function loads the UC-compatible roles and
iterates over each external location: if the location starts with any
role's resource path, `matching_role` is set to True and the check moves
on to the next location; otherwise the location is added to the
`missing_paths` set. The diff also adds a conditional check that returns
an empty list when `missing_paths` is empty. Additionally,
tests have been added or modified to ensure the proper functioning of
the updated code, including unit tests and integration tests. However,
there is no mention of manual testing or verification on a staging
environment. Overall, this update fixes a specific issue with the
`create-missing-credential` command and includes updated tests to ensure
proper functionality.
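The core of the fix is the prefix comparison. A location covered by a role whose resource path is a parent prefix must count as matched, which `startswith` captures and an exact `match` did not. Names below are illustrative:

```python
def find_missing_paths(locations: list[str], role_resource_paths: list[str]) -> set[str]:
    """Return locations not covered by any existing role's resource path (sketch)."""
    missing = set()
    for location in locations:
        # startswith treats a role path as a prefix, so child paths are covered too
        if not any(location.startswith(path) for path in role_resource_paths):
            missing.add(location)
    return missing
```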
* Fixed issue with Interactive Dashboard not showing output
([#2476](#2476)). In this
release, we have resolved an issue with the Interactive Dashboard not
displaying output by fixing a bug in the query used for the dashboard.
Previously, the query was joining on "request_params.clusterid" and
selecting "request_params.clusterid" in the SELECT clause, but the
correct field name is "request_params.clusterId". The query has been
updated to use "request_params.clusterId" instead, both in the JOIN and
SELECT clauses. These changes ensure that the Interactive Dashboard
displays the correct output, improving the overall functionality and
usability of the product. No new methods were added, and existing
functionality was changed within the scope of the Interactive Dashboard
query. Manual testing is recommended to ensure that the output is now
displayed correctly. Additionally, a change has been made to the
'test_installation.py' integration test file to improve the performance
of clusters by updating the `min_memory_gb` argument from 16 GB to 32 GB
in the `test_job_cluster_policy` function.
* Fixed support for table/schema scope for the revert table cli command
([#2428](#2428)). In this
release, we have enhanced the `revert table` CLI command to support
table and schema scopes in the open-source library. The
`revert_migrated_tables` function now accepts optional parameters
`schema` and `table` of types str or None, which were previously
required parameters. Similarly, the `print_revert_report` function in
the `tables_migrator` object within `WorkspaceContext` has been updated
to accept the same optional parameters. The `revert_migrated_tables`
function now uses these optional parameters when calling the
`revert_migrated_tables` method of `tables_migrator` within 'ctx'.
Additionally, we have introduced a new dictionary called `reverse_seen`
and modified the `_get_tables_to_revert` and `print_revert_report`
functions to utilize this dictionary, providing more fine-grained
control when reverting table migrations. The `delete_managed` parameter
is used to determine if managed tables should be deleted. These changes
allow users to specify a specific schema and table to revert, rather
than reverting all migrated tables within a workspace.
* Refactor view sequencing and return sequenced views if recursion is
found ([#2499](#2499)). In
this refactored code, the view sequencing for table migration has been
improved and now returns sequenced views if recursion is found,
addressing issue
[#249](#249).
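The "return sequenced views if recursion is found" behavior can be sketched as a topological sort that, instead of failing on a cycle, returns the views it could order. This is an illustrative sketch, not ucx's actual sequencing code.

```python
def sequence_views(deps: dict[str, set[str]]) -> list[str]:
    """Order views so dependencies come first; on recursion, return the partial order."""
    sequenced: list[str] = []
    done: set[str] = set()
    remaining = dict(deps)
    while remaining:
        # views whose dependencies are all already sequenced
        ready = [v for v, d in remaining.items() if d <= done]
        if not ready:
            break  # recursion detected: stop and return what was sequenced
        for v in sorted(ready):
            sequenced.append(v)
            done.add(v)
            del remaining[v]
    return sequenced
```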
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to
>=0.5,<0.10
([#2489](#2489)). In this
release, we have updated the version requirements for the
`databricks-labs-lsql` package, changing it from greater than 0.5 and
less than 0.9 to greater than 0.5 and less than 0.10. This update
enables the use of newer versions of the package while maintaining
compatibility with existing systems. The `databricks-labs-lsql` package
is used for creating dashboards and managing SQL queries in Databricks.
The pull request also includes detailed release notes, a comprehensive
changelog, and a list of commits for the updated package. We recommend
that all users of this package review the release notes and update to
the new version to take advantage of the latest features and
improvements.
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31
([#2417](#2417)). In this
pull request, the `databricks-sdk` dependency has been updated from
version `~=0.29.0` to `>=0.29,<0.31` to allow for the latest version of
the package, which includes new features, bug fixes, internal changes,
and other updates. This update is in response to the release of version
`0.30.0` of the `databricks-sdk` library, which includes new features
such as DataPlane support and partner support. In addition to the
updated dependency, there have been changes to several files, including
`access.py`, `fixtures.py`, `test_access.py`, and `test_workflows.py`.
These changes include updates to method calls, import statements, and
test data to reflect the new version of the `databricks-sdk` library.
The `pyproject.toml` file has also been updated to reflect the new
dependency version. This pull request does not include any other
changes.
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13
([#2431](#2431)). In this
pull request, we are updating the `sqlglot` dependency from version
`>=25.5.0,<25.12` to `>=25.5.0,<25.13`. This update allows us to use the
latest version of the `sqlglot` library, which includes several new
features and bug fixes. Specifically, the new version includes support
for `TryCast` generation and improvements to the `clickhouse` dialect.
It is important to note that the previous version had a breaking change
related to treating `DATABASE` as `SCHEMA` in `exp.Create`. Therefore,
it is crucial to thoroughly test the changes before merging, as breaking
changes may affect existing functionality.
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15
([#2453](#2453)). In this
pull request, we have updated the required version range of the
`sqlglot` package from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`. This
change allows us to install the latest version of the package, which
includes several bug fixes and new features. These include improved
transpilation of nullable/non-nullable data types and support for
TryCast generation in ClickHouse. The changelog for `sqlglot` provides a
detailed list of changes in each release, and a list of commits made in
the latest release is also included in the pull request. This update
will improve the functionality and reliability of our software, as we
will now be able to take advantage of the latest features and fixes
provided by `sqlglot`.
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17
([#2480](#2480)). In this
release, we have updated the requirement range for the `sqlglot`
dependency to '>=25.5.0,<25.17' from '<25.15,>=25.5.0'. This change
resolves issues
[#2452](#2452) and
[#2451](#2451) and includes
several bug fixes and new features in the `sqlglot` library version
25.16.1. The updated version includes support for timezone in
exp.TimeStrToTime, transpiling from_iso8601_timestamp from presto/trino
to duckdb, and mapping %e to %-d in BigQuery. Additionally, there are
changes to the parser and optimizer, as well as other bug fixes and
refactors. This update does not introduce any major breaking changes and
should not affect the functionality of the project. The `sqlglot`
library is used for parsing, analyzing, and rewriting SQL queries, and
the new version range provides improved functionality and reliability.
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18
([#2488](#2488)). This pull
request updates the `sqlglot` library requirement to version 25.5.0 or
greater, but less than 25.18. By doing so, it enables the use of the
latest version of sqlglot, while still maintaining compatibility with
the current implementation. The changelog and commits for each release
from v25.17.0 to v25.16.1 are provided for reference, detailing bug
fixes, new features, and breaking changes. As a software engineer, it's
important to review this pull request and ensure it aligns with the
project's requirements before merging, to take advantage of the latest
improvements and fixes in sqlglot.
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19
([#2509](#2509)). In this
release, we have updated the required version of the `sqlglot` package
in our project's dependencies. Previously, we required a version greater
than or equal to 25.5.0 and less than 25.18, which has now been updated
to require a version greater than or equal to 25.5.0 and less than
25.19. This change was made automatically by Dependabot, a service that
helps to keep dependencies up to date, in order to permit the latest
version of the `sqlglot` package. The pull request contains a detailed
list of the changes made in the `sqlglot` package between versions
25.5.0 and 25.18.0, as well as a list of the commits that were made
during this time. These details can be helpful for understanding the
potential impact of the update on the project.
* [chore] make `GRANT` migration logic isolated to `MigrateGrants`
component ([#2492](#2492)).
In this release, the grant migration logic has been isolated to a
separate `MigrateGrants` component, enhancing code modularity and
maintainability. This new component, along with the `ACLMigrator`, is
now responsible for handling grants and Access Control Lists (ACLs)
migration. The `MigrateGrants` class takes grant loaders as input,
applies grants to a Unity Catalog (UC) table based on a given source
table, and is utilized in the `acl_migrator` method. The `ACLMigrator`
class manages ACL migration for the migrated tables, taking instances of
necessary classes as arguments and setting ACLs for the migrated tables
based on the migration status. These changes bring better separation of
concerns, making the code easier to understand, test, and maintain.
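The separation can be sketched structurally: `MigrateGrants` takes grant loaders and, for a given source table, emits the grants to apply on the UC table. The real class in ucx has richer types and applies the grants itself; the `Grant` shape and GRANT statement below are assumptions for illustration.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    action: str
    source_table: str


class MigrateGrants:
    """Isolated grant-migration component (structural sketch)."""

    def __init__(self, grant_loaders: list[Callable[[], Iterable[Grant]]]):
        self._loaders = grant_loaders

    def apply(self, source_table: str, uc_table: str) -> list[str]:
        """Return GRANT statements for one migrated table, from all loaders."""
        statements = []
        for loader in self._loaders:
            for grant in loader():
                if grant.source_table == source_table:
                    statements.append(
                        f"GRANT {grant.action} ON TABLE {uc_table} TO `{grant.principal}`"
                    )
        return statements
```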
Dependency updates:
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31
([#2417](#2417)).
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13
([#2431](#2431)).
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15
([#2453](#2453)).
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17
([#2480](#2480)).
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to
>=0.5,<0.10 ([#2489](#2489)).
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18
([#2488](#2488)).
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19
([#2509](#2509)).
2 files changed in src/databricks/labs/ucx (+37, -1 lines).