Commit 622ed83
Release v0.34.0 (#2515)
* Added a check for No isolation shared clusters and MLR
([#2484](#2484)). This
commit introduces a check for `No isolation shared clusters` utilizing
MLR as part of the assessment workflow and cluster crawler, addressing
issue [#846](#846). A new
function, `is_mlr`, has been implemented to determine if the Spark
version corresponds to an MLR cluster. If the cluster has no isolation
and uses MLR, the assessment failure list is appended with an
appropriate error message. The change is verified by unit tests,
including a new test covering MLR clusters without isolation, and by
manual verification, improving the assessment workflow's accuracy in
identifying unsupported configurations. No user documentation or new
CLI commands, workflows, or tables were added.
* Added a section in migration dashboard to list the failed tables, etc
([#2406](#2406)). In this
release, we have introduced a new logging message format for failed
table migrations in the `TableMigrate` class, specifically impacting the
`_migrate_external_table`, `_migrate_external_table_hiveserde_in_place`,
`_migrate_dbfs_root_table`, `_migrate_table_create_ctas`,
`_migrate_table_in_mount`, and `_migrate_acl` methods within the
`table_migrate.py` file. This update employs the `failed-to-migrate`
prefix in log messages for improved failure reason identification during
table migrations, enhancing debugging capabilities. As part of this
release, we have also developed a new SQL file,
`05_1_failed_table_migration.sql`, which retrieves a list of failed
table migrations by extracting messages with the 'failed-to-migrate:'
prefix from the inventory.logs table and returning the corresponding
message text. While this release does not include new methods or user
documentation, it resolves issue
[#1754](#1754) and has been
manually tested with positive results in the staging environment,
demonstrating its functionality.
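The logging convention can be sketched like this; the prefix matches the release notes, but the exact message wording and logger name used in `table_migrate.py` are assumptions. The dashboard SQL then simply filters `inventory.logs` rows whose message carries this prefix.

```python
import logging

logger = logging.getLogger("databricks.labs.ucx")


def log_migration_failure(table_full_name: str, error: Exception) -> str:
    """Log a table-migration failure with the dashboard-recognized prefix (sketch)."""
    message = f"failed-to-migrate: {table_full_name}: {error}"
    logger.warning(message)
    return message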
* Added clean up activities when `migrate-credentials` cmd fails
intermittently
([#2479](#2479)). This pull
request enhances the robustness of the `migrate-credentials` command for
Azure in the event of intermittent failures during the creation of
access connectors and storage credentials. It introduces new methods,
`delete_storage_credential` and `delete_access_connectors`, which are
responsible for removing incomplete resources when errors occur. The
`_migrate_service_principals` and
`_create_storage_credentials_for_storage_accounts` methods now handle
`PermissionDenied`, `NotFound`, and `BadRequest` exceptions, deleting
created storage credentials and access connectors if exceptions occur.
Additionally, error messages have been updated to guide users in
resolving issues before attempting the operation again. The PR also
modifies the `sp_migration` fixture in the
`tests/unit/azure/test_credentials.py` file, simplifying the deletion
process for access connectors and improving the testing of the
`ServicePrincipalMigration` class. These changes address issue
[#2362](#2362), ensuring
clean-up activities in case of intermittent failures and improving the
overall reliability of the system.
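The clean-up pattern can be sketched as a try/except that rolls back the partial resource before surfacing the failure. The function and exception names here are illustrative stand-ins, not ucx's actual API (the real exceptions come from `databricks.sdk.errors`).

```python
class PermissionDenied(Exception):
    """Stand-in for databricks.sdk.errors.PermissionDenied (illustrative)."""


def migrate_credential(create_credential, delete_credential, name: str) -> bool:
    """Create a storage credential, removing the partial resource on failure (sketch)."""
    try:
        create_credential(name)
        return True
    except PermissionDenied:
        # Roll back the incomplete resource so a retry starts clean.
        delete_credential(name)
        return False
```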
* Added standalone migrate ACLs
([#2284](#2284)). A new
`migrate-acls` command has been introduced to facilitate the migration
of Access Control Lists (ACLs) from a legacy metastore to a Unity
Catalog (UC) metastore. The command, designed to work with HMS
federation and other table migration scenarios, can be executed with
optional flags `target-catalog` and `hms-fed` to specify the target
catalog and migrate HMS-FED ACLs, respectively. The release also
includes modifications to the `labs.yml` file, adding the new command
and its details to the `commands` section. In addition, a new
`ACLMigrator` class has been added to the
`databricks.labs.ucx.contexts.application` module to handle ACL
migration for tables in a standalone manner. A new test file,
`test_migrate_acls.py`, contains unit tests for ACL migration in a Hive
metastore, covering various scenarios and ensuring proper query
generation. These features streamline and improve the functionality of
ACL migration, offering better access control management for users.
* Appends metastore_id or location_name to roles for uniqueness
([#2471](#2471)). A new
method, `_generate_role_name`, has been added to the `Access` class in
the `aws/access.py` file of the `databricks/labs/ucx` module to generate
unique names for AWS roles using a consistent naming convention. The
`list_uc_roles` method has been updated to utilize this new method for
creating role names. In response to issue
[#2336](#2336), the
`create_missing_principals` change enforces role uniqueness on AWS by
modifying the `ExternalLocation` table to include `metastore_id` or
`location_name` for uniqueness. To ensure proper cleanup, the
`create_uber_principal` method has been updated to delete the instance
profile if creating the cluster policy fails due to a `PermissionError`.
Unit tests have been added to verify these changes, including tests for
the new role name generation method and the updated `ExternalLocation`
table. The `MetastoreAssignment` class is also imported in this diff,
although its usage is not immediately clear. These changes aim to
improve the creation of unique AWS roles for Databricks Labs UCX and
enforce role uniqueness on AWS.
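The uniqueness idea can be sketched as appending the distinguishing part (`metastore_id` or `location_name`) to a base prefix, sanitized to the characters AWS allows in role names. The exact naming convention of `_generate_role_name` is an assumption here.

```python
import re


def generate_role_name(prefix: str, unique_part: str) -> str:
    """Build a role name unique per metastore or location (illustrative convention)."""
    # AWS role names permit alphanumerics plus +=,.@_- ; replace anything else.
    sanitized = re.sub(r"[^a-zA-Z0-9+=,.@_-]", "-", unique_part)
    return f"{prefix}-{sanitized}"
```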
* Cache workspace content
([#2497](#2497)). In this
release, we have implemented a caching mechanism for workspace content
to improve load times and bypass rate limits. The `WorkspaceCache` class
handles caching of workspace content, with the `_CachedIO` and
`_PathLruCache` classes managing IO operation caching and LRU caching,
respectively. The `_CachedPath` class, a subclass of `WorkspacePath`,
handles caching of workspace paths. The `open` and `unlink` methods of
`_CachedPath` have been overridden to cache results and remove
corresponding cache entries. The `guess_encoding` function is used to
determine the encoding of downloaded content. Unit tests have been added
to ensure the proper functioning of the caching mechanism, including
tests for cache reuse, invalidation, and encoding determination. This
feature aims to enhance the performance of file operations, making the
overall system more efficient for users.
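A minimal sketch of the LRU piece, assuming the shape described above; the real `_PathLruCache` in ucx is more involved and integrates with `WorkspacePath`.

```python
from collections import OrderedDict


class PathLruCache:
    """Cache workspace downloads by path, evicting the least recently used entry."""

    def __init__(self, max_entries: int):
        self._max = max_entries
        self._data: OrderedDict[str, bytes] = OrderedDict()

    def get_or_load(self, path: str, loader) -> bytes:
        if path in self._data:
            self._data.move_to_end(path)  # mark as most recently used
            return self._data[path]
        content = loader(path)
        self._data[path] = content
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
        return content

    def invalidate(self, path: str) -> None:
        # Called e.g. on unlink() so stale content is never served.
        self._data.pop(path, None)
```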
* Changes the security mode for assessment cluster
([#2472](#2472)). In this
release, the security mode of the `main` cluster assessment has been
updated from LEGACY_SINGLE_USER to LEGACY_SINGLE_USER_STANDARD in the
workflows.py file. This change disables passthrough and addresses issue
[#1717](#1717). The new data
security mode is defined in the compute.ClusterSpec object for the
`main` job cluster by modifying the data_security_mode attribute. While
no new methods have been introduced, existing functionality related to
the cluster's security mode has been modified. Software engineers
adopting this project should be aware of the security implications of
this change, ensuring the appropriate data protection measures are in
place. Manual testing has been conducted to verify the functionality of
this update.
* Do not normalize cases when reformatting SQL queries in CI check
([#2495](#2495)). In this
release, the CI workflow for pushing changes to the repository has been
updated to improve the behavior of the SQL query reformatting step.
Previously, case normalization of SQL queries was causing issues with
case-sensitive columns, resulting in blocked CI checks. This release
addresses the issue by adding the `--normalize-case false` flag to the
`databricks labs lsql fmt` command, which disables case normalization.
This modification allows the CI workflow to pass and ensures correct SQL
query formatting, regardless of case sensitivity. The change impacts the
assessment/interactive directory, specifically a cluster summary query
for interactive assessments. This query involves a change in the ORDER
BY clause, replacing a normalized case with the original case. Despite
these changes, no new methods have been added, and existing
functionality has been modified solely to improve CI efficiency and SQL
query compatibility.
* Drop source table after successful table move not before
([#2430](#2430)). In this
release, we have addressed an issue where the source table was being
dropped before a new table was created, which could cause the creation
process to fail and leave the source table unavailable. This problem has
been resolved by modifying the `_recreate_table` method of the
`TableMove` class in the `hive_metastore` package to drop the source
table after the new table creation. The updated implementation ensures
that the source table remains intact during the creation process, even
in case of any issues. This change comes with integration tests and does
not involve any modifications to user documentation, CLI commands,
workflows, tables, or existing functionality. Additionally, a new test
function `test_move_tables_table_properties_mismatch_preserves_original`
has been added to `test_table_move.py`, which checks if the original
table is preserved when there is a mismatch in table properties during
the move operation. The changes also include adding the `pytest` library
and the `BadRequest` exception from the `databricks.sdk.errors` package
for the new test function. The imports section has been updated
accordingly with the removal of `databricks.sdk.errors.NotFound` and the
addition of `pytest` and `databricks.sdk.errors.BadRequest`.
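The order-of-operations fix can be reduced to: create the target first, and drop the source only after the creation succeeds. Function names here are illustrative, not ucx's actual `TableMove` API.

```python
def recreate_table(create_target, drop_source) -> bool:
    """Move a table by creating the target before dropping the source (sketch)."""
    try:
        create_target()
    except Exception:
        return False  # creation failed: source table is still intact
    drop_source()  # only reached after the new table exists
    return True
```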
* Enabled `principal-prefix-access` command to run as collection
([#2450](#2450)). This
commit introduces several improvements to the `principal-prefix-access`
command in our open-source library. A new flag `run-as-collection` has
been added, allowing the command to run as a collection across multiple
AWS accounts. A new `get_workspace_context` function has also been
implemented, which encapsulates common functionalities and enhances code
reusability. Additionally, the `get_workspace_contexts` method has been
developed to retrieve a list of `WorkspaceContext` objects, making the
command more efficient when handling collections of workspaces.
Furthermore, the `install_on_account` method has been updated to use the
new `get_workspace_contexts` method. The `principal-prefix-access`
command has been enhanced to accept an optional `acc_client` argument,
which is used to retrieve information about the assessment run. These
changes improve the functionality and organization of the codebase,
making it more efficient, flexible, and easier to maintain for users
working with multiple AWS accounts and workspaces.
* Fixed Driver OOM error by increasing the min memory requirement for
node from 16GB to 32GB
([#2473](#2473)). A
modification has been implemented in the `policy.py` file located in the
`databricks/labs/ucx/installer` directory, which enhances the minimum
memory requirement for the node type from 16GB to 32GB. This adjustment
is intended to prevent driver out-of-memory (OOM) errors during
assessments. The `_definition` function in the `policy` class has been
updated to incorporate the new memory requirement, which will be
employed for selecting a suitable node type. The rest of the code
remains unchanged. This modification addresses issue
[#2398](#2398). While the
code has been tested, specific testing details are not provided in the
commit message.
* Fixed issue when running create-missing-credential cmd tries to create
the role again if already created
([#2456](#2456)). In this
release, we have implemented a fix to address an issue in the
`_identify_missing_paths` function within the `access.py` file of the
`databricks/labs/ucx/aws` directory, where the
`create-missing-credential` command was attempting to create a role
again even if it had already been created. This issue was due to a
mismatch in path comparison using the `match` function, which has now
been updated to use the `startswith` function instead. This change
ensures that the code checks if the path starts with the resource path,
thereby resolving issue
[#2413](#2413)). The
`_identify_missing_paths` function loads the UC-compatible roles and
iterates over each external location: if the location starts with any
role's resource path, `matching_role` is set to True and the check moves
on to the next location; otherwise the location is added to the
`missing_paths` set. The diff also adds a conditional check that returns
an empty list when `missing_paths` is empty. Additionally,
tests have been added or modified to ensure the proper functioning of
the updated code, including unit tests and integration tests. However,
there is no mention of manual testing or verification on a staging
environment. Overall, this update fixes a specific issue with the
`create-missing-credential` command and includes updated tests to ensure
proper functionality.
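The core of the fix is the prefix comparison. A location covered by a role whose resource path is a parent prefix must count as matched, which `startswith` captures and an exact `match` did not. Names below are illustrative:

```python
def find_missing_paths(locations: list[str], role_resource_paths: list[str]) -> set[str]:
    """Return locations not covered by any existing role's resource path (sketch)."""
    missing = set()
    for location in locations:
        # startswith treats a role path as a prefix, so child paths are covered too
        if not any(location.startswith(path) for path in role_resource_paths):
            missing.add(location)
    return missing
```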
* Fixed issue with Interactive Dashboard not showing output
([#2476](#2476)). In this
release, we have resolved an issue with the Interactive Dashboard not
displaying output by fixing a bug in the query used for the dashboard.
Previously, the query was joining on "request_params.clusterid" and
selecting "request_params.clusterid" in the SELECT clause, but the
correct field name is "request_params.clusterId". The query has been
updated to use "request_params.clusterId" instead, both in the JOIN and
SELECT clauses. These changes ensure that the Interactive Dashboard
displays the correct output, improving the overall functionality and
usability of the product. No new methods were added, and existing
functionality was changed within the scope of the Interactive Dashboard
query. Manual testing is recommended to ensure that the output is now
displayed correctly. Additionally, a change has been made to the
'test_installation.py' integration test file to improve the performance
of clusters by updating the `min_memory_gb` argument from 16 GB to 32 GB
in the `test_job_cluster_policy` function.
* Fixed support for table/schema scope for the revert table cli command
([#2428](#2428)). In this
release, we have enhanced the `revert table` CLI command to support
table and schema scopes in the open-source library. The
`revert_migrated_tables` function now accepts optional parameters
`schema` and `table` of types str or None, which were previously
required parameters. Similarly, the `print_revert_report` function in
the `tables_migrator` object within `WorkspaceContext` has been updated
to accept the same optional parameters. The `revert_migrated_tables`
function now uses these optional parameters when calling the
`revert_migrated_tables` method of `tables_migrator` within 'ctx'.
Additionally, we have introduced a new dictionary called `reverse_seen`
and modified the `_get_tables_to_revert` and `print_revert_report`
functions to utilize this dictionary, providing more fine-grained
control when reverting table migrations. The `delete_managed` parameter
is used to determine if managed tables should be deleted. These changes
allow users to specify a specific schema and table to revert, rather
than reverting all migrated tables within a workspace.
* Refactor view sequencing and return sequenced views if recursion is
found ([#2499](#2499)). In
this refactored code, the view sequencing for table migration has been
improved and now returns sequenced views if recursion is found,
addressing issue
[#249](#249).
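The "return sequenced views if recursion is found" behavior can be sketched as a topological sort that, instead of failing on a cycle, returns the views it could order. This is an illustrative sketch, not ucx's actual sequencing code.

```python
def sequence_views(deps: dict[str, set[str]]) -> list[str]:
    """Order views so dependencies come first; on recursion, return the partial order."""
    sequenced: list[str] = []
    done: set[str] = set()
    remaining = dict(deps)
    while remaining:
        # views whose dependencies are all already sequenced
        ready = [v for v, d in remaining.items() if d <= done]
        if not ready:
            break  # recursion detected: stop and return what was sequenced
        for v in sorted(ready):
            sequenced.append(v)
            done.add(v)
            del remaining[v]
    return sequenced
```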
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to
>=0.5,<0.10
([#2489](#2489)). In this
release, we have updated the version requirements for the
`databricks-labs-lsql` package, changing it from greater than 0.5 and
less than 0.9 to greater than 0.5 and less than 0.10. This update
enables the use of newer versions of the package while maintaining
compatibility with existing systems. The `databricks-labs-lsql` package
is used for creating dashboards and managing SQL queries in Databricks.
The pull request also includes detailed release notes, a comprehensive
changelog, and a list of commits for the updated package. We recommend
that all users of this package review the release notes and update to
the new version to take advantage of the latest features and
improvements.
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31
([#2417](#2417)). In this
pull request, the `databricks-sdk` dependency has been updated from
version `~=0.29.0` to `>=0.29,<0.31` to allow for the latest version of
the package, which includes new features, bug fixes, internal changes,
and other updates. This update is in response to the release of version
`0.30.0` of the `databricks-sdk` library, which includes new features
such as DataPlane support and partner support. In addition to the
updated dependency, there have been changes to several files, including
`access.py`, `fixtures.py`, `test_access.py`, and `test_workflows.py`.
These changes include updates to method calls, import statements, and
test data to reflect the new version of the `databricks-sdk` library.
The `pyproject.toml` file has also been updated to reflect the new
dependency version. This pull request does not include any other
changes.
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13
([#2431](#2431)). In this
pull request, we are updating the `sqlglot` dependency from version
`>=25.5.0,<25.12` to `>=25.5.0,<25.13`. This update allows us to use the
latest version of the `sqlglot` library, which includes several new
features and bug fixes. Specifically, the new version includes support
for `TryCast` generation and improvements to the `clickhouse` dialect.
It is important to note that the previous version had a breaking change
related to treating `DATABASE` as `SCHEMA` in `exp.Create`. Therefore,
it is crucial to thoroughly test the changes before merging, as breaking
changes may affect existing functionality.
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15
([#2453](#2453)). In this
pull request, we have updated the required version range of the
`sqlglot` package from `>=25.5.0,<25.13` to `>=25.5.0,<25.15`. This
change allows us to install the latest version of the package, which
includes several bug fixes and new features. These include improved
transpilation of nullable/non-nullable data types and support for
TryCast generation in ClickHouse. The changelog for `sqlglot` provides a
detailed list of changes in each release, and a list of commits made in
the latest release is also included in the pull request. This update
will improve the functionality and reliability of our software, as we
will now be able to take advantage of the latest features and fixes
provided by `sqlglot`.
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17
([#2480](#2480)). In this
release, we have updated the requirement range for the `sqlglot`
dependency to '>=25.5.0,<25.17' from '<25.15,>=25.5.0'. This change
resolves issues
[#2452](#2452) and
[#2451](#2451) and includes
several bug fixes and new features in the `sqlglot` library version
25.16.1. The updated version includes support for timezone in
exp.TimeStrToTime, transpiling from_iso8601_timestamp from presto/trino
to duckdb, and mapping %e to %-d in BigQuery. Additionally, there are
changes to the parser and optimizer, as well as other bug fixes and
refactors. This update does not introduce any major breaking changes and
should not affect the functionality of the project. The `sqlglot`
library is used for parsing, analyzing, and rewriting SQL queries, and
the new version range provides improved functionality and reliability.
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18
([#2488](#2488)). This pull
request updates the `sqlglot` library requirement to version 25.5.0 or
greater, but less than 25.18. By doing so, it enables the use of the
latest version of sqlglot, while still maintaining compatibility with
the current implementation. The changelog and commits for each release
from v25.17.0 to v25.16.1 are provided for reference, detailing bug
fixes, new features, and breaking changes. As a software engineer, it's
important to review this pull request and ensure it aligns with the
project's requirements before merging, to take advantage of the latest
improvements and fixes in sqlglot.
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19
([#2509](#2509)). In this
release, we have updated the required version of the `sqlglot` package
in our project's dependencies. Previously, we required a version greater
than or equal to 25.5.0 and less than 25.18, which has now been updated
to require a version greater than or equal to 25.5.0 and less than
25.19. This change was made automatically by Dependabot, a service that
helps to keep dependencies up to date, in order to permit the latest
version of the `sqlglot` package. The pull request contains a detailed
list of the changes made in the `sqlglot` package between versions
25.5.0 and 25.18.0, as well as a list of the commits that were made
during this time. These details can be helpful for understanding the
potential impact of the update on the project.
* [chore] make `GRANT` migration logic isolated to `MigrateGrants`
component ([#2492](#2492)).
In this release, the grant migration logic has been isolated to a
separate `MigrateGrants` component, enhancing code modularity and
maintainability. This new component, along with the `ACLMigrator`, is
now responsible for handling grants and Access Control Lists (ACLs)
migration. The `MigrateGrants` class takes grant loaders as input,
applies grants to a Unity Catalog (UC) table based on a given source
table, and is utilized in the `acl_migrator` method. The `ACLMigrator`
class manages ACL migration for the migrated tables, taking instances of
necessary classes as arguments and setting ACLs for the migrated tables
based on the migration status. These changes bring better separation of
concerns, making the code easier to understand, test, and maintain.
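The separation can be sketched structurally: `MigrateGrants` takes grant loaders and, for a given source table, emits the grants to apply on the UC table. The real class in ucx has richer types and applies the grants itself; the `Grant` shape and GRANT statement below are assumptions for illustration.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    action: str
    source_table: str


class MigrateGrants:
    """Isolated grant-migration component (structural sketch)."""

    def __init__(self, grant_loaders: list[Callable[[], Iterable[Grant]]]):
        self._loaders = grant_loaders

    def apply(self, source_table: str, uc_table: str) -> list[str]:
        """Return GRANT statements for one migrated table, from all loaders."""
        statements = []
        for loader in self._loaders:
            for grant in loader():
                if grant.source_table == source_table:
                    statements.append(
                        f"GRANT {grant.action} ON TABLE {uc_table} TO `{grant.principal}`"
                    )
        return statements
```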
Dependency updates:
* Updated databricks-sdk requirement from ~=0.29.0 to >=0.29,<0.31
([#2417](#2417)).
* Updated sqlglot requirement from <25.12,>=25.5.0 to >=25.5.0,<25.13
([#2431](#2431)).
* Updated sqlglot requirement from <25.13,>=25.5.0 to >=25.5.0,<25.15
([#2453](#2453)).
* Updated sqlglot requirement from <25.15,>=25.5.0 to >=25.5.0,<25.17
([#2480](#2480)).
* Updated databricks-labs-lsql requirement from <0.9,>=0.5 to
>=0.5,<0.10 ([#2489](#2489)).
* Updated sqlglot requirement from <25.17,>=25.5.0 to >=25.5.0,<25.18
([#2488](#2488)).
* Updated sqlglot requirement from <25.18,>=25.5.0 to >=25.5.0,<25.19
([#2509](#2509)).
2 files changed in src/databricks/labs/ucx (+37, -1 lines).