v0.41.0
- Added UCX history schema and table for storing UCX's artifact (#2744). This release introduces a new dataclass, `Historical`, to store UCX artifacts for migration progress tracking, with attributes for the workspace identifier, job run identifier, object type, object identifier, data, failures, owner, and UCX version. The `ProgressTrackingInstallation` class has a new method that deploys a table for historical records using the `Historical` dataclass. The `databricks labs ucx create-ucx-catalog` command has been modified, and the integration test file `test_install.py` now includes a parametrized test function that checks that the `workflow_runs` and `historical` tables are created by the UCX installation. The function `test_progress_tracking_installation_run_creates_workflow_runs_table` has been renamed to `test_progress_tracking_installation_run_creates_tables` to reflect the new table. These changes improve UCX's migration progress tracking, add tests for it, and resolve issue #2572.
- Added `hjson` to known list (#2899). This release adds the Hjson library to the known list, partially resolving issue #1931, which relates to configuration. The following modules are now known: `hjson`, `hjson.compat`, `hjson.decoder`, `hjson.encoder`, `hjson.encoderH`, `hjson.ordered_dict`, `hjson.scanner`, and `hjson.tool`. Hjson extends JSON with comments and multi-line strings.
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#2894). This version bump of the databrickslabs/sandbox library brings several enhancements and bug fixes: README instructions on how to use the library with the `databricks labs sandbox` command, fixes for the `unsupported protocol scheme` error, and the addition of more git-related libraries. The `golang.org/x/crypto` dependency has been updated from 0.16.0 to 0.17.0 in the `/go-libs` and `/runtime-packages` directories. The new version also includes commits that allow larger logs from acceptance tests and implement experimental OIDC refresh token rotation. The tests that use this library have been updated to the new version.
- Fixed `AttributeError`: `UsedTable` has no attribute 'table' by adding more type checks (#2895). This release improves type safety and robustness when handling `UsedTable` objects. The `AttributeError` caused by the `UsedTable` class not having a `table` attribute is fixed by adding more type checks in the `collect_tables` method of the `TablePyCollector` and `CollectTablesVisit` classes. `AstroidSyntaxError` exceptions are now handled and logged. The `table_infos` variable has been renamed to `used_tables` and its type changed to `list[JobProblem]` in the `collect_tables_from_tree` and `_SparkSqlAnalyzer.collect_tables` functions. Conditional statements now check for the presence of required attributes before yielding a new `TableInfoNode`. A new unit test file, `test_context.py`, exercises the `tables_collector` method, which extracts table references from a given code snippet.
- Fixed `TokenError` in assessment workflow (#2896). Previously, the assessment workflow only caught parse errors, but parse errors were not the only cause of failures. The `walk_expressions` method now catches `SqlglotError` instead of `ParseError`. `SqlglotError`, imported from the `sqlglot.errors` module, is the common ancestor of both `ParseError` and `TokenError`, so the workflow now handles both parse errors and tokenization errors.
- Fixed `assessment` workflow failure for jobs running tasks on existing interactive clusters (#2889). This release addresses a failure in the `assessment` workflow when jobs run on existing interactive clusters (issue #2886). The fix modifies `jobs.py` by adding a try-except block when loading libraries for an existing cluster, catching `ResourceDoesNotExist` to handle cases where the cluster does not exist. The `_register_cluster_info` function now also handles an existing cluster that is not found, reporting a `DependencyProblem` with the code `cluster-not-found`. The workflow can therefore continue with jobs on other clusters or with other configurations instead of failing on a non-existent cluster.
- Ignore UCX inventory database in HMS while scanning tables (#2897). This release changes `tables.py` in the `databricks/labs/ucx/hive_metastore` directory so that the UCX inventory database is no longer scanned by mistake during table scanning. The `_all_databases` method now excludes the UCX inventory database by checking whether the database name matches the schema name and skipping it if so. This affects the `_crawl` and `_get_table_names` methods, which no longer process the UCX inventory schema when scanning for tables. A TODO comment in the `_get_table_names` method suggests that the UCX inventory schema check may be removed in a future release. The change avoids the `hallucination` of treating the UCX inventory schema as a database to be scanned.
- Tech debt: fix situations where `next()` isn't being used properly (#2885). This commit addresses technical debt around Python's built-in `next()` function in several areas of the codebase. The code previously assumed that `None` would be returned if there is no next value, which is incorrect. Specifically, `get_dbutils_notebook_run_path_arg`, the `of_language` class method in the `CellLanguage` class, and certain methods in `test_table_migrate.py` have been updated to correctly handle the case where there is no next value. The `has_path()` method has been removed, and the `prepend_path()` method now inserts the given path at the beginning of the list of system paths. A test case for checking a table in mount mapping with a table owner has also been included.
- [chore] apply `make fmt` (#2883). In this release, the `make_random` parameter has been removed from the `save_locations` method in the `conftest.py` file for the integration tests. This method saves a list of `ExternalLocation` objects to the `external_locations` table in the inventory database and no longer requires the `make_random` parameter. In the updated implementation, `save_locations` creates a single `ExternalLocation` object with a specific string and priority based on the workspace environment (Azure or AWS), and then uses the SQL backend to save the list of `ExternalLocation` objects to the database. This simplifies the `save_locations` method and makes it more reusable throughout the test suite.
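
As an illustration of the `next()` issue addressed in #2885, here is a minimal sketch of the pitfall. It is not taken from the UCX codebase: the `first_match` helper and the sample data are hypothetical. The underlying behavior is standard Python: `next()` raises `StopIteration` on an exhausted iterator unless a default is passed as the second argument.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_match(items: Iterable[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Hypothetical helper: return the first matching item, or None if there is none."""
    # The explicit default (None) is what prevents StopIteration from being
    # raised when the generator yields nothing.
    return next((item for item in items if predicate(item)), None)


languages = ["python", "sql", "scala"]

# Without a default, next() does NOT return None for an exhausted iterator.
try:
    next(lang for lang in languages if lang == "r")
    raised = False
except StopIteration:
    raised = True
```

Passing the default explicitly keeps the "no value" case visible at the call site, so callers can check for `None` instead of relying on an exception they may not expect.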
Dependency updates:
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#2894).
Contributors: @nfx, @asnare, @dependabot[bot], @JCZuurmond, @pritishpai
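
As an illustration of the `TokenError` fix in #2896, the sketch below shows why catching a common ancestor covers both failure modes. The three exception classes are local stand-ins that mirror the hierarchy described in the release note; they are not imported from `sqlglot.errors`. The `parse_sql` and `safe_walk` functions are hypothetical and far simpler than a real SQL parser.

```python
class SqlglotError(Exception):
    """Stand-in for the common ancestor of the two error types."""


class ParseError(SqlglotError):
    """Stand-in: the statement tokenizes but cannot be parsed."""


class TokenError(SqlglotError):
    """Stand-in: the statement cannot even be tokenized."""


def parse_sql(sql: str) -> list:
    """Hypothetical toy parser used only to trigger both error types."""
    if sql.count("'") % 2 == 1:
        # An unterminated string literal fails at the tokenization stage.
        raise TokenError("unterminated string literal")
    if not sql.strip().lower().startswith("select"):
        raise ParseError("only SELECT statements are supported in this toy parser")
    return sql.split()


def safe_walk(sql: str) -> list:
    """Return the parsed tokens, or an empty list if the SQL cannot be processed."""
    try:
        return parse_sql(sql)
    except SqlglotError:
        # Catching the ancestor handles ParseError and TokenError alike;
        # catching only ParseError would let TokenError propagate.
        return []
```

With this shape, `safe_walk("select 'oops")` returns an empty list instead of raising, which is the kind of behavior the fix aims for in the assessment workflow.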