
Commit c3173eb

Release v0.1.1 (#261)

* Added batched iteration for `INSERT INTO` queries in `StatementExecutionBackend` with default `max_records_per_batch=1000` ([#237](#237)); see the sketch after this list.
* Added crawler for mount points ([#209](#209)).
* Added crawlers for compatibility of jobs and clusters, along with basic recommendations for external locations ([#244](#244)).
* Added safe return on grants ([#246](#246)).
* Added ability to specify empty group filter in the installer script ([#216](#216)) ([#217](#217)).
* Added ability to install application by multiple different users on the same workspace ([#235](#235)).
* Added dashboard creation on installation and a requirement for `warehouse_id` in config, so that the assessment dashboards are refreshed automatically after job runs ([#214](#214)).
* Added reliance on rate limiting from Databricks SDK for listing workspace ([#258](#258)).
* Fixed errors in corner cases where Azure Service Principal Credentials were not available in Spark context ([#254](#254)).
* Fixed `DESCRIBE TABLE` throwing errors when listing Legacy Table ACLs ([#238](#238)).
* Fixed `file already exists` error in the installer script ([#219](#219)) ([#222](#222)).
* Fixed `guess_external_locations` failure with `AttributeError: as_dict` and added an integration test ([#259](#259)).
* Fixed error handling edge cases in `crawl_tables` task ([#243](#243)) ([#251](#251)).
* Fixed `crawl_permissions` task failure on folder names containing a forward slash ([#234](#234)).
* Improved `README` notebook documentation ([#260](#260), [#228](#228), [#252](#252), [#223](#223), [#225](#225)).
* Removed redundant `.python-version` file ([#221](#221)).
* Removed discovery of account groups from `crawl_permissions` task ([#240](#240)).
* Updated databricks-sdk requirement from ~=0.8.0 to ~=0.9.0 ([#245](#245)).
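To illustrate the batched `INSERT INTO` change from the first bullet: the sketch below shows the general chunking idea only. `chunked`, `insert_statements`, and the `ucx.things` table are hypothetical names, not the actual `StatementExecutionBackend` API, and real code would escape values properly rather than relying on `repr`.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def chunked(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `size` rows from any iterable,
    without materializing the whole input in memory."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def insert_statements(
    table: str, rows: Iterable[tuple], max_records_per_batch: int = 1000
) -> Iterator[str]:
    """Build one multi-row INSERT INTO ... VALUES statement per batch,
    instead of one statement per row."""
    for batch in chunked(rows, max_records_per_batch):
        values = ", ".join(f"({', '.join(repr(v) for v in row)})" for row in batch)
        yield f"INSERT INTO {table} VALUES {values}"


# 2500 rows become three statements: 1000 + 1000 + 500 rows each
stmts = list(insert_statements("ucx.things", ((i, f"name-{i}") for i in range(2500))))
assert len(stmts) == 3
```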
1 parent 1a07212 commit c3173eb

File tree

14 files changed: +52 -57 lines changed


CHANGELOG.md

Lines changed: 21 additions & 0 deletions

```diff
@@ -1,5 +1,26 @@
 # Version changelog
 
+## 0.1.1
+
+* Added batched iteration for `INSERT INTO` queries in `StatementExecutionBackend` with default `max_records_per_batch=1000` ([#237](https://github.com/databricks/ucx/pull/237)).
+* Added crawler for mount points ([#209](https://github.com/databricks/ucx/pull/209)).
+* Added crawlers for compatibility of jobs and clusters, along with basic recommendations for external locations ([#244](https://github.com/databricks/ucx/pull/244)).
+* Added safe return on grants ([#246](https://github.com/databricks/ucx/pull/246)).
+* Added ability to specify empty group filter in the installer script ([#216](https://github.com/databricks/ucx/pull/216)) ([#217](https://github.com/databricks/ucx/pull/217)).
+* Added ability to install application by multiple different users on the same workspace ([#235](https://github.com/databricks/ucx/pull/235)).
+* Added dashboard creation on installation and a requirement for `warehouse_id` in config, so that the assessment dashboards are refreshed automatically after job runs ([#214](https://github.com/databricks/ucx/pull/214)).
+* Added reliance on rate limiting from Databricks SDK for listing workspace ([#258](https://github.com/databricks/ucx/pull/258)).
+* Fixed errors in corner cases where Azure Service Principal Credentials were not available in Spark context ([#254](https://github.com/databricks/ucx/pull/254)).
+* Fixed `DESCRIBE TABLE` throwing errors when listing Legacy Table ACLs ([#238](https://github.com/databricks/ucx/pull/238)).
+* Fixed `file already exists` error in the installer script ([#219](https://github.com/databricks/ucx/pull/219)) ([#222](https://github.com/databricks/ucx/pull/222)).
+* Fixed `guess_external_locations` failure with `AttributeError: as_dict` and added an integration test ([#259](https://github.com/databricks/ucx/pull/259)).
+* Fixed error handling edge cases in `crawl_tables` task ([#243](https://github.com/databricks/ucx/pull/243)) ([#251](https://github.com/databricks/ucx/pull/251)).
+* Fixed `crawl_permissions` task failure on folder names containing a forward slash ([#234](https://github.com/databricks/ucx/pull/234)).
+* Improved `README` notebook documentation ([#260](https://github.com/databricks/ucx/pull/260), [#228](https://github.com/databricks/ucx/pull/228), [#252](https://github.com/databricks/ucx/pull/252), [#223](https://github.com/databricks/ucx/pull/223), [#225](https://github.com/databricks/ucx/pull/225)).
+* Removed redundant `.python-version` file ([#221](https://github.com/databricks/ucx/pull/221)).
+* Removed discovery of account groups from `crawl_permissions` task ([#240](https://github.com/databricks/ucx/pull/240)).
+* Updated databricks-sdk requirement from ~=0.8.0 to ~=0.9.0 ([#245](https://github.com/databricks/ucx/pull/245)).
+
 ## 0.1.0
 
 Features
```
src/databricks/labs/ucx/__about__.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-__version__ = "0.1.0"
+__version__ = "0.1.1"
```

src/databricks/labs/ucx/assessment/commands/create_table_inventory.scala

Lines changed: 0 additions & 32 deletions
This file was deleted.

src/databricks/labs/ucx/assessment/crawlers.py

Lines changed: 0 additions & 4 deletions

```diff
@@ -148,7 +148,3 @@ def snapshot(self) -> list[ClusterInfo]:
     def _try_fetch(self) -> list[ClusterInfo]:
         for row in self._fetch(f"SELECT * FROM {self._schema}.{self._table}"):
             yield JobInfo(*row)
-
-
-if __name__ == "__main__":
-    print("Databricks UC Assessment")
```

src/databricks/labs/ucx/framework/crawlers.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -86,7 +86,7 @@ def _row_to_sql(row, fields):
         elif f.type == bool:
             data.append("TRUE" if value else "FALSE")
         elif f.type == str:
-            value = value.replace("'", "''")
+            value = str(value).replace("'", "''")
             data.append(f"'{value}'")
         elif f.type == int:
             data.append(f"{value}")
```

src/databricks/labs/ucx/hive_metastore/data_objects.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,7 +25,7 @@ def _external_locations(self, tables: list[Row], mounts) -> list[ExternalLocation]:
         min_slash = 2
         external_locations: list[ExternalLocation] = []
         for table in tables:
-            location = table.as_dict()["location"]
+            location = table.location
             if location is not None and len(location) > 0:
                 if location.startswith("dbfs:/mnt"):
                     for mount in mounts:
```

src/databricks/labs/ucx/hive_metastore/grants.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -234,6 +234,6 @@ def _grants(
                 any_file=any_file,
                 anonymous_function=anonymous_function,
             )
-        except RuntimeError as e:
+        except Exception as e:
             logger.error(f"Couldn't fetch grants for object {on_type} {key}: {e}")
             return []
```

src/databricks/labs/ucx/hive_metastore/tables.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -140,6 +140,6 @@ def _describe(self, catalog: str, database: str, table: str) -> Table | None:
                 location=describe.get("Location", None),
                 view_text=describe.get("View Text", None),
             )
-        except RuntimeError as e:
+        except Exception as e:
             logger.error(f"Couldn't fetch information for table {full_name} : {e}")
             return None
```

src/databricks/labs/ucx/mixins/sql.py

Lines changed: 15 additions & 3 deletions

```diff
@@ -36,17 +36,29 @@ def __repr__(self):
 
 
 class Row(tuple):
+    # Python SDK convention
     def as_dict(self) -> dict[str, any]:
         return dict(zip(self.__columns__, self, strict=True))
 
-    def __getattr__(self, col):
-        idx = self.__columns__.index(col)
-        return self[idx]
+    # PySpark convention
+    def __contains__(self, item):
+        return item in self.__columns__
 
     def __getitem__(self, col):
+        if isinstance(col, int | slice):
+            return super().__getitem__(col)
         # if columns are named `2 + 2`, for example
         return self.__getattr__(col)
 
+    def __getattr__(self, col):
+        try:
+            idx = self.__columns__.index(col)
+            return self[idx]
+        except IndexError:
+            raise AttributeError(col)  # noqa: B904
+        except ValueError:
+            raise AttributeError(col)  # noqa: B904
+
     def __repr__(self):
         return f"Row({', '.join(f'{k}={v}' for (k, v) in zip(self.__columns__, self, strict=True))})"
```

src/databricks/labs/ucx/runtime.py

Lines changed: 6 additions & 1 deletion

```diff
@@ -34,7 +34,12 @@ def crawl_tables(_: MigrationConfig):
     readily accessible point of reference for users, data engineers, and administrators."""
 
 
-@task("assessment", depends_on=[crawl_tables], job_cluster="tacl")
+@task("assessment", job_cluster="tacl")
+def setup_tacl(_: MigrationConfig):
+    """(Optimization) Starts tacl job cluster in parallel to crawling tables"""
+
+
+@task("assessment", depends_on=[crawl_tables, setup_tacl], job_cluster="tacl")
 def crawl_grants(cfg: MigrationConfig):
     """During this process, our methodology is purposefully designed to systematically scan and retrieve ACLs
     (Access Control Lists) associated with Legacy Tables from the Hive Metastore. These ACLs encompass comprehensive
```
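The empty `setup_tacl` task is a scheduling optimization, as its docstring says: because it has no dependencies but is pinned to the `tacl` job cluster, the workflow can provision that cluster while `crawl_tables` is still running on its own cluster, so `crawl_grants`, which now depends on both tasks, starts on a cluster that is already warm.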
