Commit 001f7e3

Adopt installation package from databricks-labs-blueprint (#860)
1. `pyproject.toml`: the `databricks-labs-blueprint` dependency is upgraded from `0.2.2` to `0.2.4`.

2. `src/databricks/labs/ucx/account.py`: the new `Installation` class from `databricks.labs.blueprint.installation` replaces the removed `InstallationManager`. `AccountWorkspaces` now accepts an `AccountClient` instead of an `AccountConfig`, and `sync_workspace_info` uses `Installation.save` instead of hand-writing the workspace info to a JSON file.

3. `src/databricks/labs/ucx/assessment/aws.py` and `azure.py`: `AWSResourcePermissions` and `AzureResourcePermissions` now accept an `Installation` together with a `WorkspaceClient`, and `save_instance_profile_permissions` and `save_spn_permissions` use `Installation.save` instead of serializing the permission info to CSV by hand. Both classes gain a `for_cli` factory that resolves the current installation via `Installation.current`.

4. `src/databricks/labs/ucx/cli.py`: the CLI now uses `Installation` instead of `InstallationManager`. The `installations` command uses `Installation.existing` instead of `InstallationManager.user_installations`; the `skip` and `create_table_mapping` commands use `TableMapping.current` instead of `TableMapping.for_cli`; `ensure_assessment_run` and `repair_run` call `WorkspaceInstallation.validate_and_run` and `WorkspaceInstallation.repair_run` instead of the corresponding `WorkspaceInstaller` methods. The remaining commands (`sync_workspace_info`, `validate_external_locations`, `validate_groups_membership`, `revert_migrated_tables`, `move`, `alias`, `save_azure_storage_accounts`, `save_aws_iam_profiles`) keep their entry points but are rewired to the new `Installation`-based construction.

5. `src/databricks/labs/ucx/config.py`: `AccountConfig` is removed, and `WorkspaceConfig` gains a `connect` field that stores the Databricks workspace connection settings. `WorkspaceConfig.from_dict` migrates configurations from the old format to the new one, and the now-unneeded `to_account_client`/`to_workspace_client` helpers are removed.

6. `src/databricks/labs/ucx/framework/crawlers.py`: a `DatabricksError` is now raised when a SQL statement fails. `RuntimeBackend` and `SqlBackend` accept a new `debug_truncate_bytes` argument that caps how many bytes of a SQL statement are logged, `SqlBackend.execute` and `SqlBackend.fetch` raise `DatabricksError` on failure, and `CrawlerBase` takes a `SqlBackend` for executing SQL statements.

7. `src/databricks/labs/ucx/framework/dashboards.py`: a new `Task` class defines the tasks that the `trigger` command can run; its `fn` field specifies the function executed when the task runs, and the `trigger` command resolves the function to run from it.

8. `src/databricks/labs/ucx/hive_metastore/locations.py`: `ExternalLocations` now works through the `Installation` class from `databricks.labs.blueprint.installation` (the `WorkspaceInfo` dependency is removed), and `save_as_terraform_definitions_on_workspace` uses `Installation.upload` instead of writing the external location definitions to a workspace file directly.

9. `src/databricks/labs/ucx/hive_metastore/mapping.py`: `TableMapping` now accepts an `Installation` and a `WorkspaceClient`. Its `save` method uses `Installation.save` and its `load` method uses `Installation.load` instead of hand-rolled CSV handling, and the `skip_table`, `skip_schema`, and `get_tables_to_migrate` methods are reworked accordingly.

In summary, this pull request updates `labs.yml` and `pyproject.toml`, adopts the blueprint `Installation` class in place of the removed `InstallationManager`, and rewires classes that previously wrote files through the `WorkspaceClient` to save typed data via `Installation`. These changes improve the usability and maintainability of the project.
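The core of the change is replacing hand-rolled JSON/CSV serialization with typed `Installation.save`/`Installation.load` calls. A minimal stdlib-only sketch of the idea (a simplified stand-in, not the actual blueprint implementation):

```python
import dataclasses
import json


@dataclasses.dataclass
class Workspace:
    workspace_id: int
    workspace_name: str


def save(objects: list, filename: str) -> str:
    # Mimics the shape of Installation.save: dataclasses in, serialized text
    # out, with the format chosen by the filename extension (the real helper
    # also supports .csv and uploads the result to the workspace folder).
    assert filename.endswith(".json"), "this sketch only handles JSON"
    return json.dumps([dataclasses.asdict(o) for o in objects], indent=2)


def load(raw: str) -> list:
    # Mimics Installation.load(list[Workspace], ...): JSON back into typed objects.
    return [Workspace(**d) for d in json.loads(raw)]


raw = save([Workspace(123, "prod")], "workspaces.json")
restored = load(raw)
print(restored[0].workspace_name)  # prod
```

Call sites then shrink from buffer/upload boilerplate to a single `save` or `load` line, which is what most of the diffs below do.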
1 parent 3042857 commit 001f7e3


41 files changed (+1594, −2844 lines)

labs.yml

Lines changed: 2 additions & 2 deletions
@@ -22,8 +22,8 @@ commands:
   - name: installations
     description: Show installations by different users on the same workspace
     table_template: |-
-      User\tDatabase\tWarehouse
-      {{range .}}{{.user_name}}\t{{.database}}\t{{.warehouse_id}}
+      Path\tDatabase\tWarehouse
+      {{range .}}{{.path}}\t{{.database}}\t{{.warehouse_id}}
       {{end}}

   - name: skip

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ classifiers = [
     "Programming Language :: Python :: Implementation :: CPython",
 ]
 dependencies = ["databricks-sdk~=0.18.0",
-                "databricks-labs-blueprint~=0.2.2",
+                "databricks-labs-blueprint~=0.2.4",
                 "PyYAML>=6.0.0,<7.0.0"]

 [project.entry-points.databricks]

src/databricks/labs/ucx/account.py

Lines changed: 21 additions & 50 deletions
@@ -1,17 +1,14 @@
-import json
 import logging
 from typing import ClassVar

 import requests
+from databricks.labs.blueprint.installation import Installation
 from databricks.labs.blueprint.tui import Prompts
-from databricks.sdk import WorkspaceClient
+from databricks.sdk import AccountClient, WorkspaceClient
 from databricks.sdk.errors import NotFound
 from databricks.sdk.service.provisioning import Workspace
-from databricks.sdk.service.workspace import ImportFormat

 from databricks.labs.ucx.__about__ import __version__
-from databricks.labs.ucx.config import AccountConfig
-from databricks.labs.ucx.installer import InstallationManager

 logger = logging.getLogger(__name__)

@@ -25,24 +22,12 @@ class AccountWorkspaces:

     SYNC_FILE_NAME: ClassVar[str] = "workspaces.json"

-    def __init__(
-        self, cfg: AccountConfig, new_workspace_client=WorkspaceClient, new_installation_manager=InstallationManager
-    ):
+    def __init__(self, ac: AccountClient, new_workspace_client=WorkspaceClient):
         self._new_workspace_client = new_workspace_client
-        self._new_installation_manager = new_installation_manager
-        self._ac = cfg.to_account_client()
-        self._cfg = cfg
-
-    def _configured_workspaces(self):
-        for workspace in self._ac.workspaces.list():
-            if self._cfg.include_workspace_names:
-                if workspace.workspace_name not in self._cfg.include_workspace_names:
-                    logger.debug(
-                        f"skipping {workspace.workspace_name} ({workspace.workspace_id} because "
-                        f"its not explicitly included"
-                    )
-                    continue
-            yield workspace
+        self._ac = ac
+
+    def _workspaces(self):
+        return self._ac.workspaces.list()

     def _get_cloud(self) -> str:
         if self._ac.config.is_azure:
@@ -66,7 +51,7 @@ def workspace_clients(self) -> list[WorkspaceClient]:
         :return: list[WorkspaceClient]
         """
         clients = []
-        for workspace in self._configured_workspaces():
+        for workspace in self._workspaces():
             ws = self.client_for(workspace)
             clients.append(ws)
         return clients
@@ -78,23 +63,17 @@ def sync_workspace_info(self):
         upload the json dump of workspace info in the .ucx folder
         """
         workspaces = []
-        for workspace in self._configured_workspaces():
-            workspaces.append(workspace.as_dict())
-        info = json.dumps(workspaces, indent=2).encode("utf8")
+        for workspace in self._workspaces():
+            workspaces.append(workspace)
         for ws in self.workspace_clients():
-            installation_manager = self._new_installation_manager(ws)
-            for installation in installation_manager.user_installations():
-                path = f"{installation.path}/{self.SYNC_FILE_NAME}"
-                ws.workspace.upload(path, info, overwrite=True, format=ImportFormat.AUTO)
+            for installation in Installation.existing(ws, "ucx"):
+                installation.save(workspaces, filename=self.SYNC_FILE_NAME)


 class WorkspaceInfo:
-    def __init__(self, ws: WorkspaceClient, folder: str | None = None, new_installation_manager=InstallationManager):
-        if not folder:
-            folder = f"/Users/{ws.current_user.me().user_name}/.ucx"
+    def __init__(self, installation: Installation, ws: WorkspaceClient):
+        self._installation = installation
         self._ws = ws
-        self._folder = folder
-        self._new_installation_manager = new_installation_manager

     def _current_workspace_id(self) -> int:
         headers = self._ws.config.authenticate()
@@ -109,12 +88,10 @@ def _current_workspace_id(self) -> int:
     def _load_workspace_info(self) -> dict[int, Workspace]:
         try:
             id_to_workspace = {}
-            with self._ws.workspace.download(f"{self._folder}/{AccountWorkspaces.SYNC_FILE_NAME}") as f:
-                for workspace_metadata in json.loads(f.read()):
-                    workspace = Workspace.from_dict(workspace_metadata)
-                    assert workspace.workspace_id is not None
-                    id_to_workspace[workspace.workspace_id] = workspace
-            return id_to_workspace
+            for workspace in self._installation.load(list[Workspace], filename=AccountWorkspaces.SYNC_FILE_NAME):
+                assert workspace.workspace_id is not None
+                id_to_workspace[workspace.workspace_id] = workspace
+            return id_to_workspace
         except NotFound:
             msg = "Please run as account-admin: databricks labs ucx sync-workspace-info"
             raise ValueError(msg) from None
@@ -142,17 +119,11 @@ def manual_workspace_info(self, prompts: Prompts):
             workspace_name = prompts.question(
                 f"Workspace name for {workspace_id}", default=f"workspace-{workspace_id}", valid_regex=r"^[\w-]+$"
             )
-            workspace = Workspace(workspace_id=int(workspace_id), workspace_name=workspace_name)
-            workspaces.append(workspace.as_dict())
+            workspaces.append(Workspace(workspace_id=int(workspace_id), workspace_name=workspace_name))
             answer = prompts.question("Next workspace id", valid_number=True, default="stop")
             if answer == "stop":
                 break
             workspace_id = int(answer)
-        info = json.dumps(workspaces, indent=2).encode("utf8")
-        installation_manager = self._new_installation_manager(self._ws)
-        logger.info("Detecting UCX installations on current workspace...")
-        for installation in installation_manager.user_installations():
-            path = f"{installation.path}/{AccountWorkspaces.SYNC_FILE_NAME}"
-            logger.info(f"Overwriting {path}")
-            self._ws.workspace.upload(path, info, overwrite=True, format=ImportFormat.AUTO)  # type: ignore[arg-type]
+        for installation in Installation.existing(self._ws, 'ucx'):
+            installation.save(workspaces, filename=AccountWorkspaces.SYNC_FILE_NAME)
         logger.info("Synchronised workspace id mapping for installations on current workspace")
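The new `sync_workspace_info` flow fans the same payload out to every UCX installation found on a workspace. A simplified stand-in of that loop (a fake `Installation` class is used here, since the real `Installation.existing(ws, "ucx")` needs a live workspace):

```python
class FakeInstallation:
    """Hypothetical stand-in for databricks.labs.blueprint.installation.Installation."""

    def __init__(self, path: str):
        self.path = path
        self.files: dict[str, list] = {}

    def save(self, objects: list, filename: str) -> str:
        # The real save serializes the objects and uploads them into the
        # installation folder; here we just record the payload.
        self.files[filename] = list(objects)
        return f"{self.path}/{filename}"


# In account.py this list comes from Installation.existing(ws, "ucx").
installations = [FakeInstallation("/Users/a@x.com/.ucx"), FakeInstallation("/Users/b@x.com/.ucx")]
workspaces = [{"workspace_id": 1}, {"workspace_id": 2}]
for installation in installations:
    installation.save(workspaces, filename="workspaces.json")
print(len(installations[1].files["workspaces.json"]))  # 2
```

Compared to the removed code, the per-installation path handling and `ws.workspace.upload` call disappear: each `Installation` already knows its own folder.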

src/databricks/labs/ucx/assessment/aws.py

Lines changed: 12 additions & 29 deletions
@@ -1,6 +1,3 @@
-import csv
-import dataclasses
-import io
 import json
 import logging
 import re
@@ -11,10 +8,10 @@
 from dataclasses import dataclass
 from functools import lru_cache, partial

+from databricks.labs.blueprint.installation import Installation
 from databricks.labs.blueprint.parallel import Threads
 from databricks.sdk import WorkspaceClient
 from databricks.sdk.service.catalog import Privilege
-from databricks.sdk.service.workspace import ImportFormat

 logger = logging.getLogger(__name__)

@@ -172,18 +169,18 @@ def _run_json_command(self, command: str):


 class AWSResourcePermissions:
-    def __init__(
-        self,
-        ws: WorkspaceClient,
-        aws_resources: AWSResources,
-        folder: str | None = None,
-    ):
-        if not folder:
-            folder = f"/Users/{ws.current_user.me().user_name}/.ucx"
-        self._folder = folder
+    def __init__(self, installation: Installation, ws: WorkspaceClient, aws_resources: AWSResources):
+        self._installation = installation
         self._aws_resources = aws_resources
         self._ws = ws
-        self._field_names = [_.name for _ in dataclasses.fields(AWSInstanceProfileAction)]
+
+    @classmethod
+    def for_cli(cls, ws: WorkspaceClient, aws_profile, product='ucx'):
+        installation = Installation.current(ws, product)
+        aws = AWSResources(aws_profile)
+        if not aws.validate_connection():
+            raise ResourceWarning("AWS CLI is not configured properly.")
+        return cls(installation, ws, aws)

     def _get_instance_profiles(self) -> Iterable[AWSInstanceProfile]:
         instance_profiles = self._ws.instance_profiles.list()
@@ -240,18 +237,4 @@ def save_instance_profile_permissions(self) -> str | None:
         if len(instance_profile_access) == 0:
             logger.warning("No Mapping Was Generated.")
             return None
-        return self._save(instance_profile_access)
-
-    def _overwrite_mapping(self, buffer) -> str:
-        path = f"{self._folder}/aws_instance_profile_info.csv"
-        self._ws.workspace.upload(path, buffer, overwrite=True, format=ImportFormat.AUTO)
-        return path
-
-    def _save(self, instance_profile_actions: list[AWSInstanceProfileAction]) -> str:
-        buffer = io.StringIO()
-        writer = csv.DictWriter(buffer, self._field_names)
-        writer.writeheader()
-        for instance_profile_action in instance_profile_actions:
-            writer.writerow(dataclasses.asdict(instance_profile_action))
-        buffer.seek(0)
-        return self._overwrite_mapping(buffer)
+        return self._installation.save(instance_profile_access, filename='aws_instance_profile_info.csv')
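The deleted `_save`/`_overwrite_mapping` pair is exactly the CSV boilerplate that `Installation.save(..., filename='....csv')` now absorbs. For reference, the pattern it replaces, shown with a hypothetical field set for `AWSInstanceProfileAction` (the real fields are not visible in this diff):

```python
import csv
import dataclasses
import io


@dataclasses.dataclass
class AWSInstanceProfileAction:
    # Hypothetical fields for illustration; the real dataclass lives in aws.py.
    instance_profile_arn: str
    resource_type: str
    privilege: str


def to_csv(rows: list[AWSInstanceProfileAction]) -> str:
    # The hand-rolled serialization the old _save method performed:
    # header row from the dataclass fields, one row per instance.
    buffer = io.StringIO()
    field_names = [f.name for f in dataclasses.fields(AWSInstanceProfileAction)]
    writer = csv.DictWriter(buffer, field_names)
    writer.writeheader()
    for row in rows:
        writer.writerow(dataclasses.asdict(row))
    return buffer.getvalue()


csv_text = to_csv([AWSInstanceProfileAction("arn:aws:iam::123:instance-profile/x", "bucket", "READ_FILES")])
print(csv_text.splitlines()[0])  # instance_profile_arn,resource_type,privilege
```

Centralizing this in the blueprint library means each caller no longer needs to track field names or manage an upload buffer.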

src/databricks/labs/ucx/assessment/azure.py

Lines changed: 20 additions & 25 deletions
@@ -1,12 +1,10 @@
 import base64
-import csv
-import dataclasses
-import io
 import json
 import re
 from collections.abc import Iterable
 from dataclasses import dataclass

+from databricks.labs.blueprint.installation import Installation
 from databricks.sdk import WorkspaceClient
 from databricks.sdk.core import (
     ApiClient,
@@ -17,7 +15,6 @@
 from databricks.sdk.errors import NotFound
 from databricks.sdk.service.catalog import Privilege
 from databricks.sdk.service.compute import ClusterSource, Policy
-from databricks.sdk.service.workspace import ImportFormat

 from databricks.labs.ucx.assessment.crawlers import (
     _CLIENT_ENDPOINT_LENGTH,
@@ -28,7 +25,12 @@
     logger,
 )
 from databricks.labs.ucx.assessment.jobs import JobsMixin
-from databricks.labs.ucx.framework.crawlers import CrawlerBase, SqlBackend
+from databricks.labs.ucx.config import WorkspaceConfig
+from databricks.labs.ucx.framework.crawlers import (
+    CrawlerBase,
+    SqlBackend,
+    StatementExecutionBackend,
+)
 from databricks.labs.ucx.hive_metastore.locations import ExternalLocations


@@ -461,20 +463,27 @@ class StoragePermissionMapping:


 class AzureResourcePermissions:
-    def __init__(self, ws: WorkspaceClient, azurerm: AzureResources, lc: ExternalLocations, folder: str | None = None):
+    def __init__(self, installation: Installation, ws: WorkspaceClient, azurerm: AzureResources, lc: ExternalLocations):
+        self._filename = 'azure_storage_account_info.csv'
+        self._installation = installation
         self._locations = lc
         self._azurerm = azurerm
         self._ws = ws
-        self._field_names = [_.name for _ in dataclasses.fields(StoragePermissionMapping)]
-        if not folder:
-            folder = f"/Users/{ws.current_user.me().user_name}/.ucx"
-        self._folder = folder
         self._levels = {
             "Storage Blob Data Contributor": Privilege.WRITE_FILES,
             "Storage Blob Data Owner": Privilege.WRITE_FILES,
             "Storage Blob Data Reader": Privilege.READ_FILES,
         }

+    @classmethod
+    def for_cli(cls, ws: WorkspaceClient, product='ucx'):
+        installation = Installation.current(ws, product)
+        config = installation.load(WorkspaceConfig)
+        sql_backend = StatementExecutionBackend(ws, config.warehouse_id)
+        azurerm = AzureResources(ws)
+        locations = ExternalLocations(ws, sql_backend, config.inventory_database)
+        return cls(installation, ws, azurerm, locations)
+
     def _map_storage(self, storage: AzureResource) -> list[StoragePermissionMapping]:
         logger.info(f"Fetching role assignment for {storage.storage_account}")
         out = []
@@ -512,21 +521,7 @@ def save_spn_permissions(self) -> str | None:
         if len(storage_account_infos) == 0:
             logger.error("No storage account found in current tenant with spn permission")
             return None
-        return self._save(storage_account_infos)
-
-    def _save(self, storage_infos: list[StoragePermissionMapping]) -> str:
-        buffer = io.StringIO()
-        writer = csv.DictWriter(buffer, self._field_names)
-        writer.writeheader()
-        for storage_info in storage_infos:
-            writer.writerow(dataclasses.asdict(storage_info))
-        buffer.seek(0)
-        return self._overwrite_mapping(buffer)
-
-    def _overwrite_mapping(self, buffer) -> str:
-        path = f"{self._folder}/azure_storage_account_info.csv"
-        self._ws.workspace.upload(path, buffer, overwrite=True, format=ImportFormat.AUTO)
-        return path
+        return self._installation.save(storage_account_infos, filename=self._filename)

     def _get_storage_accounts(self) -> list[str]:
         external_locations = self._locations.snapshot()
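Both new `for_cli` classmethods follow the same factory pattern: resolve the current installation, load the typed config from it, then wire up collaborators so the CLI layer only passes a client. A schematic stdlib-only sketch of that pattern (the names mirror, but are not, the real ucx types):

```python
import dataclasses


@dataclasses.dataclass
class WorkspaceConfig:
    warehouse_id: str
    inventory_database: str


class FakeInstallation:
    """Hypothetical stand-in for Installation.current(ws, product)."""

    def load(self, klass):
        # The real load deserializes the config file stored in the
        # installation folder into the requested dataclass.
        return klass(warehouse_id="wh-1", inventory_database="ucx")


class ResourcePermissions:
    def __init__(self, installation, config):
        self._installation = installation
        self._config = config

    @classmethod
    def for_cli(cls, installation):
        # Mirrors AzureResourcePermissions.for_cli: the loaded config drives
        # how the collaborators (backends, locations) would be constructed.
        config = installation.load(WorkspaceConfig)
        return cls(installation, config)


perms = ResourcePermissions.for_cli(FakeInstallation())
print(perms._config.inventory_database)  # ucx
```

The payoff is that constructors stay dependency-injected (easy to unit-test), while the CLI gets a one-call way to build a fully wired object.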
