Skip to content

Commit e701e7b

Browse files
authored
Merge workspace_access.Crawler and workspace_access.Applier interfaces to workspace_access.AclSupport (#436)
This PR simplifies integration testing and removes technical debt. Closes #410
1 parent 59d9218 commit e701e7b

File tree

19 files changed

+309
-306
lines changed

19 files changed

+309
-306
lines changed

docs/local-group-migration.md

Lines changed: 13 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -17,18 +17,12 @@ To deliver this migration, the following steps are performed:
1717

1818
> Please note that inherited permissions will not be inventorized / migrated. We only cover direct permissions.
1919
20-
On a very high-level, the permissions inventorization process is split into two steps:
20+
On a very high-level, the permissions crawling process is split into two steps:
2121

22-
1. collect all existing permissions into a persistent storage.
23-
2. apply the collected permissions to the target resources.
22+
1. collect all existing permissions into a persistent storage - see `workspace_access.AclSupport.get_crawler_tasks`.
23+
2. apply the collected permissions to the target resources - see `workspace_access.AclSupport.get_apply_task`.
2424

25-
The first step is performed by the `Crawler` and the second by the `Applier`.
26-
27-
Crawler and applier are intrinsically connected to each other due to SerDe (serialization/deserialization) logic.
28-
29-
We implement separate crawlers and applier for each supported resource type.
30-
31-
Please note that `table ACLs` logic is currently handled separately from the logic described in this document.
25+
We implement `workspace_access.AclSupport` for each supported resource type.
3226

3327
## Logical objects and relevant APIs
3428

@@ -147,7 +141,9 @@ Additional info:
147141

148142
#### Known issues
149143

150-
- Folder names with forward-slash (`/`) in directory name will be skipped by the inventory. Databricks UI no longer allows creating folders with a forward slash. See [this issue](https://github.com/databrickslabs/ucx/issues/230) for more details.
144+
- Folder names with forward-slash (`/`) in directory name will be skipped by the inventory. Databricks UI no longer
145+
allows creating folders with a forward slash. See [this issue](https://github.com/databrickslabs/ucx/issues/230) for
146+
more details.
151147

152148
### Secrets (uses Secrets API)
153149

@@ -163,16 +159,16 @@ Additional info:
163159
- put method: `ws.secrets.put_acl`
164160

165161

166-
## Crawler and serialization logic
162+
## AclSupport and serialization logic
167163

168164
Crawlers are expected to return a list of callable functions that will be later used to get the permissions.
169-
Each of these functions shall return a `PermissionInventoryItem` that should be serializable into a Delta Table.
165+
Each of these functions shall return a `workspace_access.Permissions` that should be serializable into a Delta Table.
170166
The permission payload differs between different crawlers, therefore each crawler should implement a serialization
171167
method.
172168

173169
## Applier and deserialization logic
174170

175-
Appliers are expected to accept a list of `PermissionInventoryItem` and generate a list of callables that will apply the
171+
Appliers are expected to accept a list of `workspace_access.Permissions` and generate a list of callables that will apply the
176172
given permissions.
177173
Each applier should implement a deserialization method that will convert the raw payload into a typed one.
178174
Each permission item should have a crawler type associated with it, so that the applier can use the correct
@@ -189,10 +185,10 @@ We do this inside the `applier`, by returning a `noop` callable if the object is
189185
To crawl the permissions, we use the following logic:
190186
1. Go through the list of all crawlers.
191187
2. Get the list of all objects of the given type.
192-
3. For each object, generate a callable that will return a `PermissionInventoryItem`.
188+
3. For each object, generate a callable that will return a `workspace_access.Permissions`.
193189
4. Execute the callables in parallel
194-
5. Collect the results into a list of `PermissionInventoryItem`.
195-
6. Save the list of `PermissionInventoryItem` into a Delta Table.
190+
5. Collect the results into a list of `workspace_access.Permissions`.
191+
6. Save the list of `workspace_access.Permissions` into a Delta Table.
196192

197193
## Applying the permissions
198194

src/databricks/labs/ucx/mixins/fixtures.py

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -471,15 +471,14 @@ def create(
471471
if single_node:
472472
kwargs["num_workers"] = 0
473473
if "spark_conf" in kwargs:
474-
kwargs["spark_conf"] = {
475-
**kwargs["spark_conf"],
476-
**{"spark.databricks.cluster.profile": "singleNode", "spark.master": "local[*]"},
474+
kwargs["spark_conf"] = kwargs["spark_conf"] | {
475+
"spark.databricks.cluster.profile": "singleNode",
476+
"spark.master": "local[*]",
477477
}
478478
else:
479479
kwargs["spark_conf"] = {"spark.databricks.cluster.profile": "singleNode", "spark.master": "local[*]"}
480480
kwargs["custom_tags"] = {"ResourceClass": "SingleNode"}
481-
kwargs["node_type_id"] = ws.clusters.select_node_type(local_disk=True)
482-
elif "instance_pool_id" not in kwargs:
481+
if "instance_pool_id" not in kwargs:
483482
kwargs["node_type_id"] = ws.clusters.select_node_type(local_disk=True)
484483

485484
return ws.clusters.create(
Lines changed: 7 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
from abc import abstractmethod
22
from collections.abc import Callable, Iterator
33
from dataclasses import dataclass
4-
from functools import partial
54
from logging import Logger
65
from typing import Literal
76

@@ -21,39 +20,21 @@ class Permissions:
2120
Destination = Literal["backup", "account"]
2221

2322

24-
class Crawler:
23+
class AclSupport:
2524
@abstractmethod
2625
def get_crawler_tasks(self) -> Iterator[Callable[..., Permissions | None]]:
2726
"""
2827
This method should return a list of crawler tasks (e.g. partials or just any callables)
2928
:return:
3029
"""
3130

32-
33-
# TODO: this class has to become typing.Protocol and keep only abstract methods
34-
# See https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/ch13.html
35-
class Applier:
36-
@abstractmethod
37-
def is_item_relevant(self, item: Permissions, migration_state: GroupMigrationState) -> bool:
38-
"""TODO: remove it, see https://github.com/databrickslabs/ucx/issues/410"""
39-
4031
@abstractmethod
41-
def _get_apply_task(
42-
self, item: Permissions, migration_state: GroupMigrationState, destination: Destination
43-
) -> partial:
44-
"""
45-
This method should return an instance of ApplierTask.
46-
"""
47-
4832
def get_apply_task(
4933
self, item: Permissions, migration_state: GroupMigrationState, destination: Destination
50-
) -> partial:
51-
# we explicitly put the relevance check here to avoid "forgotten implementation" in child classes
52-
if self.is_item_relevant(item, migration_state):
53-
return self._get_apply_task(item, migration_state, destination)
54-
else:
34+
) -> Callable[[], None] | None:
35+
"""This method returns a Callable, that applies permissions to a destination group, based on
36+
the group migration state. The callable is required not to have any shared mutable state."""
5537

56-
def noop():
57-
pass
58-
59-
return partial(noop)
38+
@abstractmethod
39+
def object_types(self) -> set[str]:
40+
"""This method returns a set of strings, that represent object types that are applicable by this instance."""

src/databricks/labs/ucx/workspace_access/generic.py

Lines changed: 54 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
import datetime
12
import json
23
import logging
34
from collections.abc import Callable, Iterator
@@ -11,8 +12,7 @@
1112

1213
from databricks.labs.ucx.mixins.hardening import rate_limited
1314
from databricks.labs.ucx.workspace_access.base import (
14-
Applier,
15-
Crawler,
15+
AclSupport,
1616
Destination,
1717
Permissions,
1818
)
@@ -31,17 +31,49 @@ class RetryableError(DatabricksError):
3131
pass
3232

3333

34-
class GenericPermissionsSupport(Crawler, Applier):
35-
def __init__(self, ws: WorkspaceClient, listings: list[Callable[..., Iterator[GenericPermissionsInfo]]]):
34+
class Listing:
35+
def __init__(self, func: Callable[..., list], id_attribute: str, object_type: str):
36+
self._func = func
37+
self._id_attribute = id_attribute
38+
self._object_type = object_type
39+
40+
def object_types(self) -> set[str]:
41+
return {self._object_type}
42+
43+
def __iter__(self):
44+
started = datetime.datetime.now()
45+
for item in self._func():
46+
yield GenericPermissionsInfo(getattr(item, self._id_attribute), self._object_type)
47+
since = datetime.datetime.now() - started
48+
logger.info(f"Listed {self._object_type} in {since}")
49+
50+
51+
class GenericPermissionsSupport(AclSupport):
52+
def __init__(self, ws: WorkspaceClient, listings: list[Listing]):
3653
self._ws = ws
3754
self._listings = listings
3855

3956
def get_crawler_tasks(self):
4057
for listing in self._listings:
41-
for info in listing():
58+
for info in listing:
4259
yield partial(self._crawler_task, info.request_type, info.object_id)
4360

44-
def is_item_relevant(self, item: Permissions, migration_state: GroupMigrationState) -> bool:
61+
def object_types(self) -> set[str]:
62+
all_object_types = set()
63+
for listing in self._listings:
64+
for object_type in listing.object_types():
65+
all_object_types.add(object_type)
66+
return all_object_types
67+
68+
def get_apply_task(self, item: Permissions, migration_state: GroupMigrationState, destination: Destination):
69+
if not self._is_item_relevant(item, migration_state):
70+
return None
71+
object_permissions = iam.ObjectPermissions.from_dict(json.loads(item.raw))
72+
new_acl = self._prepare_new_acl(object_permissions, migration_state, destination)
73+
return partial(self._applier_task, item.object_type, item.object_id, new_acl)
74+
75+
@staticmethod
76+
def _is_item_relevant(item: Permissions, migration_state: GroupMigrationState) -> bool:
4577
# passwords and tokens are represented on the workspace-level
4678
if item.object_id in ("tokens", "passwords"):
4779
return True
@@ -50,14 +82,6 @@ def is_item_relevant(self, item: Permissions, migration_state: GroupMigrationSta
5082
]
5183
return any(g in mentioned_groups for g in [info.workspace.display_name for info in migration_state.groups])
5284

53-
def _get_apply_task(
54-
self, item: Permissions, migration_state: GroupMigrationState, destination: Destination
55-
) -> partial:
56-
new_acl = self._prepare_new_acl(
57-
iam.ObjectPermissions.from_dict(json.loads(item.raw)), migration_state, destination
58-
)
59-
return partial(self._applier_task, item.object_type, item.object_id, new_acl)
60-
6185
@rate_limited(max_requests=30)
6286
def _applier_task(self, object_type: str, object_id: str, acl: list[iam.AccessControlRequest]):
6387
self._ws.permissions.update(object_type, object_id, access_control_list=acl)
@@ -121,20 +145,17 @@ def _prepare_new_acl(
121145
return acl_requests
122146

123147

124-
def listing_wrapper(
125-
func: Callable[..., list], id_attribute: str, object_type: str
126-
) -> Callable[..., Iterator[GenericPermissionsInfo]]:
127-
def wrapper() -> Iterator[GenericPermissionsInfo]:
128-
for item in func():
129-
yield GenericPermissionsInfo(
130-
object_id=getattr(item, id_attribute),
131-
request_type=object_type,
132-
)
133-
134-
return wrapper
148+
class WorkspaceListing(Listing):
149+
def __init__(self, ws: WorkspaceClient, num_threads=20, start_path: str | None = "/"):
150+
super().__init__(..., ..., ...)
151+
self._ws = ws
152+
self._num_threads = num_threads
153+
self._start_path = start_path
135154

155+
def object_types(self) -> set[str]:
156+
return {"notebooks", "directories", "repos", "files"}
136157

137-
def workspace_listing(ws: WorkspaceClient, num_threads=20, start_path: str | None = "/"):
158+
@staticmethod
138159
def _convert_object_type_to_request_type(_object: workspace.ObjectInfo) -> str | None:
139160
match _object.object_type:
140161
case workspace.ObjectType.NOTEBOOK:
@@ -151,17 +172,15 @@ def _convert_object_type_to_request_type(_object: workspace.ObjectInfo) -> str |
151172
case None:
152173
return None
153174

154-
def inner():
175+
def __iter__(self):
155176
from databricks.labs.ucx.workspace_access.listing import WorkspaceListing
156177

157-
ws_listing = WorkspaceListing(ws, num_threads=num_threads, with_directories=False)
158-
for _object in ws_listing.walk(start_path):
159-
request_type = _convert_object_type_to_request_type(_object)
178+
ws_listing = WorkspaceListing(self._ws, num_threads=self._num_threads, with_directories=False)
179+
for _object in ws_listing.walk(self._start_path):
180+
request_type = self._convert_object_type_to_request_type(_object)
160181
if request_type:
161182
yield GenericPermissionsInfo(object_id=str(_object.object_id), request_type=request_type)
162183

163-
return inner
164-
165184

166185
def models_listing(ws: WorkspaceClient):
167186
def inner() -> Iterator[ml.ModelDatabricks]:
@@ -193,12 +212,6 @@ def inner() -> Iterator[ml.Experiment]:
193212
return inner
194213

195214

196-
def authorization_listing():
197-
def inner():
198-
for _value in ["passwords", "tokens"]:
199-
yield GenericPermissionsInfo(
200-
object_id=_value,
201-
request_type="authorization",
202-
)
203-
204-
return inner
215+
def tokens_and_passwords():
216+
for _value in ["passwords", "tokens"]:
217+
yield GenericPermissionsInfo(_value, "authorization")

src/databricks/labs/ucx/workspace_access/groups.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ def is_in_scope(self, attr: str, group: Group) -> bool:
4242
return False
4343

4444
def get_by_workspace_group_name(self, workspace_group_name: str) -> MigrationGroupInfo | None:
45+
# TODO: this method is deprecated, replace all usages by get_target_principal()
4546
found = [g for g in self.groups if g.workspace.display_name == workspace_group_name]
4647
if len(found) == 0:
4748
return None

0 commit comments

Comments
 (0)