Commit f4efea5

Add a command to create account level groups if they do not exist (#763)
Attempt to fix:
- #17
- #649

Adds a command that creates groups at the account level by crawling all workspaces configured in the account and in scope of the migration.

This pull request adds several new methods to `account.py` in `src/databricks/labs/ucx`. The main one, `create_account_level_groups`, crawls the selected workspaces and creates an account-level group for every workspace-local group that is not already present in the account. `_get_valid_workspaces_groups` returns a dictionary of all valid workspace groups, `_has_same_members` checks whether two groups have the same members, and `_get_account_groups` returns a dictionary of all account groups.

On the testing side, `test_account.py` has been updated with new tests for `create_account_level_groups`. `test_create_acc_groups_should_create_acc_group_if_no_group_found` verifies that an account-level group is created when no group with the same name is found. `test_create_acc_groups_should_filter_groups_in_other_workspaces` checks that groups already seen in other workspaces are filtered out and that only groups missing from the account are created.

Additionally, `cli.py` has a new command, `create_account_groups`, which creates account-level groups for the workspaces passed in `workspace_ids`.
1 parent 76bdff9 commit f4efea5

7 files changed: +627 -10 lines changed

README.md

Lines changed: 16 additions & 0 deletions
@@ -26,6 +26,7 @@ See [contributing instructions](CONTRIBUTING.md) to help improve this project.
 * [Producing table mapping](#producing-table-mapping)
 * [Synchronising UCX configurations](#synchronising-ucx-configurations)
 * [Validating group membership](#validating-group-membership)
+* [Creating account groups](#creating-account-groups)
 * [Star History](#star-history)
 * [Project Support](#project-support)
 <!-- TOC -->
@@ -197,6 +198,21 @@ Use to validate workspace-level & account-level groups to identify any discrepan
 databricks labs ucx validate-groups-membership
 ```

+### Creating account groups
+Crawl all workspaces configured in workspace_ids, then creates account level groups if a WS local group is not present
+in the account.
+If workspace_ids is not specified, it will create account groups for all workspaces configured in the account.
+
+The following scenarios are supported, if a group X:
+- Exist in workspaces A,B,C and it has same members in there, it will be created in the account
+- Exist in workspaces A,B but not in C, it will be created in the account
+- Exist in workspaces A,B,C. It has same members in A,B, but not in C. Then, X and C_X will be created in the
+account
+
+```commandline
+databricks labs ucx create-account-groups --workspace_ids <comma separated list of workspace id>
+```
+
 ## Star History

 [![Star History Chart](https://api.star-history.com/svg?repos=databrickslabs/ucx&type=Date)](https://star-history.com/#databrickslabs/ucx)
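
The three scenarios listed in the README addition can be illustrated with a small, self-contained sketch. It does not use the Databricks SDK: workspaces are plain dictionaries, the function name `resolve_account_groups` is hypothetical, and membership conflicts are always accepted, whereas the code in this commit asks for confirmation through `Prompts` first.

```python
def resolve_account_groups(workspaces: dict[str, dict[str, set[str]]]) -> dict[str, set[str]]:
    """Merge workspace-local groups into the account groups that would be created.

    `workspaces` maps a workspace name to {group name: member names}.
    This is an illustrative stand-in, not the method from the commit.
    """
    resolved: dict[str, set[str]] = {}
    for workspace_name, groups in workspaces.items():
        for group_name, members in groups.items():
            if group_name not in resolved:
                # First time this group name is seen: keep it under its own name.
                resolved[group_name] = set(members)
            elif resolved[group_name] != members:
                # Same name, different members: keep a copy prefixed with the workspace name.
                resolved[f"{workspace_name}_{group_name}"] = set(members)
            # Same name and same members: nothing to do.
    return resolved


workspaces = {
    "A": {"X": {"alice", "bob"}},
    "B": {"X": {"alice", "bob"}},
    "C": {"X": {"alice"}},
}
result = resolve_account_groups(workspaces)
print(sorted(result))  # ['C_X', 'X']
```

With identical members in A and B, only `X` is kept; the differing membership in C yields the extra `C_X` entry, matching the third scenario.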

labs.yml

Lines changed: 9 additions & 0 deletions
@@ -117,3 +117,12 @@ commands:

   - name: migrate_credentials
     description: Migrate credentials for storage access to UC storage credential
+
+  - name: create-account-groups
+    is_account_level: true
+    description: |
+      Creates account level groups for all groups in workspaces provided in workspace_ids.
+      If workspace_ids is not provided, it will use all workspaces present in the account.
+    flags:
+      - name: workspace_ids
+        description: List of workspace IDs to create account groups from.
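
The `workspace_ids` flag is documented as a comma-separated list, while `create_account_level_groups` works with integer IDs that must exist in the account. How the CLI framework converts the flag value is not shown in this commit, so the following is only a sketch under that assumption; `parse_workspace_ids` and `keep_known_ids` are hypothetical helpers, not part of ucx or the Databricks SDK.

```python
from __future__ import annotations


def parse_workspace_ids(raw: str | None) -> list[int]:
    """Turn a comma-separated string such as "123, 456" into a list of integers."""
    if not raw:
        return []
    # Skip empty fragments so a trailing comma does not raise.
    return [int(part) for part in raw.split(",") if part.strip()]


def keep_known_ids(requested: list[int], known: list[int]) -> list[int]:
    """Keep only the requested IDs that exist in the account; fail if none do."""
    valid = [workspace_id for workspace_id in requested if workspace_id in known]
    if not valid:
        raise ValueError("None of the requested workspace IDs were found in the account")
    return valid


requested = parse_workspace_ids("123, 456,789")
print(requested)  # [123, 456, 789]
print(keep_known_ids(requested, known=[123, 789, 1000]))  # [123, 789]
```

In the commit itself, a missing or empty list is handled before validation by falling back to the current workspace; the sketch leaves that branch out.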

src/databricks/labs/ucx/account.py

Lines changed: 130 additions & 9 deletions
@@ -6,10 +6,9 @@
 from databricks.labs.blueprint.tui import Prompts
 from databricks.sdk import AccountClient, WorkspaceClient
 from databricks.sdk.errors import NotFound
+from databricks.sdk.service.iam import ComplexValue, Group, Patch, PatchOp, PatchSchema
 from databricks.sdk.service.provisioning import Workspace

-from databricks.labs.ucx.__about__ import __version__
-
 logger = logging.getLogger(__name__)


@@ -37,13 +36,7 @@ def _get_cloud(self) -> str:
         return "aws"

     def client_for(self, workspace: Workspace) -> WorkspaceClient:
-        config = self._ac.config.as_dict()
-        if "databricks_cli_path" in config:
-            del config["databricks_cli_path"]
-        cloud = self._get_cloud()
-        # copy current config and swap with a host relevant to a workspace
-        config["host"] = f"https://{workspace.deployment_name}.{self._tlds[cloud]}"
-        return self._new_workspace_client(**config, product="ucx", product_version=__version__)
+        return self._ac.get_workspace_client(workspace)

     def workspace_clients(self) -> list[WorkspaceClient]:
         """
@@ -69,6 +62,134 @@ def sync_workspace_info(self):
             for installation in Installation.existing(ws, "ucx"):
                 installation.save(workspaces, filename=self.SYNC_FILE_NAME)

+    def create_account_level_groups(self, prompts: Prompts, workspace_ids: list[int] | None = None):
+        acc_groups = self._get_account_groups()
+        workspace_ids = self._get_valid_workspaces_ids(workspace_ids)
+        all_valid_workspace_groups = self._get_valid_workspaces_groups(prompts, workspace_ids)
+
+        for group_name, valid_group in all_valid_workspace_groups.items():
+            if group_name in acc_groups:
+                logger.info(f"Group {group_name} already exist in the account, ignoring")
+                continue
+
+            acc_group = self._ac.groups.create(display_name=group_name)
+
+            if not valid_group.members or not acc_group.id:
+                continue
+            if len(valid_group.members) > 0:
+                self._add_members_to_acc_group(self._ac, acc_group.id, group_name, valid_group)
+            logger.info(f"Group {group_name} created in the account")
+
+    def _get_valid_workspaces_ids(self, workspace_ids: list[int] | None = None) -> list[int]:
+        if not workspace_ids:
+            logger.info("No workspace ids provided, using current workspace instead")
+            return [self._new_workspace_client().get_workspace_id()]
+
+        all_workspace_ids = [workspace.workspace_id for workspace in self._workspaces()]
+
+        valid_workspace_ids = []
+        for workspace_id in workspace_ids:
+            if workspace_id in all_workspace_ids:
+                valid_workspace_ids.append(workspace_id)
+            else:
+                logger.info(f"Workspace id {workspace_id} not found on the account")
+
+        if not valid_workspace_ids:
+            raise ValueError("No workspace ids provided in the configuration found in the account")
+
+        workspace_ids_str = ','.join(str(x) for x in valid_workspace_ids)
+        logger.info(f"Creating account groups for workspaces IDs : {workspace_ids_str}")
+        return valid_workspace_ids
+
+    def _add_members_to_acc_group(
+        self, acc_client: AccountClient, acc_group_id: str, group_name: str, valid_group: Group
+    ):
+        for chunk in self._chunks(valid_group.members, 20):
+            logger.debug(f"Adding {len(chunk)} members to acc group {group_name}")
+            acc_client.groups.patch(
+                acc_group_id,
+                operations=[Patch(op=PatchOp.ADD, path="members", value=[x.as_dict() for x in chunk])],
+                schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
+            )
+
+    def _chunks(self, lst, chunk_size):
+        """Yield successive n-sized chunks from lst."""
+        for i in range(0, len(lst), chunk_size):
+            yield lst[i : i + chunk_size]
+
+    def _get_valid_workspaces_groups(self, prompts: Prompts, workspace_ids: list[int]) -> dict[str, Group]:
+        all_workspaces_groups: dict[str, Group] = {}
+
+        for workspace in self._workspaces():
+            if workspace.workspace_id not in workspace_ids:
+                continue
+            client = self.client_for(workspace)
+            logger.info(f"Crawling groups in workspace {client.config.host}")
+
+            ws_group_ids = client.groups.list(attributes="id")
+            for group_id in ws_group_ids:
+                if not group_id.id:
+                    continue
+
+                full_workspace_group = client.groups.get(group_id.id)
+                group_name = full_workspace_group.display_name
+
+                if self._is_group_out_of_scope(full_workspace_group):
+                    continue
+
+                if group_name in all_workspaces_groups:
+                    if self._has_same_members(all_workspaces_groups[group_name], full_workspace_group):
+                        logger.info(f"Workspace group {group_name} already found, ignoring")
+                        continue
+
+                    if prompts.confirm(
+                        f"Group {group_name} does not have the same amount of members "
+                        f"in workspace {client.config.host} than previous workspaces which contains the same group name,"
+                        f"it will be created at the account with name : {workspace.workspace_name}_{group_name}"
+                    ):
+                        all_workspaces_groups[f"{workspace.workspace_name}_{group_name}"] = full_workspace_group
+                        continue
+
+                if not group_name:
+                    continue
+
+                logger.info(f"Found new group {group_name}")
+                all_workspaces_groups[group_name] = full_workspace_group
+
+        logger.info(f"Found a total of {len(all_workspaces_groups)} groups to migrate to the account")
+
+        return all_workspaces_groups
+
+    def _is_group_out_of_scope(self, group: Group) -> bool:
+        if group.display_name in {"users", "admins", "account users"}:
+            logger.debug(f"Group {group.display_name} is a system group, ignoring")
+            return True
+        meta = group.meta
+        if not meta:
+            return False
+        if meta.resource_type != "WorkspaceGroup":
+            logger.debug(f"Group {group.display_name} is an account group, ignoring")
+            return True
+        return False
+
+    def _has_same_members(self, group_1: Group, group_2: Group) -> bool:
+        ws_members_set_1 = set([m.display for m in group_1.members] if group_1.members else [])
+        ws_members_set_2 = set([m.display for m in group_2.members] if group_2.members else [])
+        return not bool((ws_members_set_1 - ws_members_set_2).union(ws_members_set_2 - ws_members_set_1))
+
+    def _get_account_groups(self) -> dict[str | None, list[ComplexValue] | None]:
+        logger.debug("Listing groups in account")
+        acc_groups = {}
+        for acc_grp_id in self._ac.groups.list(attributes="id"):
+            if not acc_grp_id.id:
+                continue
+            full_account_group = self._ac.groups.get(acc_grp_id.id)
+            logger.debug(f"Found account group {acc_grp_id.display_name}")
+            acc_groups[full_account_group.display_name] = full_account_group.members
+
+        logger.info(f"{len(acc_groups)} account groups found")
+        return acc_groups
+

 class WorkspaceInfo:
     def __init__(self, installation: Installation, ws: WorkspaceClient):
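
Two small pieces of logic in the `account.py` changes can be tried in isolation: members are added to a new account group in chunks of 20 per patch request, and two groups count as having the same members when the symmetric difference of their member display names is empty. The sketch below reproduces both with plain Python values; `chunks` and `has_same_members` are standalone stand-ins written for illustration, not the private methods from the diff, and no SDK call is made.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def chunks(items: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Yield successive slices of at most `chunk_size` items."""
    for start in range(0, len(items), chunk_size):
        yield items[start : start + chunk_size]


def has_same_members(members_1: list[str] | None, members_2: list[str] | None) -> bool:
    """Return True when both member lists contain the same names, ignoring order."""
    # `^` is the symmetric difference: names present in exactly one of the two sets.
    return not (set(members_1 or []) ^ set(members_2 or []))


members = [f"user_{i}" for i in range(45)]
batch_sizes = [len(batch) for batch in chunks(members, 20)]
print(batch_sizes)  # [20, 20, 5]
print(has_same_members(["alice", "bob"], ["bob", "alice"]))  # True
print(has_same_members(["alice"], None))  # False
```

Comparing by display name, as the commit does, means two distinct principals that share a display name would be treated as the same member; comparing by a stable identifier would avoid that, at the cost of relying on identifiers being consistent across workspaces.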

src/databricks/labs/ucx/cli.py

Lines changed: 19 additions & 0 deletions
@@ -90,6 +90,25 @@ def sync_workspace_info(a: AccountClient):
     workspaces.sync_workspace_info()


+@ucx.command(is_account=True)
+def create_account_groups(a: AccountClient, workspace_ids: list[int] | None = None):
+    """
+    Crawl all workspaces configured in workspace_ids, then creates account level groups if a WS local group is not present
+    in the account.
+    If workspace_ids is not specified, it will create account groups for all workspaces configured in the account.
+
+    The following scenarios are supported, if a group X:
+    - Exist in workspaces A,B,C and it has same members in there, it will be created in the account
+    - Exist in workspaces A,B but not in C, it will be created in the account
+    - Exist in workspaces A,B,C. It has same members in A,B, but not in C. Then, X and C_X will be created in the
+    account
+    """
+    logger.info(f"Account ID: {a.config.account_id}")
+    prompts = Prompts()
+    workspaces = AccountWorkspaces(a)
+    workspaces.create_account_level_groups(prompts, workspace_ids)
+
+
 @ucx.command
 def manual_workspace_info(w: WorkspaceClient):
     """only supposed to be run if cannot get admins to run `databricks labs ucx sync-workspace-info`"""

tests/integration/test_account.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+from databricks.labs.blueprint.tui import MockPrompts
+from databricks.sdk import AccountClient
+from databricks.sdk.errors import NotFound
+
+from databricks.labs.ucx.account import AccountWorkspaces
+
+
+def test_create_account_level_groups(make_ucx_group, make_group, make_user, acc, ws, make_random):
+    suffix = make_random()
+    make_ucx_group(f"test_ucx_migrate_invalid_{suffix}", f"test_ucx_migrate_invalid_{suffix}")
+
+    make_group(display_name=f"regular_group_{suffix}", members=[make_user().id])
+    AccountWorkspaces(acc).create_account_level_groups(MockPrompts({}), [ws.get_workspace_id()])
+
+    results = []
+    for grp in acc.groups.list():
+        if grp.display_name in {f"regular_group_{suffix}"}:
+            results.append(grp)
+            try_delete_group(acc, grp.id)  # Avoids flakiness for future runs
+
+    assert len(results) == 1
+
+
+def try_delete_group(acc: AccountClient, grp_id: str):
+    try:
+        acc.groups.delete(grp_id)
+    except NotFound:
+        pass
