Commit 643bac6

Group manager optimisation: during group enumeration only request the attributes that are needed. (#2240)
## Changes

Group enumeration is expensive: we already have a special code-path when members are requested while enumerating groups due to API issues. This PR improves that code-path such that during enumeration only the bare minimum set of attributes is requested.

### Functionality

- modified existing workflow: `group-migration`
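
The two-phase strategy described under Changes can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeGroupsApi`, `Group`, and `list_groups_in_two_phases` are hypothetical names invented here, not part of the Databricks SDK or of ucx, and the in-memory client merely imitates an API that returns only the attributes requested.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Group:
    """Hypothetical, simplified group record (not the SDK's iam.Group)."""
    id: str
    display_name: Optional[str] = None
    members: List[str] = field(default_factory=list)


class FakeGroupsApi:
    """Hypothetical in-memory stand-in for a SCIM-style groups API."""

    def __init__(self, groups: List[Group]) -> None:
        self._groups = {g.id: g for g in groups}
        self.list_requests: List[str] = []  # records what each list() call asked for

    def list(self, attributes: str) -> List[Group]:
        # Return partial records carrying only the requested attributes.
        self.list_requests.append(attributes)
        wanted = set(attributes.split(","))
        return [
            Group(
                id=g.id,
                display_name=g.display_name if "displayName" in wanted else None,
                members=list(g.members) if "members" in wanted else [],
            )
            for g in self._groups.values()
        ]

    def get(self, group_id: str) -> Group:
        # Return the full record for a single group.
        return self._groups[group_id]


def list_groups_in_two_phases(api: FakeGroupsApi, in_scope: Callable[[Group], bool]) -> List[Group]:
    """Enumerate with minimal attributes, then fetch full details per in-scope group."""
    results = []
    for partial in api.list(attributes="id,displayName"):
        if not in_scope(partial):
            continue  # skipped groups never cost a full fetch
        results.append(api.get(partial.id))
    return results


api = FakeGroupsApi([Group("1", "admins", ["a", "b"]), Group("2", "temp", ["c"])])
found = list_groups_in_two_phases(api, lambda g: g.display_name != "temp")
print([g.id for g in found], found[0].members)
```

The trade-off in this sketch is one extra request per in-scope group in exchange for a much smaller enumeration response; groups filtered out during the scan are never fetched in full.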
1 parent e30d1ad commit 643bac6

1 file changed (+6 -5 lines changed) in src/databricks/labs/ucx/workspace_access


src/databricks/labs/ucx/workspace_access/groups.py

Lines changed: 6 additions & 5 deletions
@@ -644,16 +644,17 @@ def _is_group_out_of_scope(self, group: iam.Group, resource_type: str) -> bool:

     def _list_workspace_groups(self, resource_type: str, scim_attributes: str) -> list[iam.Group]:
         results = []
-        logger.info(f"Listing workspace groups (resource_type={resource_type}) with {scim_attributes}...")
-        # these attributes can get too large causing the api to timeout
-        # so we're fetching groups without these attributes first
-        # and then calling get on each of them to fetch all attributes
+        logger.info(f"Listing workspace groups (resource_type={resource_type}) with {scim_attributes} ...")
+        # If members are requested during enumeration the API can time out. In this case we fall back on
+        # a strategy of enumerating the bare minimum and request full attributes for each group individually.
         attributes = scim_attributes.split(",")
         if "members" in attributes:
             attributes.remove("members")
             retry_on_internal_error = retried(on=[InternalError], timeout=self._verify_timeout)
             get_group = retry_on_internal_error(self._get_group)
-            for group in self._ws.groups.list(attributes=",".join(attributes)):
+            # Limit to the attributes we need for determining if the group is out of scope; the rest are fetched later.
+            scan_attributes = [attribute for attribute in attributes if attribute in {"id", "displayName", "meta"}]
+            for group in self._ws.groups.list(attributes=",".join(scan_attributes)):
                 if self._is_group_out_of_scope(group, resource_type):
                     continue
                 group_with_all_attributes = get_group(group.id)
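
To make the effect of the new `scan_attributes` filter concrete, here is a small standalone sketch of the same expression. The function name `scan_attributes_for` and the sample attribute strings are assumptions for illustration only; the real method also handles retries and scope checks that are omitted here.

```python
# The allow-list of attributes kept for the enumeration pass, as in the diff.
SCAN_ATTRIBUTES = {"id", "displayName", "meta"}


def scan_attributes_for(scim_attributes: str) -> str:
    """Hypothetical helper: build the attribute string used while enumerating."""
    attributes = scim_attributes.split(",")
    if "members" in attributes:
        attributes.remove("members")
    # Keep only what the scan needs, preserving the caller's order.
    return ",".join(a for a in attributes if a in SCAN_ATTRIBUTES)


print(scan_attributes_for("id,displayName,meta,roles,members,entitlements"))
# prints: id,displayName,meta
print(scan_attributes_for("displayName,members"))
# prints: displayName
```

Note that the filter only narrows the request: it never adds an attribute the caller did not ask for, and it matches names exactly, so an entry with surrounding whitespace would be dropped.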
