
Commit b1173b7

Added CLI command `databricks labs ucx principal-prefix-access` (#949)
## Changes
This PR merges the three CLI commands `save-azure-storage-accounts`, `save-aws-iam-profiles` and `save-uc-compatible-roles` into one common command, `principal-prefix-access`:
- updated `labs.yml` and `cli.py` to replace the existing commands with the new command
- updated the `test_cli.py` unit tests to reflect the change in code

### Closes #909

### Functionality
- [ ] added relevant user documentation
- [X] added new CLI command
- [X] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [ ] modified existing workflow: `...`
- [ ] added a new table
- [ ] modified existing table: `...`

### Tests
- [X] manually tested
- [X] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)
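The consolidated command picks its behaviour from the workspace's cloud instead of exposing one command per cloud. The following is a minimal sketch of that routing, assuming a simplified stand-in for the workspace config: `CloudConfig`, `planned_scans` and the scan labels are hypothetical names, not UCX code, and the real command calls `AzureResourcePermissions` / `AWSResourcePermissions` rather than returning labels.

```python
from dataclasses import dataclass


@dataclass
class CloudConfig:
    # Hypothetical stand-in for the cloud flags on the workspace config (w.config).
    is_azure: bool = False
    is_aws: bool = False
    is_gcp: bool = False


def planned_scans(config: CloudConfig) -> list[str]:
    """Return the scans the consolidated command would run for this cloud (sketch only)."""
    # Azure is checked first, mirroring the if/elif order in cli.py.
    if config.is_azure:
        # Replaces the former save-azure-storage-accounts command.
        return ["spn_permissions"]
    if config.is_aws:
        # Replaces save-aws-iam-profiles and save-uc-compatible-roles, run back to back.
        return ["instance_profiles", "uc_compatible_roles"]
    # Any other cloud (e.g. GCP) is rejected with an error.
    return []


print(planned_scans(CloudConfig(is_aws=True)))
# ['instance_profiles', 'uc_compatible_roles']
```

On AWS, one invocation now produces both mappings that previously required two separate commands.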
1 parent 2a6b090 commit b1173b7


5 files changed (+93 additions, -141 deletions)


README.md

Lines changed: 3 additions & 2 deletions
@@ -150,18 +150,19 @@ After UCX assessment workflow is executed, the assessment dashboard will be popu
 ### Scanning for legacy credentials and mapping access
 #### AWS
 Use to identify all instance profiles in the workspace, and map their access to S3 buckets.
+Also captures the IAM roles that have the UC ARN listed, and maps their access to S3 buckets.
 This requires `awscli` to be installed and configured.
 
 ```commandline
-databricks labs ucx save-aws-iam-profiles
+databricks labs ucx principal-prefix-access --aws-profile test-profile
 ```
 
 #### Azure
 Use to identify all storage account used by tables, identify the relevant Azure service principals and their permissions on each storage account.
 This requires `azure-cli` to be installed and configured.
 
 ```commandline
-databricks labs ucx save-azure-storage-accounts
+databricks labs ucx principal-prefix-access --subscription-id test-subscription-id
 ```
 
 ### Producing table mapping

labs.yml

Lines changed: 7 additions & 18 deletions
@@ -97,31 +97,20 @@ commands:
       - name: to-schema
         description: target schema to migrate tables to
 
-  - name: save-azure-storage-accounts
-    description: Identifies all storage account used by tables, identify spn and its permission on each storage accounts
+  - name: principal-prefix-access
+    description: For Azure, identifies all storage accounts used by tables in the workspace, and identifies the service
+      principals (SPNs) and their permissions on each storage account. For AWS, identifies all instance profiles
+      configured in the workspace and their access to S3 buckets, along with the AWS roles set up for UC access and
+      their access to S3 buckets. The output is stored in the workspace install folder.
     flags:
       - name: subscription-id
         description: Subscription to scan storage account in
+      - name: aws-profile
+        description: AWS Profile to use for authentication
 
   - name: validate-groups-membership
     description: Validate the groups to see if the groups at account level and workspace level have different membership
     table_template: |-
       Workspace Group Name\tMembers Count\tAccount Group Name\tMembers Count
       {{range .}}{{.wf_group_name}}\t{{.wf_group_members_count}}\t{{.acc_group_name}}\t{{.acc_group_members_count}}
       {{end}}
-
-  - name: save-aws-iam-profiles
-    description: |
-      Identifies all Instance Profiles and map their access to S3 buckets.
-      Requires a working setup of AWS CLI.
-    flags:
-      - name: aws-profile
-        description: AWS Profile to use for authentication
-
-  - name: save-uc-compatible-roles
-    description: |
-      Scan all the AWS roles that are set for UC access and produce a mapping to the S3 resources.
-      Requires a working setup of AWS CLI.
-    flags:
-      - name: aws-profile
-        description: AWS Profile to use for authentication
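Each `flags` entry in `labs.yml` corresponds to a keyword parameter of the Python command, which is why `subscription-id` and `aws-profile` appear as `subscription_id` and `aws_profile` in `cli.py`. The sketch below illustrates that dash-to-underscore correspondence; it is an assumption about how the Labs CLI forwards flags, and both `flags_to_kwargs` and `principal_prefix_access_stub` are hypothetical helpers, not framework or UCX code.

```python
from __future__ import annotations


def flags_to_kwargs(flags: dict[str, str]) -> dict[str, str]:
    # Assumed convention: "aws-profile" -> "aws_profile", "subscription-id" -> "subscription_id".
    return {name.replace("-", "_"): value for name, value in flags.items()}


def principal_prefix_access_stub(subscription_id: str | None = None, aws_profile: str | None = None) -> dict:
    # Hypothetical stand-in with the same keyword parameters as the real command.
    return {"subscription_id": subscription_id, "aws_profile": aws_profile}


print(principal_prefix_access_stub(**flags_to_kwargs({"aws-profile": "test-profile"})))
# {'subscription_id': None, 'aws_profile': 'test-profile'}
```

Because both flags default to `None` in the signature, a flag that is not forwarded simply leaves the other cloud's parameter unset.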

src/databricks/labs/ucx/cli.py

Lines changed: 40 additions & 70 deletions
@@ -228,76 +228,46 @@ def alias(
 
 
 @ucx.command
-def save_azure_storage_accounts(w: WorkspaceClient, subscription_id: str):
-    """identifies all azure storage account used by external tables
-    identifies all spn which has storage blob reader, blob contributor, blob owner access
-    saves the data in ucx database."""
-    if not w.config.is_azure:
-        logger.error("Workspace is not on azure, please run this command on azure databricks workspaces.")
-        return
-    if w.config.auth_type != "azure_cli":
-        logger.error("In order to obtain AAD token, Please run azure cli to authenticate.")
-        return
-    if subscription_id == "":
-        logger.error("Please enter subscription id to scan storage account in.")
-        return
-    azure_resource_permissions = AzureResourcePermissions.for_cli(w)
-    logger.info("Generating azure storage accounts and service principal permission info")
-    azure_resource_permissions.save_spn_permissions()
-
-
-@ucx.command
-def save_aws_iam_profiles(w: WorkspaceClient, aws_profile: str | None = None):
-    """identifies all Instance Profiles and map their access to S3 buckets.
-    Requires a working setup of AWS CLI.
-    https://aws.amazon.com/cli/
-    The command saves a CSV to the UCX installation folder with the mapping.
-
-    The user has to be authenticated with AWS and the have the permissions to browse the resources and iam services.
-    More information can be found here:
-    https://docs.aws.amazon.com/IAM/latest/UserGuide/access_permissions-required.html
-    """
-    if not shutil.which("aws"):
-        logger.error("Couldn't find AWS CLI in path.Please obtain and install the CLI from https://aws.amazon.com/cli/")
-        return None
-    if not aws_profile:
-        aws_profile = os.getenv("AWS_DEFAULT_PROFILE")
-    if not aws_profile:
-        logger.error(
-            "AWS Profile is not specified. Use the environment variable [AWS_DEFAULT_PROFILE] "
-            "or use the '--aws-profile=[profile-name]' parameter."
-        )
-        return None
-    aws_permissions = AWSResourcePermissions.for_cli(w, aws_profile)
-    aws_permissions.save_instance_profile_permissions()
-    return None
-
-
-@ucx.command
-def save_uc_compatible_roles(w: WorkspaceClient, *, aws_profile: str | None = None):
-    """extracts all the iam roles with trust relationships to the UC master role.
-    Map these roles to the S3 buckets they have access to.
-    Requires a working setup of AWS CLI.
-    https://aws.amazon.com/cli/
-    The command saves a CSV to the UCX installation folder with the mapping.
-
-    The user has to be authenticated with AWS and the have the permissions to browse the resources and iam services.
-    More information can be found here:
-    https://docs.aws.amazon.com/IAM/latest/UserGuide/access_permissions-required.html
-    """
-    if not shutil.which("aws"):
-        logger.error("Couldn't find AWS CLI in path.Please obtain and install the CLI from https://aws.amazon.com/cli/")
-        return None
-    if not aws_profile:
-        aws_profile = os.getenv("AWS_DEFAULT_PROFILE")
-    if not aws_profile:
-        logger.error(
-            "AWS Profile is not specified. Use the environment variable [AWS_DEFAULT_PROFILE] "
-            "or use the '--aws-profile=[profile-name]' parameter."
-        )
-        return None
-    aws_permissions = AWSResourcePermissions.for_cli(w, aws_profile)
-    aws_permissions.save_uc_compatible_roles()
+def principal_prefix_access(w: WorkspaceClient, subscription_id: str | None = None, aws_profile: str | None = None):
+    """For Azure, identifies all storage accounts used by tables in the workspace, and identifies the service
+    principals (SPNs) and their permissions on each storage account. For AWS, identifies all instance profiles
+    configured in the workspace and their access to S3 buckets, along with the AWS roles set up for UC access and
+    their access to S3 buckets. The output is stored in the workspace install folder.
+    Pass subscription_id for Azure and aws_profile for AWS."""
+    print("testing")
+    if w.config.is_azure:
+        if w.config.auth_type != "azure-cli":
+            logger.error("In order to obtain AAD token, Please run azure cli to authenticate.")
+            return None
+        if not subscription_id:
+            logger.error("Please enter subscription id to scan storage account in.")
+            return None
+        azure_resource_permissions = AzureResourcePermissions.for_cli(w)
+        logger.info("Generating azure storage accounts and service principal permission info")
+        path = azure_resource_permissions.save_spn_permissions()
+        if path:
+            logger.info(f"storage and spn info saved under {path}")
+    elif w.config.is_aws:
+        if not shutil.which("aws"):
+            logger.error("Couldn't find AWS CLI in path. Please install the CLI from https://aws.amazon.com/cli/")
+            return None
+        if not aws_profile:
+            aws_profile = os.getenv("AWS_DEFAULT_PROFILE")
+        if not aws_profile:
+            logger.error(
+                "AWS Profile is not specified. Use the environment variable [AWS_DEFAULT_PROFILE] "
+                "or use the '--aws-profile=[profile-name]' parameter."
+            )
+            return None
+        logger.info("Generating instance profile and bucket permission info")
+        aws_permissions = AWSResourcePermissions.for_cli(w, aws_profile)
+        instance_role_path = aws_permissions.save_instance_profile_permissions()
+        logger.info(f"Instance profile and bucket info saved under {instance_role_path}")
+        logger.info("Generating UC roles and bucket permission info")
+        uc_role_path = aws_permissions.save_uc_compatible_roles()
+        logger.info(f"UC roles and bucket info saved under {uc_role_path}")
+    else:
+        logger.error("This cmd is only supported for azure and aws workspaces")
     return None
 
 
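On AWS the command resolves the profile in a fixed order: the `--aws-profile` flag, then the `AWS_DEFAULT_PROFILE` environment variable; if neither is set it logs an error and returns. Before that it checks that an `aws` executable is on `PATH` via `shutil.which`. A minimal sketch of that precedence, factored into hypothetical helpers `aws_cli_available` and `resolve_aws_profile` (not part of UCX) that take the environment as a parameter so they can be tested without touching `os.environ`:

```python
from __future__ import annotations

import os
import shutil
from collections.abc import Mapping


def aws_cli_available() -> bool:
    # Same check as cli.py: shutil.which returns the executable's path, or None if "aws" is not on PATH.
    return shutil.which("aws") is not None


def resolve_aws_profile(flag_value: str | None, env: Mapping[str, str] | None = None) -> str | None:
    """Hypothetical helper: pick the AWS profile, or return None if neither source provides one."""
    if env is None:
        env = os.environ
    # 1. A non-empty --aws-profile flag wins.
    if flag_value:
        return flag_value
    # 2. Otherwise fall back to AWS_DEFAULT_PROFILE; an empty value counts as unset,
    #    matching the `if not aws_profile` check in cli.py.
    return env.get("AWS_DEFAULT_PROFILE") or None


profile = resolve_aws_profile(None, {"AWS_DEFAULT_PROFILE": "dev"})
print(profile)  # dev
```

A `None` result corresponds to the branch in which the command logs "AWS Profile is not specified."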
tests/integration/azure/test_access.py

Lines changed: 6 additions & 0 deletions
@@ -38,3 +38,9 @@ def test_save_spn_permissions_local(ws, sql_backend, inventory_schema, make_rand
     az_res_perm = AzureResourcePermissions(installation, ws, AzureResources(ws, include_subscriptions=""), location)
     path = az_res_perm.save_spn_permissions()
     assert ws.workspace.get_status(path)
+
+
+def test_cli(ws):
+    from databricks.labs.ucx.cli import principal_prefix_access
+
+    principal_prefix_access(ws)

tests/unit/test_cli.py

Lines changed: 37 additions & 51 deletions
@@ -17,11 +17,9 @@
     manual_workspace_info,
     move,
     open_remote_config,
+    principal_prefix_access,
     repair_run,
     revert_migrated_tables,
-    save_aws_iam_profiles,
-    save_azure_storage_accounts,
-    save_uc_compatible_roles,
     skip,
     sync_workspace_info,
     validate_external_locations,
@@ -231,91 +229,79 @@ def test_alias(ws):
     ws.tables.list.assert_called_once()
 
 
-def test_save_azure_storage_accounts_not_azure(ws, caplog):
-    ws.config.is_azure = False
-
-    save_azure_storage_accounts(ws, "")
-
-    assert 'Workspace is not on azure, please run this command on azure databricks workspaces.' in caplog.messages
-
-
-def test_save_azure_storage_accounts_no_azure_cli(ws, caplog):
+def test_save_storage_and_principal_azure_no_azure_cli(ws, caplog):
     ws.config.auth_type = "azure_clis"
-
-    save_azure_storage_accounts(ws, "")
+    ws.config.is_azure = True
+    principal_prefix_access(ws, "")
 
     assert 'In order to obtain AAD token, Please run azure cli to authenticate.' in caplog.messages
 
 
-def test_save_azure_storage_accounts_no_subscription_id(ws, caplog):
-    ws.config.auth_type = "azure_cli"
+def test_save_storage_and_principal_azure_no_subscription_id(ws, caplog):
+    ws.config.auth_type = "azure-cli"
     ws.config.is_azure = True
 
-    save_azure_storage_accounts(ws, "")
+    principal_prefix_access(ws, "")
 
     assert "Please enter subscription id to scan storage account in." in caplog.messages
 
 
-def test_save_azure_storage_accounts(ws, caplog):
-    ws.config.auth_type = "azure_cli"
+def test_save_storage_and_principal_azure(ws, caplog, mocker):
+    ws.config.auth_type = "azure-cli"
     ws.config.is_azure = True
-    save_azure_storage_accounts(ws, "test")
-
-    ws.statement_execution.execute_statement.assert_called()
+    azure_resource = mocker.patch("databricks.labs.ucx.azure.access.AzureResourcePermissions.save_spn_permissions")
+    principal_prefix_access(ws, "test")
+    azure_resource.assert_called_once()
 
 
 def test_validate_groups_membership(ws):
     validate_groups_membership(ws)
     ws.groups.list.assert_called()
 
 
-def test_save_aws_iam_profiles_no_profile(ws, caplog, mocker):
+def test_save_storage_and_principal_aws_no_profile(ws, caplog, mocker):
     mocker.patch("shutil.which", return_value="/path/aws")
-    save_aws_iam_profiles(ws)
+    ws.config.is_azure = False
+    ws.config.is_aws = True
+    principal_prefix_access(ws)
     assert any({"AWS Profile is not specified." in message for message in caplog.messages})
 
 
-def test_save_aws_iam_profiles_no_connection(ws, mocker):
+def test_save_storage_and_principal_aws_no_connection(ws, mocker):
     mocker.patch("shutil.which", return_value="/path/aws")
     pop = create_autospec(subprocess.Popen)
-
+    ws.config.is_azure = False
+    ws.config.is_aws = True
     pop.communicate.return_value = (bytes("message", "utf-8"), bytes("error", "utf-8"))
     pop.returncode = 127
     mocker.patch("subprocess.Popen.__init__", return_value=None)
     mocker.patch("subprocess.Popen.__enter__", return_value=pop)
     mocker.patch("subprocess.Popen.__exit__", return_value=None)
 
     with pytest.raises(ResourceWarning, match="AWS CLI is not configured properly."):
-        save_aws_iam_profiles(ws, aws_profile="profile")
+        principal_prefix_access(ws, aws_profile="profile")
 
 
-def test_save_aws_iam_profiles_no_cli(ws, mocker, caplog):
+def test_save_storage_and_principal_aws_no_cli(ws, mocker, caplog):
     mocker.patch("shutil.which", return_value=None)
-    save_aws_iam_profiles(ws, aws_profile="profile")
+    ws.config.is_azure = False
+    ws.config.is_aws = True
+    principal_prefix_access(ws, aws_profile="profile")
     assert any({"Couldn't find AWS" in message for message in caplog.messages})
 
 
-def test_save_uc_roles_no_profile(ws, caplog, mocker):
-    mocker.patch("shutil.which", return_value="/path/aws")
-    save_uc_compatible_roles(ws)
-    assert any({"AWS Profile is not specified." in message for message in caplog.messages})
-
-
-def test_save_uc_roles_no_connection(ws, mocker):
-    mocker.patch("shutil.which", return_value="/path/aws")
-    pop = create_autospec(subprocess.Popen)
-
-    pop.communicate.return_value = (bytes("message", "utf-8"), bytes("error", "utf-8"))
-    pop.returncode = 127
-    mocker.patch("subprocess.Popen.__init__", return_value=None)
-    mocker.patch("subprocess.Popen.__enter__", return_value=pop)
-    mocker.patch("subprocess.Popen.__exit__", return_value=None)
-
-    with pytest.raises(ResourceWarning, match="AWS CLI is not configured properly."):
-        save_uc_compatible_roles(ws, aws_profile="profile")
+def test_save_storage_and_principal_aws(ws, mocker, caplog):
+    mocker.patch("shutil.which", return_value=True)
+    ws.config.is_azure = False
+    ws.config.is_aws = True
+    aws_resource = mocker.patch("databricks.labs.ucx.assessment.aws.AWSResourcePermissions.for_cli")
+    principal_prefix_access(ws, aws_profile="profile")
+    aws_resource.assert_called_once()
 
 
-def test_save_uc_roles_no_cli(ws, mocker, caplog):
-    mocker.patch("shutil.which", return_value=None)
-    save_uc_compatible_roles(ws, aws_profile="profile")
-    assert any({"Couldn't find AWS" in message for message in caplog.messages})
+def test_save_storage_and_principal_gcp(ws, caplog):
+    ws.config.is_azure = False
+    ws.config.is_aws = False
+    ws.config.is_gcp = True
+    principal_prefix_access(ws)
+    assert "This cmd is only supported for azure and aws workspaces" in caplog.messages
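The unit tests drive each branch by setting `is_azure`, `is_aws` and `is_gcp` on a mocked workspace config and then inspecting the captured log messages. The same idea can be sketched with only the standard library, using `unittest.mock.MagicMock` for the workspace client and `assertLogs` in place of pytest's `caplog`. `is_supported_cloud` is a hypothetical, trimmed-down stand-in for the command's final `else` branch, not UCX code. The second test shows why every flag is set explicitly: an attribute that is never assigned on a `MagicMock` is itself a mock, and mocks are truthy.

```python
import logging
import unittest
from unittest.mock import MagicMock

logger = logging.getLogger("ucx_sketch")


def is_supported_cloud(w) -> bool:
    """Hypothetical stand-in for the command's final else branch."""
    if w.config.is_azure or w.config.is_aws:
        return True
    logger.error("This cmd is only supported for azure and aws workspaces")
    return False


class SupportedCloudTest(unittest.TestCase):
    def test_gcp_is_rejected(self):
        ws = MagicMock()
        # Set every flag explicitly, as the UCX tests do.
        ws.config.is_azure = False
        ws.config.is_aws = False
        ws.config.is_gcp = True
        # assertLogs plays the role of pytest's caplog fixture.
        with self.assertLogs("ucx_sketch", level="ERROR") as captured:
            self.assertFalse(is_supported_cloud(ws))
        self.assertIn("This cmd is only supported for azure and aws workspaces", captured.output[0])

    def test_unset_flags_look_truthy(self):
        ws = MagicMock()
        # Without explicit flags, w.config.is_azure is a MagicMock, so the first branch is taken.
        self.assertTrue(is_supported_cloud(ws))
```

Run it with `python -m unittest` against the module that contains the sketch.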
