Commit 5c39c98

renardeinside and nfx authored
Crawl dashboards, queries, and alerts (#144)
This PR introduces support for DBSQL object permissions - both inventory and permission application.

SQL:
- [ ] Dashboard
- [ ] Queries
- [ ] Alerts

---------

Co-authored-by: Serge Smertin <[email protected]>
1 parent ac8855f commit 5c39c98

39 files changed, with 1669 additions and 1946 deletions

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -145,7 +145,7 @@ cython_debug/
 # dev files and scratches
 dev/cleanup.py
 
-Support
-
 .databricks
-.vscode
+.vscode
+
+.python-version

Makefile

Lines changed: 4 additions & 1 deletion
@@ -5,4 +5,7 @@ fmt:
 	hatch run lint:fmt
 
 test:
-	hatch run unit:test
+	hatch run unit:test
+
+test-cov:
+	hatch run unit:test-cov-report

docs/logic.md

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+# Permissions migration logic and data structures
+
+At a high level, the permissions inventorization process is split into two steps:
+
+1. Collect all existing permissions into persistent storage.
+2. Apply the collected permissions to the target resources.
+
+The first step is performed by the `Crawler` and the second by the `Applier`.
+
+Crawler and applier are intrinsically connected to each other due to SerDe (serialization/deserialization) logic.
+
+We implement a separate crawler and applier for each supported resource type.
+
+Please note that `table ACLs` logic is currently handled separately from the logic described in this document.
+
+## Logical objects and relevant APIs
+
+
+### Group level properties (uses SCIM API)
+
+- [x] Entitlements (One of `workspace-access`, `databricks-sql-access`, `allow-cluster-create`, `allow-instance-pool-create`)
+- [x] Roles (AWS only)
+
+These are workspace-level properties that are not associated with any specific resource.
+
+Additional info:
+
+- object ID: `group_id`
+- listing method: `ws.groups.list`
+- get method: `ws.groups.get(group_id)`
+- put method: `ws.groups.patch(group_id)`
+
+### Compute infrastructure (uses Permissions API)
+
+- [x] Clusters
+- [x] Cluster policies
+- [x] Instance pools
+- [x] SQL warehouses
+
+These are compute infrastructure resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `cluster_id`, `policy_id`, `instance_pool_id`, `id` (SQL warehouses)
+- listing method: `ws.clusters.list`, `ws.cluster_policies.list`, `ws.instance_pools.list`, `ws.warehouses.list`
+- get method: `ws.permissions.get(object_id, object_type)`
+- put method: `ws.permissions.update(object_id, object_type)`
+- get response object type: `databricks.sdk.service.iam.ObjectPermissions`
+
+
+### Workflows (uses Permissions API)
+
+- [x] Jobs
+- [x] Delta Live Tables
+
+These are workflow resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `job_id`, `pipeline_id`
+- listing method: `ws.jobs.list`, `ws.pipelines.list`
+- get method: `ws.permissions.get(object_id, object_type)`
+- put method: `ws.permissions.update(object_id, object_type)`
+- get response object type: `databricks.sdk.service.iam.ObjectPermissions`
+
+### ML (uses Permissions API)
+
+- [x] MLflow experiments
+- [x] MLflow models
+
+These are ML resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `experiment_id`, `id` (models)
+- listing method: custom listing
+- get method: `ws.permissions.get(object_id, object_type)`
+- put method: `ws.permissions.update(object_id, object_type)`
+- get response object type: `databricks.sdk.service.iam.ObjectPermissions`
+
+
+### SQL (uses SQL Permissions API)
+
+- [x] Alerts
+- [x] Dashboards
+- [x] Queries
+
+These are SQL resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `id`
+- listing method: `ws.alerts.list`, `ws.dashboards.list`, `ws.queries.list`
+- get method: `ws.dbsql_permissions.get`
+- put method: `ws.dbsql_permissions.set`
+- get response object type: `databricks.sdk.service.sql.GetResponse`
+- Note that the API has no support for an UPDATE operation; only PUT (overwrite) is supported.
+
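Because the SQL permissions endpoint only overwrites, a caller that wants update-like behaviour has to read the current access list, merge its changes locally, and send the full list back. The following is a minimal sketch of that merge step only. The dictionary keys and the `merge_acl` helper are assumptions made for illustration; they are not taken from this commit or from the SDK.

```python
from typing import Dict, List


def merge_acl(existing: List[Dict[str, str]], changes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the full access list to send in a PUT (overwrite) request.

    Hypothetical helper: entries are keyed by group name, and an entry in
    `changes` replaces the existing entry for the same group.
    """
    merged = {entry["group_name"]: entry for entry in existing}
    for entry in changes:
        merged[entry["group_name"]] = entry
    return list(merged.values())


current = [
    {"group_name": "analysts", "permission_level": "CAN_RUN"},
    {"group_name": "admins", "permission_level": "CAN_MANAGE"},
]
wanted = [
    {"group_name": "analysts", "permission_level": "CAN_EDIT"},
    {"group_name": "viewers", "permission_level": "CAN_VIEW"},
]
full_acl = merge_acl(current, wanted)
```

Sending only `wanted` in a PUT would drop the `admins` entry, which is why the merged list is built first.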
+### Security (uses Permissions API)
+
+- [x] Tokens
+- [x] Passwords
+
+These are security resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `tokens` (static value), `passwords` (static value)
+- listing method: N/A
+- get method: `ws.permissions.get(object_id, object_type)`
+- put method: `ws.permissions.update(object_id, object_type)`
+- get response object type: `databricks.sdk.service.iam.ObjectPermissions`
+
+### Workspace (uses Permissions API)
+
+- [x] Notebooks
+- [x] Directories
+- [x] Repos
+- [x] Files
+
+These are workspace resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `object_id`
+- listing method: custom listing
+- get method: `ws.permissions.get(object_id, object_type)`
+- put method: `ws.permissions.update(object_id, object_type)`
+- get response object type: `databricks.sdk.service.iam.ObjectPermissions`
+
+### Secrets (uses Secrets API)
+
+- [x] Secrets
+
+These are secrets resources that are associated with a specific workspace.
+
+Additional info:
+
+- object ID: `scope_name`
+- listing method: `ws.secrets.list_scopes()`
+- get method: `ws.secrets.list_acls(scope_name)`
+- put method: `ws.secrets.put_acl`
+
+
+## Crawler and serialization logic
+
+Crawlers are expected to return a list of callable functions that will be later used to get the permissions.
+Each of these functions shall return a `PermissionInventoryItem` that should be serializable into a Delta Table.
+The permission payload differs between different crawlers, therefore each crawler should implement a serialization
+method.
+
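The crawler contract described above could be sketched roughly as follows. The field names on `PermissionInventoryItem`, the `ClusterCrawler` class, and its method names are all assumptions made for illustration, and an in-memory dictionary stands in for the real listing and get calls.

```python
import json
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions made for this sketch.
    object_id: str
    crawler: str      # which crawler produced the item
    raw_payload: str  # crawler-specific payload, serialized to a string


class ClusterCrawler:
    """Hypothetical crawler for a single resource type."""

    name = "clusters"

    def __init__(self, permissions_by_id: Dict[str, dict]):
        # Stand-in for the listing and get methods of a real API client.
        self._permissions_by_id = permissions_by_id

    def _serialize(self, payload: dict) -> str:
        # Each crawler owns its serialization, since payloads differ by type.
        return json.dumps(payload, sort_keys=True)

    def _crawl_one(self, object_id: str) -> PermissionInventoryItem:
        payload = self._permissions_by_id[object_id]
        return PermissionInventoryItem(object_id, self.name, self._serialize(payload))

    def get_crawler_tasks(self) -> List[Callable[[], PermissionInventoryItem]]:
        # Return callables rather than results, so the caller decides how to run them.
        return [partial(self._crawl_one, object_id) for object_id in self._permissions_by_id]


crawler = ClusterCrawler({"c-1": {"acl": [{"group_name": "analysts", "level": "CAN_ATTACH_TO"}]}})
items = [task() for task in crawler.get_crawler_tasks()]
```

Storing the payload as a string keeps every item in the same flat shape, which is one way to make it fit a single table regardless of resource type.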
+## Applier and deserialization logic
+
+Appliers are expected to accept a list of `PermissionInventoryItem` and generate a list of callables that will apply the
+given permissions.
+Each applier should implement a deserialization method that will convert the raw payload into a typed one.
+Each permission item should have a crawler type associated with it, so that the applier can use the correct
+deserialization method.
+
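A minimal sketch of the applier side, under the same caveats: `ClusterApplier`, `AclEntry`, and the item fields are hypothetical names, and the "apply" step only records the call instead of contacting an API.

```python
import json
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Tuple


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions made for this sketch.
    object_id: str
    crawler: str
    raw_payload: str


@dataclass
class AclEntry:
    # Typed form of one payload entry (hypothetical shape).
    group_name: str
    level: str


class ClusterApplier:
    """Hypothetical applier matching a crawler named "clusters"."""

    name = "clusters"

    def __init__(self) -> None:
        # Records applied permissions; stand-in for a real API call.
        self.applied: List[Tuple[str, List[AclEntry]]] = []

    def _deserialize(self, raw_payload: str) -> List[AclEntry]:
        # Convert the raw string payload into typed entries.
        return [AclEntry(**entry) for entry in json.loads(raw_payload)["acl"]]

    def _apply_one(self, object_id: str, acl: List[AclEntry]) -> None:
        self.applied.append((object_id, acl))

    def get_apply_tasks(self, items: List[PermissionInventoryItem]) -> List[Callable[[], None]]:
        tasks = []
        for item in items:
            # The crawler type on the item selects the deserialization method.
            if item.crawler != self.name:
                raise ValueError(f"item {item.object_id!r} belongs to crawler {item.crawler!r}")
            tasks.append(partial(self._apply_one, item.object_id, self._deserialize(item.raw_payload)))
        return tasks


applier = ClusterApplier()
item = PermissionInventoryItem("c-1", "clusters", '{"acl": [{"group_name": "analysts", "level": "CAN_RESTART"}]}')
for task in applier.get_apply_tasks([item]):
    task()
```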
+## Relevance identification
+
+Since we save all objects into the permission table, we need to filter out the objects that are not relevant to the
+current migration.
+We do this inside the `applier`, by returning a `noop` callable if the object is not relevant to the current migration.
+
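The `noop` approach could look roughly like the sketch below. The relevance rule used here (an object is relevant when its access list mentions one of the groups being migrated) is an assumption chosen for illustration; the commit does not state the actual criterion, and all function names are hypothetical.

```python
from functools import partial
from typing import Callable, Dict, List, Set


def noop() -> None:
    """Do nothing; returned for objects outside the current migration."""


def is_relevant(acl: List[Dict[str, str]], migrated_groups: Set[str]) -> bool:
    # Assumed rule: relevant if any entry references a group being migrated.
    return any(entry["group_name"] in migrated_groups for entry in acl)


def get_apply_task(
    object_id: str,
    acl: List[Dict[str, str]],
    migrated_groups: Set[str],
    apply: Callable[[str, List[Dict[str, str]]], None],
) -> Callable[[], None]:
    if not is_relevant(acl, migrated_groups):
        return noop
    return partial(apply, object_id, acl)


applied: List[str] = []


def record(object_id: str, acl: List[Dict[str, str]]) -> None:
    # Stand-in for a real permission update.
    applied.append(object_id)


tasks = [
    get_apply_task("c-1", [{"group_name": "analysts"}], {"analysts"}, record),
    get_apply_task("c-2", [{"group_name": "other"}], {"analysts"}, record),
]
for task in tasks:
    task()
```

Returning a callable in both cases keeps the execution step uniform: the caller runs every task without needing to know which ones were filtered out.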
+## Crawling the permissions
+
+To crawl the permissions, we use the following logic:
+1. Go through the list of all crawlers.
+2. Get the list of all objects of the given type.
+3. For each object, generate a callable that will return a `PermissionInventoryItem`.
+4. Execute the callables in parallel.
+5. Collect the results into a list of `PermissionInventoryItem`.
+6. Save the list of `PermissionInventoryItem` into a Delta Table.
+
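The six crawling steps can be sketched with a thread pool. This is an illustration under stated assumptions, not the project's implementation: crawlers are modelled as plain functions that return a list of callables, `make_crawler` and `crawl_permissions` are hypothetical names, and a Python list stands in for the Delta Table.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial
from typing import Callable, List


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions made for this sketch.
    object_id: str
    crawler: str
    raw_payload: str


Task = Callable[[], PermissionInventoryItem]


def make_crawler(name: str, object_ids: List[str]) -> Callable[[], List[Task]]:
    """Build a hypothetical crawler over a fixed list of object IDs."""

    def fetch(object_id: str) -> PermissionInventoryItem:
        # Stand-in for a real "get permissions" API call.
        return PermissionInventoryItem(object_id, name, "{}")

    def crawler() -> List[Task]:
        # Steps 2 and 3: list the objects and create one callable per object.
        return [partial(fetch, object_id) for object_id in object_ids]

    return crawler


def crawl_permissions(crawlers, save, max_workers: int = 4) -> List[PermissionInventoryItem]:
    # Step 1: go through all crawlers and gather their callables.
    tasks = [task for crawler in crawlers for task in crawler()]
    # Steps 4 and 5: run in parallel; map() returns results in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        items = list(pool.map(lambda task: task(), tasks))
    # Step 6: persist the collected items.
    save(items)
    return items


table: List[PermissionInventoryItem] = []  # stand-in for the Delta Table
items = crawl_permissions(
    [make_crawler("clusters", ["c-1", "c-2"]), make_crawler("jobs", ["j-1"])],
    table.extend,
)
```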
+## Applying the permissions
+
+To apply the permissions, we use the following logic:
+
+1. Read the Delta Table with raw permissions.
+2. Map the items to the relevant `support` object. If no relevant `support` object is found, an exception is raised.
+3. Deserialize the items using the relevant applier.
+4. Generate a list of callables that will apply the permissions.
+5. Execute the callables in parallel.
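The applying steps can be sketched in the same style. Here each `support` object is modelled as a function that takes an item and returns a callable; that shape, the `apply_permissions` name, and the item fields are assumptions made for illustration, and the rows are passed in directly instead of being read from a Delta Table.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List


@dataclass
class PermissionInventoryItem:
    # Field names are assumptions made for this sketch.
    object_id: str
    crawler: str
    raw_payload: str


Support = Callable[[PermissionInventoryItem], Callable[[], None]]


def apply_permissions(rows: List[PermissionInventoryItem], supports: Dict[str, Support], max_workers: int = 4) -> None:
    tasks = []
    for item in rows:  # step 1: rows as read from the permissions table
        support = supports.get(item.crawler)  # step 2: map the item to its support
        if support is None:
            raise ValueError(f"no support object for crawler type {item.crawler!r}")
        tasks.append(support(item))  # steps 3 and 4: deserialize, build the callable
    with ThreadPoolExecutor(max_workers=max_workers) as pool:  # step 5
        list(pool.map(lambda task: task(), tasks))


applied = []


def cluster_support(item: PermissionInventoryItem) -> Callable[[], None]:
    acl = json.loads(item.raw_payload)  # deserialize the raw payload
    # Recording the call stands in for a real permission update.
    return partial(applied.append, (item.object_id, acl))


supports = {"clusters": cluster_support}
apply_permissions([PermissionInventoryItem("c-1", "clusters", '{"level": "CAN_RESTART"}')], supports)
```

Raising before any task runs means an unknown crawler type stops the whole batch rather than leaving it partly applied.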

pyproject.toml

Lines changed: 6 additions & 2 deletions
@@ -173,8 +173,12 @@ known-first-party = ["databricks.labs.ucx"]
 ban-relative-imports = "all"
 
 [tool.ruff.per-file-ignores]
-# Tests can use magic values, assertions, and relative imports
-"tests/**/*" = ["PLR2004", "S101", "TID252"]
+
+"tests/**/*" = [
+    "PLR2004", "S101", "TID252", # tests can use magic values, assertions, and relative imports
+    "ARG001" # tests may not use the provided fixtures
+]
+
 "src/databricks/labs/ucx/providers/mixins/redash.py" = ["A002", "A003", "N815"]
 
 [tool.coverage.run]
