Commit 60718a7

Add migration progress documentation (#3333)
## Changes

Add migration progress design documentation

### Linked issues

Progresses #2074

Related #3067

### Functionality

- [x] added relevant user documentation: `docs/migration-progress.md`
1 parent dc0233e commit 60718a7

2 files changed

+147
-8
lines changed

README.md

Lines changed: 8 additions & 8 deletions
@@ -915,14 +915,14 @@ The output is processed and displayed in the migration dashboard using the in `r
915915

916916
## [EXPERIMENTAL] Migration Progress Workflow
917917

918-
The `migration-progress-experimental` workflow updates a subset of the inventory tables to track migration status of
919-
workspace resources that need to be migrated. Besides updating the inventory tables, this workflow tracks the migration
920-
progress by updating the following [UCX catalog](#create-ucx-catalog-command) tables:
918+
The manually triggered `migration-progress-experimental` workflow populates the tables visualized in the
919+
[migration progress dashboard](#dashboards) by updating a **subset** of the [inventory tables](#assessment-workflow)
920+
to [track Unity Catalog compatibility](docs/migration-progress.md) of Hive and workspace objects that need to be migrated.
921921

922-
- `workflow_runs`: Tracks the status of the workflow runs.
923-
924-
_Note: A subset of the inventory is updated, *not* the complete inventory that is initially gathered by
925-
the [assessment workflow](#assessment-workflow)._
922+
The following prerequisites must be fulfilled before running the workflow:
923+
- [UC metastore attached to workspace](#assign-metastore-command)
924+
- [UCX catalog exists](#create-ucx-catalog-command)
925+
- [Assessment job ran successfully](#ensure-assessment-run-command)
926926

927927
[[back to top](#databricks-labs-ucx)]
928928

@@ -939,7 +939,7 @@ overview with a short description is given.
939939
| [Assessment \[Azure\]](./src/databricks/labs/ucx/queries/assessment/azure/00_0_azure_service_principals.md) | Assessment outcomes specific to Azure |
940940
| [Migration \[Main\]](./src/databricks/labs/ucx/queries/migration/main/00_0_migration_overview.md) | Migration overview |
941941
| [Migration \[Groups\]](./src/databricks/labs/ucx/queries/migration/groups/00_0_migration_overview.md) | Group migration outcomes |
942-
| [Progress \[Main\]](./src/databricks/labs/ucx/queries/progress/main/00_0_migration_progress.md) | Migration progress |
942+
| [Progress \[Main\]](./src/databricks/labs/ucx/queries/progress/main/00_0_migration_progress.md) | [Migration progress](./docs/migration-progress.md) |
943943

944944
[[back to top](#databricks-labs-ucx)]
945945

docs/migration-progress.md

Lines changed: 139 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,139 @@
1+
# Migration progress
2+
3+
UCX tracks migration progress of _business_ resources: workspace objects that contribute to business value.
4+
(The term "business resource" comes from the UCX team and is **not** Databricks terminology.) We identified the
5+
following business resources:
6+
7+
| Business resource | Motivation |
8+
|----------------------------|----------------------------------------------------------------------------------------------------------------------------|
9+
| Dashboard | Dashboards visualize data models supporting business processes |
10+
| Job | Jobs create data models supporting business processes - not exclusively data models used by dashboards |
11+
| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes - not exclusively data models used by dashboards |
12+
13+
Furthermore, UCX tracks migration of the following Hive and workspace objects:
14+
15+
| Hive or workspace object |
16+
|------------------------------------|
17+
| Tables and views (Hive data object) |
18+
| Grant |
19+
| User defined function (UDF) |
20+
| Cluster |
21+
| Cluster policies |
22+
23+
See the [resource index](#resource-index) for more details on the above objects.
24+
25+
## Usage
26+
27+
Use the migration progress through the [migration progress dashboard](../README.md#dashboards) after running the
28+
[(experimental) migration progress workflow](../README.md#experimental-migration-progress-workflow).
29+
30+
### Failures
31+
32+
A [key historical attribute](#historical) in migration progress is `failures`, which lists the incompatibility issues
33+
with Unity Catalog. Once the failures for an object are resolved, UCX flags that object as Unity Catalog compatible.
34+
For Hive data objects, this means that the objects are migrated to Unity Catalog.
35+
36+
### Owner
37+
38+
Another [key historical attribute](#historical) in migration progress is the `owner`, which shows who owns the object
39+
and is therefore key to making it Unity Catalog compatible. Ownership is determined on a best-effort basis; it is a concept made
40+
more central in [Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/ownership.html).
41+
42+
## Tracking
43+
44+
The [(experimental) migration progress workflow](../README.md#experimental-migration-progress-workflow) tracks the
45+
migration progress and populates [migration progress tables](#persistence).
46+
47+
### Roll-up to business resources
48+
49+
The main intent of migration progress is to track whether business resources are migrated to Unity Catalog. UCX rolls up the
50+
failures of dependent resources to the business resources so that the business resources show
51+
the [`failures`](#failures) of the dependent resources.
52+
53+
| Business resource | Dependent resources |
54+
|----------------------------|-------------------------------------------------------------------|
55+
| Dashboard | Queries |
56+
| Job | Cluster, cluster policies, cluster configurations, code resources |
57+
| Delta Live Table pipelines | TBD |
58+
59+
Similarly, UCX rolls up the failures of dependent resources to the Hive and workspace objects:
60+
61+
| Hive or workspace object | Dependent resources |
62+
|------------------------------------|------------------------------|
63+
| Tables and views (Hive data object) | Grants, Table migration status |
64+
| Grant | |
65+
| User defined function (UDF) | |
66+
| Cluster | Cluster policies |
67+
| Cluster policies | |
68+
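The roll-up described above can be pictured as a recursive collection of failures over dependent resources. The following is a minimal sketch only: `Resource` and `rolled_up_failures` are hypothetical names introduced for illustration and are not part of UCX.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a tracked object; not a UCX class."""

    name: str
    failures: list[str] = field(default_factory=list)
    dependents: list["Resource"] = field(default_factory=list)


def rolled_up_failures(resource: Resource) -> list[str]:
    """Return the resource's own failures followed by those of its dependents, recursively."""
    failures = list(resource.failures)
    for dependent in resource.dependents:
        failures.extend(rolled_up_failures(dependent))
    return failures


# A job depends on a cluster, which in turn depends on a cluster policy.
policy = Resource("cluster-policy", failures=["policy failure"])
cluster = Resource("cluster", failures=["cluster failure"], dependents=[policy])
job = Resource("job", dependents=[cluster])

print(rolled_up_failures(job))  # ['cluster failure', 'policy failure']
```

In this sketch the job has no failures of its own, yet it reports the failures of its cluster and cluster policy, which mirrors how a business resource surfaces the failures of the objects it depends on.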
69+
### Dangling Hive or workspace objects
70+
71+
Hive or workspace objects that are not referenced by [business resources](#roll-up-to-business-resources) are considered
72+
to be _dangling_ objects. For now, these objects are tracked separately, thus not rolled up to business resources.
73+
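One way to think about dangling objects is as a set difference between all tracked objects and the objects referenced by at least one business resource. The sketch below assumes objects are identified by plain strings; `find_dangling` and the identifiers used are hypothetical and do not reflect how UCX stores references.

```python
def find_dangling(object_ids: set[str], references: dict[str, set[str]]) -> set[str]:
    """Return the object ids that no business resource references.

    `references` maps a business resource name to the ids of the objects it uses.
    """
    referenced: set[str] = set().union(*references.values())
    return object_ids - referenced


# Hypothetical identifiers, for illustration only.
objects = {"hive_metastore.sales.orders", "hive_metastore.tmp.scratch"}
references = {"job:daily-load": {"hive_metastore.sales.orders"}}

print(find_dangling(objects, references))  # {'hive_metastore.tmp.scratch'}
```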
74+
## Persistence
75+
76+
The progress is persisted in the [UCX UC catalog](../README.md#create-ucx-catalog-command) so that migration progress can be
77+
tracked cross-workspace. The catalog contains the tables below.
78+
79+
### Historical
80+
81+
The [`historical` table](../src/databricks/labs/ucx/progress/install.py) contains historical records of inventory
82+
objects relevant to the migration progress.
83+
84+
| Column | Data type | Description |
85+
|--------------|-------------------------|------------------------------------------------------------------------------------|
86+
| workspace_id | integer | The identifier of the workspace where this historical record was generated. |
87+
| job_run_id | integer | The identifier of the job run that generated this historical record. |
88+
| object_type | string | The inventory table for which this historical record was generated. |
89+
| object_id | list[string] | The type-specific identifier for the corresponding inventory record. |
90+
| data | mapping[string, string] | Type-specific JSON-encoded data of the inventory record. |
91+
| failures | list[string] | The list of problems associated with the object that this inventory record covers. |
92+
| owner | string | The identity that has ownership of the object. |
93+
| ucx_version | string | The UCX semantic version. |
94+
95+
Example historical record:
96+
97+
| workspace_id | job_run_id | object_type | object_id | data | failures | owner | ucx_version |
98+
|--------------|------------|-------------|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|-------------------------------|-------------|
99+
| 123456789 | 1 | 'Table' | ['hive_metastore', 'schema', 'table'] | {'database': 'schema', 'name': 'table', 'catalog': 'hive_metastore', 'object_type': 'MANAGED', 'table_format': 'DELTA', 'is_partitioned': 'false'} | ['Used by NOTEBOOK: test/test.py'] | '[email protected]' | '0.50.0' |
100+
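To make the column layout above concrete, the record can be modelled as a small data class. This is an illustrative sketch, not the class UCX uses: `HistoricalRecord` and `is_compatible` are hypothetical names, the owner address is a placeholder, and the mapping of the documented data types to Python types is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalRecord:
    """Hypothetical model of one row in the `historical` table."""

    workspace_id: int
    job_run_id: int
    object_type: str
    object_id: list[str]
    data: dict[str, str]
    failures: list[str]
    owner: str
    ucx_version: str

    def is_compatible(self) -> bool:
        """An object without failures is considered Unity Catalog compatible."""
        return not self.failures


record = HistoricalRecord(
    workspace_id=123456789,
    job_run_id=1,
    object_type="Table",
    object_id=["hive_metastore", "schema", "table"],
    data={"catalog": "hive_metastore", "name": "table", "table_format": "DELTA"},
    failures=["Used by NOTEBOOK: test/test.py"],
    owner="owner@example.com",  # placeholder, not taken from the example record
    ucx_version="0.50.0",
)

print(record.is_compatible())  # False, because one failure is still open
```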
101+
### Workflow run
102+
103+
The auxiliary [`workflow_runs` table](../src/databricks/labs/ucx/progress/install.py) tracks UCX workflow runs.
104+
105+
| Column | Data type | Description |
106+
|----------------------|-------------|--------------------------------------------|
107+
| started_at | dt.datetime | The timestamp of the workflow run start |
108+
| finished_at | dt.datetime | The timestamp of the workflow run end |
109+
| workspace_id | int | The workspace id in which the workflow ran |
110+
| workflow_name | str | The workflow name that ran |
111+
| workflow_id | int | The workflow id of the workflow that ran |
112+
| workflow_run_id | int | The workflow run id |
113+
| workflow_run_attempt | int | The workflow run attempt |
114+
115+
Example workflow run record:
116+
117+
| started_at | finished_at | workspace_id | workflow_name | workflow_id | workflow_run_id | workflow_run_attempt |
118+
|---------------------------------------------------------------------------|---------------------------------------------------------------------------|--------------|-------------------------------------|-------------|-----------------|----------------------|
119+
| datetime.datetime(2024, 11, 22, 13, 50, 37, tzinfo=datetime.timezone.utc) | datetime.datetime(2024, 11, 22, 13, 50, 58, tzinfo=datetime.timezone.utc) | 123456789 | 'Migration progress (experimental)' | 123 | 456 | 0 |
120+
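The workflow run columns map naturally onto a data class as well, which makes derived values such as the run duration easy to compute. The sketch below is an assumption-laden illustration: `WorkflowRun` and its `duration` property are hypothetical and only mirror the columns listed above.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical model of one row in the `workflow_runs` table."""

    started_at: dt.datetime
    finished_at: dt.datetime
    workspace_id: int
    workflow_name: str
    workflow_id: int
    workflow_run_id: int
    workflow_run_attempt: int

    @property
    def duration(self) -> dt.timedelta:
        """Elapsed time between the start and the end of the run."""
        return self.finished_at - self.started_at


# Values taken from the example workflow run record above.
run = WorkflowRun(
    started_at=dt.datetime(2024, 11, 22, 13, 50, 37, tzinfo=dt.timezone.utc),
    finished_at=dt.datetime(2024, 11, 22, 13, 50, 58, tzinfo=dt.timezone.utc),
    workspace_id=123456789,
    workflow_name="Migration progress (experimental)",
    workflow_id=123,
    workflow_run_id=456,
    workflow_run_attempt=0,
)

print(run.duration.total_seconds())  # 21.0
```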
121+
## Resource index
122+
123+
| Hive or workspace object | Description | Dependent resources |
124+
|------------------------------------|----------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
125+
| Redash dashboard | The Redash dashboard | Queries |
126+
| Lakeview dashboard | The Lakeview dashboard | Queries |
127+
| Dashboard | The Redash or Lakeview dashboard | Queries |
128+
| Job | Jobs create data models supporting business processes - not exclusively data models used by dashboards | Tasks, Cluster |
129+
| Job task | Job tasks, defined as part of the job definition | Code |
130+
| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes - not exclusively data models used by dashboards | |
131+
| Tables and views (Hive data object) | Hive data objects | Grant, Table migration status |
132+
| Grant | Data object privileges | Legacy grant, Interactive cluster grant |
133+
| Legacy grant | Legacy Hive grant privileges managed through `GRANT`, `REVOKE` and `DENY` SQL statements | |
134+
| Interactive cluster grant | Data object privileges inferred through interactive cluster data access | |
135+
| User defined function (UDF) | Hive user defined functions | UDF code definition |
136+
| Cluster | The cluster, either a job or an interactive cluster | |
137+
| Cluster policies | The cluster policies | |
138+
| Table migration status | Whether or not a table or view has been migrated to Unity Catalog | |
139+
| Code | Code definitions, either Python or SQL | |
