|
| 1 | +# Migration progress |
| 2 | + |
| 3 | +UCX tracks migration progress of _business_ resources: workspace objects that contribute to business value. |
| 4 | +(The term "business resource" comes from the UCX team and is **not** Databricks terminology.) We identified the |
| 5 | +following business resources: |
| 6 | + |
| 7 | +| Business resource | Motivation | |
| 8 | +|----------------------------|----------------------------------------------------------------------------------------------------------------------------| |
| 9 | +| Dashboard | Dashboards visualize data models supporting business processes | |
| 10 | +| Job | Jobs create data models supporting business processes, not exclusively data models used by dashboards | |
| 11 | +| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes, not exclusively data models used by dashboards | |
| 12 | + |
| 13 | +Furthermore, UCX tracks migration of the following Hive and workspace objects: |
| 14 | + |
| 15 | +| Hive or workspace object | |
| 16 | +|------------------------------------| |
| 17 | +| Tables and views (Hive data object) | |
| 18 | +| Grant | |
| 19 | +| User defined function (UDF) | |
| 20 | +| Cluster | |
| 21 | +| Cluster policies | |
| 22 | + |
| 23 | +See the [resource index](#resource-index) for more details on the above objects. |
| 24 | + |
| 25 | +## Usage |
| 26 | + |
| 27 | +View the migration progress in the [migration progress dashboard](../README.md#dashboards) after running the |
| 28 | +[(experimental) migration progress workflow](../README.md#experimental-migration-progress-workflow). |
| 29 | + |
| 30 | +### Failures |
| 31 | + |
| 32 | +A [key historical attribute](#historical) in migration progress is `failures`, which lists an object's incompatibility issues |
| 33 | +with Unity Catalog. Once the failures for an object are resolved, UCX flags that object as Unity Catalog compatible. |
| 34 | +For Hive data objects, this means that the objects are migrated to Unity Catalog. |
| 35 | + |
| 36 | +### Owner |
| 37 | + |
| 38 | +Another [key historical attribute](#historical) in migration progress is `owner`, which shows who owns the object |
| 39 | +and is therefore the key contact for making the object Unity Catalog compatible. Ownership is determined on a best-effort basis; it is a concept made |
| 40 | +more central in [Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/ownership.html). |
| 41 | + |
| 42 | +## Tracking |
| 43 | + |
| 44 | +The [(experimental) migration progress workflow](../README.md#experimental-migration-progress-workflow) tracks the |
| 45 | +migration progress and populates [migration progress tables](#persistence). |
| 46 | + |
| 47 | +### Roll-up to business resources |
| 48 | + |
| 49 | +The main intent of migration progress is to track whether business resources are migrated to Unity Catalog. UCX rolls up the |
| 50 | +failures of dependent resources to the business resources, so that each business resource shows |
| 51 | +the [`failures`](#failures) of its dependent resources. |
| 52 | + |
| 53 | +| Business resource | Dependent resources | |
| 54 | +|----------------------------|-------------------------------------------------------------------| |
| 55 | +| Dashboard | Queries | |
| 56 | +| Job | Cluster, cluster policies, cluster configurations, code resources | |
| 57 | +| Delta Live Table pipelines | TBD | |
| 58 | + |
| 59 | +Similarly, UCX rolls up the failures for the Hive and workspace objects: |
| 60 | + |
| 61 | +| Hive or workspace object | Dependent resources | |
| 62 | +|------------------------------------|------------------------------| |
| 63 | +| Tables and views (Hive data object) | Grants, Table migration status | |
| 64 | +| Grant | | |
| 65 | +| User defined function (UDF) | | |
| 66 | +| Cluster | Cluster policies | |
| 67 | +| Cluster policies | | |
| 68 | + |
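The roll-up described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the UCX implementation: the `Resource` dataclass, its fields, the `roll_up_failures` function and the failure messages are all hypothetical names introduced here, and the dependency graph is assumed to be free of cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a tracked object; not a UCX class."""

    name: str
    failures: list[str] = field(default_factory=list)
    dependencies: list["Resource"] = field(default_factory=list)


def roll_up_failures(resource: Resource) -> list[str]:
    """Return the resource's own failures followed by those of its dependencies."""
    rolled_up = list(resource.failures)
    for dependency in resource.dependencies:
        # Recurse so nested dependencies (cluster -> cluster policy) are included.
        for failure in roll_up_failures(dependency):
            if failure not in rolled_up:  # keep each failure once
                rolled_up.append(failure)
    return rolled_up


# Illustrative data only: a job depends on a cluster, which depends on a policy.
policy = Resource("policy", failures=["policy failure"])
cluster = Resource("cluster", dependencies=[policy])
job = Resource("job", failures=["job failure"], dependencies=[cluster])
print(roll_up_failures(job))  # ['job failure', 'policy failure']
```

With this shape, a business resource counts as compatible only when its rolled-up list is empty, which matches the intent that dependent failures surface on the business resource.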
| 69 | +### Dangling Hive or workspace objects |
| 70 | + |
| 71 | +Hive or workspace objects that are not referenced by [business resources](#roll-up-to-business-resources) are considered |
| 72 | +to be _dangling_ objects. For now, these objects are tracked separately and are thus not rolled up to business resources. |
| 73 | + |
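As a rough sketch of the idea, dangling objects are the objects that remain after removing everything a business resource references. The function name `find_dangling`, the string identifiers and the shape of the `references` mapping are assumptions made for this example; UCX may determine references differently.

```python
def find_dangling(all_objects: set[str], references: dict[str, set[str]]) -> set[str]:
    """Return the objects that no business resource references.

    `references` maps a business resource to the objects it uses (hypothetical shape).
    """
    referenced = set().union(*references.values())
    return all_objects - referenced


# Illustrative identifiers only.
all_objects = {"hive_metastore.sales.orders", "hive_metastore.tmp.scratch"}
references = {"job:daily_load": {"hive_metastore.sales.orders"}}
print(find_dangling(all_objects, references))  # {'hive_metastore.tmp.scratch'}
```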
| 74 | +## Persistence |
| 75 | + |
| 76 | +The progress is persisted in the [UCX UC catalog](../README.md#create-ucx-catalog-command) so that migration progress can be |
| 77 | +tracked cross-workspace. The catalog contains the tables below. |
| 78 | + |
| 79 | +### Historical |
| 80 | + |
| 81 | +The [`historical` table](../src/databricks/labs/ucx/progress/install.py) contains historical records of inventory |
| 82 | +objects relevant to the migration progress. |
| 83 | + |
| 84 | +| Column | Data type | Description | |
| 85 | +|--------------|-------------------------|------------------------------------------------------------------------------------| |
| 86 | +| workspace_id | integer | The identifier of the workspace where this historical record was generated. | |
| 87 | +| job_run_id | integer | The identifier of the job run that generated this historical record. | |
| 88 | +| object_type | string | The inventory table for which this historical record was generated. | |
| 89 | +| object_id | list[string] | The type-specific identifier for the corresponding inventory record. | |
| 90 | +| data | mapping[string, string] | Type-specific JSON-encoded data of the inventory record. | |
| 91 | +| failures | list[string] | The list of problems associated with the object that this inventory record covers. | |
| 92 | +| owner | string | The identity that has ownership of the object. | |
| 93 | +| ucx_version | string | The UCX semantic version. | |
| 94 | + |
| 95 | +Example historical record: |
| 96 | + |
| 97 | +| workspace_id | job_run_id | object_type | object_id | data | failures | owner | ucx_version | |
| 98 | +|--------------|------------|-------------|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|-------------------------------|-------------| |
| 99 | +| 123456789 | 1 | 'Table' | ['hive_metastore', 'schema', 'table'] | {'database': 'schema', 'name': 'table', 'catalog': 'hive_metastore', 'object_type': 'MANAGED', 'table_format': 'DELTA', 'is_partitioned': 'false'} | ['Used by NOTEBOOK: test/test.py'] | ' [email protected]' | '0.50.0' | |
| 100 | + |
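To make the column layout concrete, the sketch below mirrors the `historical` columns in a Python dataclass and derives compatibility from `failures`. The class name `HistoricalRecord`, the helper `is_compatible` and the mapping of column data types to Python types are assumptions for illustration, not the definitions in `install.py`; the owner value is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HistoricalRecord:
    """Hypothetical mirror of the `historical` table columns; not the UCX class."""

    workspace_id: int
    job_run_id: int
    object_type: str
    object_id: list[str]
    data: dict[str, str]
    failures: list[str] = field(default_factory=list)
    owner: str = ""
    ucx_version: str = ""


def is_compatible(record: HistoricalRecord) -> bool:
    """Treat an object without failures as Unity Catalog compatible."""
    return not record.failures


record = HistoricalRecord(
    workspace_id=123456789,
    job_run_id=1,
    object_type="Table",
    object_id=["hive_metastore", "schema", "table"],
    data={"database": "schema", "name": "table", "catalog": "hive_metastore"},
    failures=["Used by NOTEBOOK: test/test.py"],
    owner="owner@example.com",  # placeholder, not a real owner
    ucx_version="0.50.0",
)
print(is_compatible(record))  # False
```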
| 101 | +### Workflow run |
| 102 | + |
| 103 | +The auxiliary [`workflow_runs` table](../src/databricks/labs/ucx/progress/install.py) tracks UCX workflow runs. |
| 104 | + |
| 105 | +| Column | Data type | Description | |
| 106 | +|----------------------|-------------|--------------------------------------------| |
| 107 | +| started_at | dt.datetime | The timestamp of the workflow run start | |
| 108 | +| finished_at | dt.datetime | The timestamp of the workflow run end | |
| 109 | +| workspace_id | int | The workspace id in which the workflow ran | |
| 110 | +| workflow_name | str | The workflow name that ran | |
| 111 | +| workflow_id | int | The workflow id of the workflow that ran | |
| 112 | +| workflow_run_id | int | The workflow run id | |
| 113 | +| workflow_run_attempt | int | The workflow run attempt | |
| 114 | + |
| 115 | +Example workflow run record: |
| 116 | + |
| 117 | +| started_at | finished_at | workspace_id | workflow_name | workflow_id | workflow_run_id | workflow_run_attempt | |
| 118 | +|---------------------------------------------------------------------------|---------------------------------------------------------------------------|--------------|-------------------------------------|-------------|-----------------|----------------------| |
| 119 | +| datetime.datetime(2024, 11, 22, 13, 50, 37, tzinfo=datetime.timezone.utc) | datetime.datetime(2024, 11, 22, 13, 50, 58, tzinfo=datetime.timezone.utc) | 123456789 | 'Migration progress (experimental)' | 123 | 456 | 0 | |
| 120 | + |
| 121 | +## Resource index |
| 122 | + |
| 123 | +| Hive or workspace object | Description | Dependent resources | |
| 124 | +|------------------------------------|----------------------------------------------------------------------------------------------------------------------------|-----------------------------------------| |
| 125 | +| Redash dashboard | The Redash dashboard | Queries | |
| 126 | +| Lakeview dashboard | The Lakeview dashboard | Queries | |
| 127 | +| Dashboard | The Redash or Lakeview dashboard | Queries | |
| 128 | +| Job | Jobs create data models supporting business processes, not exclusively data models used by dashboards | Tasks, Cluster | |
| 129 | +| Job task | Job tasks, defined as part of the job definition | Code | |
| 130 | +| Delta Live Table pipelines | Delta Live Table pipelines create data models supporting business processes, not exclusively data models used by dashboards | | |
| 131 | +| Tables and views (Hive data object) | Hive data objects | Grant, Table migration status | |
| 132 | +| Grant | Data object privileges | Legacy grant, Interactive cluster grant | |
| 133 | +| Legacy grant | Legacy Hive grant privileges managed through `GRANT`, `REVOKE` and `DENY` SQL statements | | |
| 134 | +| Interactive cluster grant | Data object privileges inferred through interactive cluster data access | | |
| 135 | +| User defined function (UDF) | Hive user defined functions | UDF code definition | |
| 136 | +| Cluster | The cluster, either a job or an interactive cluster | | |
| 137 | +| Cluster policies | The cluster policies | | |
| 138 | +| Table migration status | Status of a table or view being migrated to Unity Catalog, or not | | |
| 139 | +| Code | Code definitions, either Python or SQL | | |