
Commit 06b7f5a

Expanded end-user documentation with detailed descriptions for workflows and commands (#999)
The Databricks Labs UCX project has been updated with several new features to assist in upgrading to Unity Catalog. These include various workflows and command-line utilities, such as an assessment workflow that generates a detailed compatibility report for workspace entities and a group migration workflow to upgrade all Databricks workspace assets. Additionally, new utility commands have been added for managing cross-workspace installations, and users can now view the status of deployed workflows and repair failed workflows. New end-user documentation has also been introduced, featuring comprehensive descriptions of workflows and commands, along with an assessment report image. The Assessment Report, generated by UCX tools, now includes a more detailed summary of the assessment findings, table counts, database summaries, and external locations. This release also includes improved documentation for external Hive Metastore integration and a new debugging notebook. Lastly, the workspace group migration feature has been expanded to handle potential conflicts when migrating multiple workspaces with locally scoped group names.
1 parent 906e187 commit 06b7f5a


10 files changed, with 1054 lines added and 340 lines removed.


README.md

Lines changed: 514 additions & 124 deletions
Large diffs are not rendered by default.

docs/assessment-report.png

244 KB

docs/assessment.md

Lines changed: 186 additions & 11 deletions
Large diffs are not rendered by default.

docs/debug-logs.png

475 KB

docs/debug-notebook.png

653 KB

docs/external_hms_glue.md

Lines changed: 24 additions & 13 deletions
@@ -1,22 +1,27 @@
-# External HMS and Glue Integration
+External Hive Metastore Integration
+===
 
-### TL;DR
+<!-- TOC -->
+* [External Hive Metastore Integration](#external-hive-metastore-integration)
+* [Current External HMS Integration](#current-external-hms-integration)
+* [Manual Setup/Override](#manual-setupoverride)
+* [Challenges and Gotchas](#challenges-and-gotchas)
+<!-- TOC -->
 
 The UCX toolkit by default relies on the internal workspace HMS as a source for tables and views.
-<br/>The UCX is set up to run and introspect a single HMS.
-<br/>The installer is looking for evidence of an external Metastore (Glue and Others)
-<br/>If we find an external metastore we allow the user to use this configuration for UCX.
+- is set up to run and introspect a single HMS.
+- The installer is looking for evidence of an external Metastore (Glue and Others)
+- If we find an external metastore we allow the user to use this configuration for UCX.
 
-### Current External HMS Integration
+# Current External HMS Integration
 
-To integrate with an External Metastore we need to configure the job clusters we generate.
-<br/> The setup process follows the following steps
+To integrate with an External Metastore we need to configure the job clusters we generate. The setup process follows the following steps
 
 - We are list the existing cluster policies and look for an evidence of External Metastore
 -- Spark config `spark.databricks.hive.metastore.glueCatalog.enabled=true`
 -- Spark config containing `spark.sql.hive.metastore`
-- If we find evidence of external metastore we prompt the user with the following message:<br/>
-_We have identified one or more cluster policies set up for an external metastore. <br/>
+- If we find evidence of external metastore we prompt the user with the following message:
+_We have identified one or more cluster policies set up for an external metastore.
 Would you like to set UCX to connect to the external metastore._
 - Selecting **Yes** will display a list of the matching policies and allow the user to select the proper one.
 - We copy the Instance Profile and the spark configuration parameters from the cluster policy and apply these to the job
@@ -25,7 +30,9 @@ To integrate with an External Metastore we need to configure the job clusters we
 Metastore, the Dashboard will fail.
 - DBSQL Warehouse settings are global to the workspace and cannot be set individually on a single warehouse.
 
-### Manual Setup/Override
+[[back to top](#external-hive-metastore-integration)]
+
+# Manual Setup/Override
 
 If the workspace doesn't have a cluster policy that is set up for External Metastore, there are two options to set UCX
 with External Metastore:
@@ -52,10 +59,14 @@ with External Metastore:
 Clusters before running the workflows.
 - Set up the DBSQL warehouses for the External Metastore
 
-### Challenges and Gotchas
+[[back to top](#external-hive-metastore-integration)]
+
+# Challenges and Gotchas
 
 - UCX is currently designed to run on a single workspace at a time.
 - If you run UCX on multiple workspace leveraging the same metastore, follow the following guidelines:
 -- Use a different inventory database name for each of the workspaces. Otherwise, they will override one another.
 -- Migrate the table once. Running table migration (when it will become available) from multiple workspaces is
-redundant.
+redundant.
+
+[[back to top](#external-hive-metastore-integration)]
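
The detection steps described in the updated `docs/external_hms_glue.md` can be sketched in Python, the language UCX is written in. This is a minimal, hypothetical illustration, not the installer's actual code: the function names are invented here, and the sketch assumes the input is the JSON definition Databricks stores for a cluster policy, with Spark settings under `spark_conf.<key>` attributes and the instance profile under `aws_attributes.instance_profile_arn`. It flags a policy that enables `spark.databricks.hive.metastore.glueCatalog.enabled` or sets any key containing `spark.sql.hive.metastore`, and collects what would be copied onto the generated job clusters. For simplicity it copies only the matching metastore keys; the set of Spark parameters UCX actually copies may differ.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical sketch, not UCX's actual code. The two signals mirror the
# "evidence of External Metastore" list in the docs.
GLUE_FLAG = "spark.databricks.hive.metastore.glueCatalog.enabled"
HMS_MARKER = "spark.sql.hive.metastore"


def external_metastore_conf(policy_definition: str) -> Dict[str, str]:
    """Return the spark_conf entries of a policy that point at an external metastore.

    Assumes the `spark_conf.<key>` attribute naming with a `value` field,
    as used in Databricks cluster policy definitions.
    """
    found: Dict[str, str] = {}
    for attribute, rule in json.loads(policy_definition).items():
        if not attribute.startswith("spark_conf."):
            continue
        key = attribute[len("spark_conf."):]
        value = str(rule.get("value", ""))
        # Signal 1: the Glue catalog is explicitly enabled.
        if key == GLUE_FLAG and value.lower() == "true":
            found[key] = value
        # Signal 2: any Spark key containing spark.sql.hive.metastore.
        elif HMS_MARKER in key:
            found[key] = value
    return found


def job_cluster_overrides(policy_definition: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Collect the metastore spark_conf and the instance profile a job cluster would reuse."""
    rules = json.loads(policy_definition)
    profile_rule = rules.get("aws_attributes.instance_profile_arn", {})
    return external_metastore_conf(policy_definition), profile_rule.get("value")


if __name__ == "__main__":
    # Example policy; the ARN is a placeholder.
    policy = json.dumps({
        "spark_conf." + GLUE_FLAG: {"type": "fixed", "value": "true"},
        "aws_attributes.instance_profile_arn": {
            "type": "fixed",
            "value": "arn:aws:iam::123456789012:instance-profile/glue-access",
        },
        "spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts"},
    })
    print(job_cluster_overrides(policy))
```

A policy with neither signal yields an empty dictionary in this sketch, which would leave UCX on its default source, the internal workspace HMS.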

docs/group_name_conflict.md

Lines changed: 16 additions & 17 deletions
@@ -1,33 +1,32 @@
-# Group Name Conflict Resolution
+Group Name Conflict Resolution
+===
+
+See [this document](local-group-migration.md) for workspace group migration.
 
-During the UC upgrade process we migrate all the local workspace group to account level group.
-The process is detailed here: [local-group-migration.md](local-group-migration.md)
-<br/>
 When migrating multiple workspaces we can run into conflicts.
 These conflicts occur when groups with the same name in different workspaces have different membership and different
 use.
 
-## Suggested Workflow
-
-During the installation process we pose the following question:
-<br/>
-"Do you need to rename the workspace groups to match the account groups' name?"
+During the installation process we pose the following question: `Do you need to rename the workspace groups to match the account groups' name?`
 
 If the answer is "Yes" a follow-up question will be:
-<br/>
-"Choose How to rename the workspace groups:"
 
-1. Apply a Prefix
-2. Apply a Suffix
-3. Use Regular Expression Substitution
-4. User Regular Expression to extract a value from the account and the workspace
-5. Map using External Group ID
+```text
+Choose how to map the workspace groups:
+[0] Match by Name
+[1] Apply a Prefix
+[2] Apply a Suffix
+[3] Match by External ID
+[4] Regex Substitution
+[5] Regex Matching
+Enter a number between 0 and 5:
+```
 
 The user then input the Prefix/Suffix/Regular Expression.
 The installation process will validate the regular expression.
 The installation process will register the selection as regular expression in the configuration YAML file.
 
-We introduce 3 more parameters to the configuration yaml and the group manager:
+We introduce 3 more parameters to the [configuration](../README.md#open-remote-config-command) and the group manager:
 
 - workspace_group_regex
 - workspace_group_replace
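
The renaming options in the updated `docs/group_name_conflict.md` are all recorded as regular expressions in the configuration. The Python sketch below is a hypothetical illustration of that idea, not UCX's group manager: the helper names are invented, and it assumes a prefix or suffix is encoded as the pattern `^` or `$` with the prefix or suffix text stored as `workspace_group_replace`. `match_by_regex` illustrates the "Regex Matching" choice under the further assumption that a capture group extracts a shared key from both workspace and account group names; its `account_regex` argument is an assumed parameter, since the hunk above lists only `workspace_group_regex` and `workspace_group_replace`.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical sketch, not UCX's implementation: each installer choice is
# reduced to a (workspace_group_regex, workspace_group_replace) pair.


def prefix_rule(prefix: str) -> Tuple[str, str]:
    # "^" matches the empty string at the start, so re.sub prepends the text.
    # Backslashes are doubled so re.sub inserts the prefix literally.
    return "^", prefix.replace("\\", "\\\\")


def suffix_rule(suffix: str) -> Tuple[str, str]:
    # "$" matches the empty string at the end, so re.sub appends the text.
    return "$", suffix.replace("\\", "\\\\")


def is_valid_regex(pattern: str) -> bool:
    # Stands in for the installer's "validate the regular expression" step.
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def account_group_name(workspace_group: str, regex: str, replace: str) -> str:
    # Apply the stored rule once to derive the expected account group name.
    return re.sub(regex, replace, workspace_group, count=1)


def match_by_regex(
    workspace_groups: Iterable[str],
    account_groups: Iterable[str],
    workspace_regex: str,
    account_regex: str,  # assumed parameter, not shown in the diff
) -> Dict[str, str]:
    # Pair workspace and account groups whose first capture group yields the same key.
    def key(name: str, pattern: str) -> Optional[str]:
        found = re.search(pattern, name)
        return found.group(1) if found else None

    by_key: Dict[str, str] = {}
    for name in account_groups:
        k = key(name, account_regex)
        if k is not None:
            by_key[k] = name
    mapping: Dict[str, str] = {}
    for name in workspace_groups:
        k = key(name, workspace_regex)
        if k is not None and k in by_key:
            mapping[name] = by_key[k]
    return mapping


if __name__ == "__main__":
    print(account_group_name("data-eng", *prefix_rule("ws1_")))  # ws1_data-eng
    print(account_group_name("ws1-data-eng", "^ws1-", ""))  # data-eng
```

In this sketch, prefix and suffix never require a user-written pattern, so only the substitution and matching options take input that has to pass `is_valid_regex` before being stored.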
