Commit fd0b604

Extend documentation with additional steps to get started (#947)
1 parent bd8c412 commit fd0b604

File tree

7 files changed

+302
-137
lines changed

README.md

Lines changed: 155 additions & 25 deletions
@@ -1,4 +1,4 @@
1-
# ![UCX by Dataricks Labs](docs/logo-no-background.png)
1+
![UCX by Databricks Labs](docs/logo-no-background.png)
22

33
Your best companion for upgrading to Unity Catalog. It helps you to upgrade all Databricks workspace assets:
44
Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions that are set on the workspace level, Secret scopes, Notebooks, Directories, Repos, Files.
@@ -7,6 +7,28 @@ Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster polici
77

88
See [contributing instructions](CONTRIBUTING.md) to help improve this project.
99

10+
<!-- TOC -->
11+
* [Introduction](#introduction)
12+
* [Installation](#installation)
13+
* [Prerequisites](#prerequisites)
14+
* [Install Databricks CLI on macOS](#install-databricks-cli-on-macos)
15+
* [Install Databricks CLI via curl on Windows](#install-databricks-cli-via-curl-on-windows)
16+
* [Download & Install](#download--install)
17+
* [Install UCX](#install-ucx)
18+
* [Upgrade UCX](#upgrade-ucx)
19+
* [Uninstall UCX](#uninstall-ucx)
20+
* [Using UCX](#using-ucx)
21+
* [Executing assessment job](#executing-assessment-job)
22+
* [Understanding assessment report](#understanding-assessment-report)
23+
* [Scanning for legacy credentials and mapping access](#scanning-for-legacy-credentials-and-mapping-access)
24+
* [AWS](#aws)
25+
* [Azure](#azure)
26+
* [Producing table mapping](#producing-table-mapping)
27+
* [Managing cross-workspace installation](#managing-cross-workspace-installation)
28+
* [Validating group membership](#validating-group-membership)
29+
* [Star History](#star-history)
30+
* [Project Support](#project-support)
31+
<!-- TOC -->
1032

1133
## Introduction
1234
UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs etc. to Unity Catalog.
@@ -18,59 +40,167 @@ UCX leverages Databricks Lakehouse platform to upgrade itself. The upgrade proce
1840

1941
By running the installation you install the assessment job and several upgrade jobs. The assessment and upgrade jobs are outlined in the custom-generated README.py that is created by the installer.
2042

21-
The custom-generated `README.py`, `config.yaml`, and other assets are placed into your Databricks workspace home folder, into a subfolder named `.ucx`. See [interactive tutorial](https://app.getreprise.com/launch/zXPxBZX/).
43+
The custom-generated `README.py`, `config.yaml`, and other assets are placed into your Databricks workspace home folder, into a sub-folder named `.ucx`. See [interactive tutorial](https://app.getreprise.com/launch/zXPxBZX/).
2244

2345

2446
Once the custom Databricks jobs are installed, begin by triggering the assessment job. The assessment job can be found under your workflows or via the active link in the README.py. Once the assessment job is complete, you can review the results in the custom-generated Databricks dashboard (linked to by the custom README.py found in the workspace folder created for you).
2547

2648

27-
You will need an account, unity catalog, and workspace administrative authority to complete the upgrade process. To run the installer, you will need to setup `databricks-cli` and a credential, [following these instructions.](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) Additionally, the interim metadata and config data being processed by UCX will be stored into a Hive Metastore database schema generated at install time.
49+
You will need account, Unity Catalog, and workspace administrative authority to complete the upgrade process. To run the installer, you will need to set up `databricks-cli` and a credential, [following these instructions.](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) Additionally, the interim metadata and config data being processed by UCX will be stored in a Hive Metastore database schema generated at install time.
2850

2951

30-
For questions, troubleshooting or bug fixes, please see your Databricks account team or submit an issue to the [Databricks UCX github repo](https://github.com/databrickslabs/ucx)
52+
For questions, troubleshooting or bug fixes, please contact your Databricks account team or submit an issue to the [Databricks UCX GitHub repo](https://github.com/databrickslabs/ucx).
3153

3254
## Installation
3355
### Prerequisites
34-
1. Get trained on UC [[free instructor-led training 2x week](https://customer-academy.databricks.com/learn/course/1683/data-governance-with-unity-catalog?generated_by=302876&hash=4eab6668f83636ba44d109880002b293e8dda6dd)] [[full training schedule](https://files.training.databricks.com/static/ilt-sessions/half-day-workshops/index.html)]
35-
2. You will need a desktop computer, running Windows, MacOS, or Linux; This computer is used to install the UCX toolkit onto the Databricks workspace, the computer will also need:
36-
- Network access to your Databricks Workspace
37-
- Network access to the Internet to retrieve additional Python packages (e.g. PyYAML, databricks-sdk,...) and access github.com
38-
- Python 3.10 or later - [Windows instructions](https://www.python.org/downloads/)
39-
- Databricks CLI with a workspace [configuration profile](https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication) for workspace - [instructions](https://docs.databricks.com/en/dev-tools/cli/install.html)
40-
- Your windows computer will need a shell environment (GitBash or ([WSL](https://learn.microsoft.com/en-us/windows/wsl/about))
56+
1. Get trained on UC [[free instructor-led training 2x week]](https://customer-academy.databricks.com/learn/course/1683/data-governance-with-unity-catalog?generated_by=302876&hash=4eab6668f83636ba44d109880002b293e8dda6dd) [[full training schedule]](https://files.training.databricks.com/static/ilt-sessions/half-day-workshops/index.html)
57+
2. You will need a desktop computer running Windows, macOS, or Linux. This computer is used to install the UCX toolkit onto the Databricks workspace and will also need:
58+
- Network access to your Databricks Workspace
59+
- Network access to the Internet to retrieve additional Python packages (e.g. PyYAML, databricks-sdk,...) and access https://github.com
60+
- Python 3.10 or later - [Windows instructions](https://www.python.org/downloads/)
61+
- Databricks CLI with a workspace [configuration profile](https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication) for workspace - [instructions](https://docs.databricks.com/en/dev-tools/cli/install.html)
62+
- Your Windows computer will need a shell environment (Git Bash or [WSL](https://learn.microsoft.com/en-us/windows/wsl/about))
4163
3. Within the Databricks Workspace you will need:
42-
- Workspace administrator access permissions
43-
- The ability for the installer to upload Python Wheel files to DBFS and Workspace FileSystem
44-
- A PRO or Serverless SQL Warehouse
45-
- The Assessment workflow will create a legacy "No Isolation Shared" and a legacy "Table ACL" jobs clusters needed to inventory Hive Metastore Table ACLS
46-
- If your Databricks Workspace relies on an external Hive Metastore (such as glue), make sure to read the [External HMS Document](docs/external_hms_glue.md).
47-
4. [[AWS](https://docs.databricks.com/en/administration-guide/users-groups/best-practices.html)] [[Azure](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/best-practices)] [[GCP](https://docs.gcp.databricks.com/administration-guide/users-groups/best-practices.html)] Account level Identity Setup
48-
5. [[AWS](https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html)] [[Azure](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-metastore)] [[GCP](https://docs.gcp.databricks.com/data-governance/unity-catalog/create-metastore.html)] Unity Catalog Metastore Created (per region)
49-
50-
### Download & Install
51-
52-
We only support installations and upgrades through [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html), as UCX requires an installation script run to make sure all the necessary and correct configurations are in place.
53-
54-
#### Installing Databricks CLI on macOS
64+
- Workspace administrator access permissions
65+
- The ability for the installer to upload Python Wheel files to DBFS and Workspace FileSystem
66+
- A PRO or Serverless SQL Warehouse
67+
- The assessment workflow will create a legacy "No Isolation Shared" job cluster and a legacy "Table ACL" job cluster, which are needed to inventory Hive Metastore table ACLs
68+
- If your Databricks Workspace relies on an external Hive Metastore (such as AWS Glue), make sure to read the [External HMS Document](docs/external_hms_glue.md).
69+
4. A number of commands also require Databricks account administrator access permissions, e.g. `sync-workspace-info`
70+
5. [[AWS]](https://docs.databricks.com/en/administration-guide/users-groups/best-practices.html) [[Azure]](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/best-practices) [[GCP]](https://docs.gcp.databricks.com/administration-guide/users-groups/best-practices.html) Account level Identity Setup
71+
6. [[AWS]](https://docs.databricks.com/en/data-governance/unity-catalog/create-metastore.html) [[Azure]](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-metastore) [[GCP]](https://docs.gcp.databricks.com/data-governance/unity-catalog/create-metastore.html) Unity Catalog Metastore Created (per region)
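The local-machine prerequisites above can be checked before running the installer. The following is a minimal sketch, not part of UCX: the function names `version_at_least` and `check_prerequisites` are hypothetical, and it assumes the interpreter is available as `python3` (on Windows with Git Bash it may be `python` instead). It only checks that the tools and the `~/.databrickscfg` file exist; it does not validate workspace permissions.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check for the local machine (hypothetical helper, not shipped with UCX).

# Returns 0 when version $1 is at least version $2, comparing major.minor only.
version_at_least() {
  local have_major=${1%%.*} have_minor=${1#*.}
  local want_major=${2%%.*} want_minor=${2#*.}
  have_minor=${have_minor%%.*}   # drop any patch component, e.g. "10.2" -> "10"
  want_minor=${want_minor%%.*}
  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  if [ "$have_major" -lt "$want_major" ]; then return 1; fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Prints one line per prerequisite and returns non-zero if anything is missing.
check_prerequisites() {
  local status=0 py_version
  if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_at_least "$py_version" "3.10"; then
      echo "ok: Python $py_version"
    else
      echo "missing: Python 3.10 or later (found $py_version)"; status=1
    fi
  else
    echo "missing: python3 on PATH"; status=1
  fi
  if command -v databricks >/dev/null 2>&1; then
    echo "ok: Databricks CLI found"
  else
    echo "missing: Databricks CLI"; status=1
  fi
  if [ -f "$HOME/.databrickscfg" ]; then
    echo "ok: $HOME/.databrickscfg exists"
  else
    echo "missing: $HOME/.databrickscfg (configuration profile)"; status=1
  fi
  return "$status"
}

check_prerequisites || echo "Fix the items marked 'missing' before installing UCX."
```

The version comparison is done in plain shell so that the check still reports a clear message when Python itself is too old or absent.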
72+
73+
#### Install Databricks CLI on macOS
5574
![macos_install_databricks](docs/macos_1_databrickslabsmac_installdatabricks.gif)
5675

5776
#### Install Databricks CLI via curl on Windows
5877
![winos_install_databricks](docs/winos_1_databrickslabsmac_installdatabricks.gif)
5978

79+
### Download & Install
80+
81+
We only support installations and upgrades through [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html), as UCX requires an installation script run to make sure all the necessary and correct configurations are in place.
82+
6083
#### Install UCX
84+
Install UCX via Databricks CLI:
85+
```commandline
86+
databricks labs install ucx
87+
```
88+
89+
This will start an interactive installer with a number of configuration questions:
90+
- Select a workspace profile that has been defined in `~/.databrickscfg`
91+
- Provide the name of the inventory database where UCX will store the assessment results. This will be in the workspace `hive_metastore`. Defaults to `ucx`
92+
- Create a new or select an existing SQL warehouse to run assessment dashboards on. The existing warehouse must be Pro or Serverless.
93+
- Configurations for workspace local groups migration:
94+
- Provide a backup prefix. This is used to rename workspace local groups after they have been migrated. Defaults to `db-temp-`
95+
- Select a workspace local groups migration strategy. UCX offers matching by name or external ID, using a prefix/suffix, or using regex. See the [group name conflict document](docs/group_name_conflict.md) for more details
96+
- Provide a specific list of workspace local groups (or all groups) to be migrated.
97+
- Select a Python log level, e.g. `DEBUG`, `INFO`. Defaults to `INFO`
98+
- Provide the level of parallelism, which limits the number of concurrent operations as UCX scans the workspace. Defaults to 8.
99+
- Select whether UCX should connect to the external HMS, if a cluster policy with external HMS is detected. Defaults to no.
100+
101+
After this, UCX will be installed locally and a number of assets will be deployed in the selected workspace. These assets are available under the installation folder, i.e. `/Users/<your user>/.ucx/`
102+
61103
![macos_install_ucx](docs/macos_2_databrickslabsmac_installucx.gif)
62104

63105
#### Upgrade UCX
106+
Verify that UCX is installed:
107+
```text
108+
databricks labs installed
109+
110+
Name Description Version
111+
ucx Unity Catalog Migration Toolkit (UCX) <version>
112+
```
113+
Upgrade UCX via Databricks CLI:
114+
```commandline
115+
databricks labs upgrade ucx
116+
```
117+
The prompts are similar to those in [Install UCX](#install-ucx).
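The verify-then-upgrade sequence can be scripted. This is a rough sketch under one assumption: that `databricks labs installed` prints a table whose first column is the project name, as in the sample output above. The helper names `ucx_is_listed` and `upgrade_ucx_if_installed` are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative wrapper around the verify and upgrade commands (hypothetical helpers).

# Reads a listing on stdin and succeeds when a row has "ucx" as its first column.
ucx_is_listed() {
  awk '$1 == "ucx" { found = 1 } END { exit !found }'
}

# Upgrades UCX only when it appears in the list of installed labs projects.
upgrade_ucx_if_installed() {
  if databricks labs installed | ucx_is_listed; then
    databricks labs upgrade ucx
  else
    echo "UCX is not installed; run 'databricks labs install ucx' first." >&2
    return 1
  fi
}
```

Call `upgrade_ucx_if_installed` from an interactive shell, since the upgrade asks configuration questions.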
118+
64119
![macos_upgrade_ucx](docs/macos_3_databrickslabsmac_upgradeucx.gif)
65120

66121
#### Uninstall UCX
122+
Uninstall UCX via Databricks CLI:
123+
```commandline
124+
databricks labs uninstall ucx
125+
```
126+
127+
Databricks CLI will confirm a few options:
128+
- Whether you want to remove all UCX artefacts from the workspace as well. Defaults to no.
129+
- Whether you want to delete the inventory database in `hive_metastore`. Defaults to no.
130+
67131
![macos_uninstall_ucx](docs/macos_4_databrickslabsmac_uninstallucx.gif)
68132

133+
## Using UCX
134+
135+
After installation, a number of UCX workflows will be available in the workspace. `<installation_path>/README` contains further instructions and explanations of these workflows.
136+
UCX also provides a number of command line utilities accessible via `databricks labs ucx`.
137+
138+
### Executing assessment job
139+
The assessment workflow can be triggered using the Databricks UI, or via the command line
140+
```commandline
141+
databricks labs ucx ensure-assessment-run
142+
```
143+
![ucx_assessment_workflow](docs/ucx_assessment_workflow.png)
144+
145+
### Understanding assessment report
146+
147+
After the UCX assessment workflow has run, the assessment dashboard will be populated with findings and common recommendations.
148+
[This guide](docs/assessment.md) describes them in more detail.
149+
150+
### Scanning for legacy credentials and mapping access
151+
#### AWS
152+
Use this command to identify all instance profiles in the workspace and map their access to S3 buckets.
153+
This requires `awscli` to be installed and configured.
154+
155+
```commandline
156+
databricks labs ucx save-aws-iam-profiles
157+
```
158+
159+
#### Azure
160+
Use this command to identify all storage accounts used by tables, along with the relevant Azure service principals and their permissions on each storage account.
161+
This requires `azure-cli` to be installed and configured.
162+
163+
```commandline
164+
databricks labs ucx save-azure-storage-accounts
165+
```
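Because each scan depends on a different cloud CLI, a small wrapper can pick the right command and fail early when the cloud tool is missing. This is a sketch only: the function names are hypothetical, and it assumes `awscli` is invoked as `aws` and `azure-cli` as `az`. It checks that the tool is on the PATH, not that it is configured.

```shell
#!/usr/bin/env bash
# Illustrative dispatcher for the credential scans (hypothetical helpers).

# Prints the UCX command for the given cloud: "aws" or "azure".
credential_scan_command() {
  case "$1" in
    aws)   echo "databricks labs ucx save-aws-iam-profiles" ;;
    azure) echo "databricks labs ucx save-azure-storage-accounts" ;;
    *)     echo "unsupported cloud: $1" >&2; return 1 ;;
  esac
}

# Prints the cloud CLI executable that the scan relies on.
required_cloud_cli() {
  case "$1" in
    aws)   echo "aws" ;;
    azure) echo "az" ;;
    *)     return 1 ;;
  esac
}

# Runs the scan only when the required cloud CLI is available.
run_credential_scan() {
  local cloud=$1 tool cmd
  cmd=$(credential_scan_command "$cloud") || return 1
  tool=$(required_cloud_cli "$cloud") || return 1
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "'$tool' is not installed; install and configure it first." >&2
    return 1
  fi
  $cmd   # intentionally unquoted so the command string splits into words
}
```

For example, `run_credential_scan azure` runs the Azure scan when `az` is present.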
166+
167+
### Producing table mapping
168+
Use this command to create a table mapping CSV file, which provides the target mapping for all `hive_metastore` tables identified by the assessment workflow.
169+
This file can be reviewed offline and later will be used for table migration.
170+
171+
```commandline
172+
databricks labs ucx table-mapping
173+
```
174+
175+
### Managing cross-workspace installation
176+
When installing UCX across multiple workspaces, users need to keep UCX configurations in sync. The commands below address this.
177+
178+
**Recommended:** An account administrator executes `sync-workspace-info` to upload the current UCX workspace configurations to all workspaces in the account where UCX is installed.
179+
UCX will prompt you to select an account profile that has been defined in `~/.databrickscfg`.
180+
181+
```commandline
182+
databricks labs ucx sync-workspace-info
183+
```
184+
185+
**Not recommended:** If an account admin is not available to execute `sync-workspace-info`, workspace admins can manually upload the current UCX workspace config to specific target workspaces.
186+
UCX will ask you to confirm the current workspace name, and the IDs and names of the target workspaces.
187+
188+
```commandline
189+
databricks labs ucx manual-workspace-info
190+
```
191+
192+
### Validating group membership
193+
Use this command to validate workspace-level and account-level groups and identify any discrepancies in membership after migration.
194+
195+
```commandline
196+
databricks labs ucx validate-groups-membership
197+
```
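The cloud-independent commands from this section can be collected into one dry-run script. The sketch below lists them in the order this README presents them, which is not necessarily a required order; the helper names and the `DRY_RUN` variable are hypothetical. By default it only prints the commands. Set `DRY_RUN=0` to execute them, bearing in mind that some prompt for input and `sync-workspace-info` needs account administrator access.

```shell
#!/usr/bin/env bash
# Illustrative dry-run walkthrough of the UCX commands (hypothetical helpers).

# Lists the subcommands in the order they appear in this README.
ucx_steps() {
  cat <<'EOF'
ensure-assessment-run
table-mapping
sync-workspace-info
validate-groups-membership
EOF
}

# Prints each command, or runs it when DRY_RUN=0, stopping at the first failure.
run_ucx_steps() {
  local step
  for step in $(ucx_steps); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "databricks labs ucx $step"
    else
      databricks labs ucx "$step" || return 1
    fi
  done
}

DRY_RUN=1 run_ucx_steps
```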
198+
69199
## Star History
70200

71201
[![Star History Chart](https://api.star-history.com/svg?repos=databrickslabs/ucx&type=Date)](https://star-history.com/#databrickslabs/ucx)
72202

73203
## Project Support
74-
Please note that all projects in the /databrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
204+
Please note that all projects in the databrickslabs GitHub account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
75205

76206
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
