# UCX Assessment Introduction
This document describes the Assessment Report generated by the UCX tools. The main assessment report includes dashlets, widgets, and details of the assessment findings, along with the common recommendations made for each Assessment Finding (AF) Index entry.

# Assessment Report Summary
The Assessment Report (Main) is the output of the Databricks Labs UCX assessment workflow. The report queries the $inventory database (e.g. `ucx`) and summarizes the findings of the assessment. A link to the Assessment Report (Main) can be found in the README.py file under `.ucx` in your home folder. You can also navigate to the report directly by clicking the `Dashboards` icon on the left and locating the dashboard there.
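The findings behind the dashboard can also be inspected directly with SQL. A minimal sketch, assuming the inventory schema is named `ucx` and that the assessment has populated a `tables` inventory table with columns like those below (the schema, table and column names are illustrative, not guaranteed):

```sql
-- Peek at the crawled Hive Metastore tables in the UCX inventory.
-- Replace `ucx` with the $inventory schema chosen during installation.
SELECT database, name, object_type, table_format, location
FROM ucx.tables
LIMIT 20;
```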

# Assessment Widgets
<img width="1655" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/808f7c68-fcc7-4caa-bab2-03f49a382256">

## Readiness
This is an overall summary of the readiness detailed in the Readiness dashlet. The value is based on the number of findings relative to the total number of assets scanned.

## Total Databases
The total number of `hive_metastore` databases found during the assessment.

## Metastore Crawl Failures
Total number of failures encountered by the crawler while extracting metadata from the Hive Metastore and REST APIs.

## Total Tables
Total number of Hive Metastore tables discovered.

## Storage Locations
Total number of identified storage locations, based on scanning Hive Metastore tables and schemas.

# Assessment Widgets
Assessment widgets query tables in the $inventory database and either summarize or detail the findings.

The second row of the report starts with "Readiness" and "Assessment Summary".
<img width="1235" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/c68194c4-4b09-4c8d-b61f-ebf57b7106c7">

## Readiness
This is a rough summary of the workspace's readiness to run Unity Catalog-governed workloads. Each line item is the percentage of compatible items divided by the total number of items in that class.

## Assessment Summary
This is a summary count, per finding type, of all findings identified during the assessment workflow. The assessment summary helps identify areas that need focus (e.g. tables on DBFS, or clusters that need DBR upgrades).

The third row continues with "Database Summary".
<img width="1220" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/28742e33-d3e3-4eb8-832f-1edd34999fa2">

## Database Summary
This is a database-by-database assessment summary of the Hive Metastore, along with an upgrade strategy for each database.
`In Place Sync` indicates that the `SYNC` command can be used to copy the metadata into a Unity Catalog catalog.
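As a rough sketch of what `In Place Sync` looks like in practice, the catalog and schema names below are placeholders:

```sql
-- Preview which tables in a Hive Metastore schema would sync cleanly into Unity Catalog.
SYNC SCHEMA main.sales FROM hive_metastore.sales DRY RUN;

-- Perform the actual metadata sync once the dry run looks good.
SYNC SCHEMA main.sales FROM hive_metastore.sales;
```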

And the fourth row contains "External Locations" and "Mount Points".
<img width="1231" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/8a88da36-43ef-4f50-8818-6bc7e4e23758">

## External Locations
Tables were scanned for `LOCATION` attributes, and that list was distilled down to external locations. In Unity Catalog, create a STORAGE CREDENTIAL that can access the external locations, then define a Unity Catalog `EXTERNAL LOCATION` for each of them.
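A minimal sketch of defining one such external location, assuming a storage credential named `my_storage_credential` already exists; the location name and URL are placeholders:

```sql
-- Register a cloud storage path surfaced by the assessment as a UC external location.
CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
  URL 'abfss://container@storageaccount.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL my_storage_credential);
```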

## Mount Points
Mount points are a popular means of providing access to external buckets and storage accounts. The more secure equivalents in Unity Catalog are EXTERNAL LOCATIONs and VOLUMEs: EXTERNAL LOCATIONs are the basis for external tables, schemas, catalogs and VOLUMEs, while VOLUMEs are the basis for managing files.
The recommendation is to migrate mount points to either EXTERNAL LOCATIONs or VOLUMEs. The Unity Catalog "Create External Location" UI will prompt with existing mount points to assist in creating EXTERNAL LOCATIONs.

Unfortunately, as of January 2024, cross-cloud external locations are not supported. Databricks-to-Databricks Delta Sharing may assist in upgrading cross-cloud mounts.
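Where file-level access is the goal, a mount can often be replaced by an external volume. A hedged sketch, with placeholder catalog, schema, volume and URL names:

```sql
-- Replace a dbfs:/mnt/landing style mount with a governed volume over the same cloud path.
-- Files are then addressed as /Volumes/main/raw/landing/... instead of /mnt/landing/...
CREATE EXTERNAL VOLUME main.raw.landing
  LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/landing';
```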

The next row contains the "Table Types" widget.
<img width="1229" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/859d7ea1-5f73-4278-9748-80ca6d94fe28">

## Table Types
This widget is a detailed list of each table, its format, storage type, location property and, for DBFS tables, the approximate table size. Upgrade strategies include:
- DEEP CLONE or CTAS for DBFS root tables
- SYNC for DELTA tables (managed or external) stored outside the DBFS root (on a mount point or a direct cloud storage path)
- Managed non-DELTA tables need to be upgraded to Unity Catalog by either:
  - using CTAS to copy the data into the target Unity Catalog catalog, schema and table name, or
  - moving the data to an EXTERNAL LOCATION and creating an EXTERNAL table in Unity Catalog (see the CTAS sketch below).
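A hedged CTAS sketch for the managed non-DELTA case; the catalog, schema, table and path names are placeholders:

```sql
-- Copy a managed non-DELTA Hive Metastore table into Unity Catalog as a managed Delta table.
CREATE TABLE main.sales.transactions
AS SELECT * FROM hive_metastore.sales.transactions;

-- Or recreate it as an external table on a UC external location instead.
CREATE TABLE main.sales.transactions_ext
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/sales/transactions'
AS SELECT * FROM hive_metastore.sales.transactions;
```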

The following row includes "Incompatible Clusters" and "Incompatible Jobs".
<img width="1248" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/30a08de6-240c-48d1-9f49-e2c10537ccc3">

## Incompatible Clusters
This widget is a list of findings (reasons) and clusters that may need upgrading. See the Assessment Finding Index (below) for specific recommendations.

## Incompatible Jobs
This is a list of findings (reasons) and jobs that may need upgrading. See the Assessment Finding Index for more information.

The final row includes "Incompatible Delta Live Tables" and "Incompatible Global Init Scripts".
<img width="1244" alt="image" src="https://github.com/databrickslabs/ucx/assets/1122251/c0267df9-ddb1-4519-8ba1-4c608d8eef31">

## Incompatible Delta Live Tables
These are Delta Live Tables pipelines that may be incompatible with Unity Catalog.

## Incompatible Global Init Scripts
These are global init scripts that are incompatible with Unity Catalog compute. As a reminder, global init scripts need to be stored on secure storage (Volumes or a cloud storage account, not DBFS).

# Assessment Finding Index
This section explains the UCX assessment findings and provides a recommended action for each.
The assessment finding index is grouped as follows:
- The 100-series findings relate to Databricks Runtime and compute configuration.
- The 200-series findings relate to data and table observations.

### AF101 - not supported DBR: ##.#.x-scala2.12
Short description: The compute runtime does not meet the requirements to use Unity Catalog.
Explanation: Unity Catalog capabilities are fully enabled on Databricks Runtime 13.3 LTS, which is the current recommended runtime for production interactive clusters and jobs. This finding notes that the cluster or job compute configuration does not meet this threshold.
Recommendation: Upgrade the DBR version to 13.3 LTS or later.

### AF102 - not supported DBR: ##.#.x-cpu-ml-scala2.12
Currently, MLR (Machine Learning Runtime) and GPU *SHARED* clusters are not supported with Unity Catalog. Use *Assigned* or *Job* clusters instead.

### AF103 - not supported DBR: ##.#.x-gpu-ml-scala2.12
Currently, MLR (Machine Learning Runtime) and GPU *SHARED* clusters are not supported with Unity Catalog. Use *Assigned* or *Job* clusters instead.

### AF111 - Uses azure service principal credentials config in cluster.
Azure service principals are replaced by storage credentials for access to cloud storage accounts.
Create a STORAGE CREDENTIAL, then an EXTERNAL LOCATION and, if needed, external tables to provide data access.
If the service principal is used to access additional Azure cloud services, converting the cluster to the `Assigned` cluster type *may* work.

### AF112 - Uses azure service principal credentials config in Job cluster.
Azure service principals are replaced by storage credentials for access to cloud storage accounts.
Create a STORAGE CREDENTIAL, then an EXTERNAL LOCATION and, if needed, external tables to provide data access.
If the service principal is used to access additional Azure cloud services, converting the job cluster to the `Assigned` cluster type *may* work.

### AF113 - Uses azure service principal credentials config in pipeline.
Azure service principals are replaced by storage credentials for access to cloud storage accounts.
Create a STORAGE CREDENTIAL, then an EXTERNAL LOCATION and, if needed, external tables to provide data access.

### AF114 - unsupported config
A Spark config option was found in a cluster compute definition that is incompatible with Unity Catalog-enabled compute. The recommendation is to remove or alter the config; a Unity Catalog-enabled cluster may also require a different approach to achieve the same capability. As a transition strategy, "Unassigned" clusters or "Assigned" clusters (including job clusters, but not shared clusters) may work.
- `spark.hadoop.javax.jdo.option.ConnectionURL`: an external Hive Metastore is in use. Recommend migrating these tables and schemas to Unity Catalog external tables, where they can be shared across workspaces.

### AF115 - unsupported config: spark.databricks.passthrough.enabled
The passthrough security model is not supported by Unity Catalog. Passthrough mode relied on file-based authorization, which is incompatible with the fine-grained access controls supported by Unity Catalog.
Recommend mapping your passthrough security model to an External Location/Volume/Table/View based security model compatible with Unity Catalog.
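A hedged sketch of what the object-based model looks like as UC grants; the principal, location and table names are placeholders:

```sql
-- Replace folder-level passthrough access with grants on governed objects.
GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `data_engineers`;
GRANT SELECT ON TABLE main.finance.transactions TO `analysts`;
```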

### AF201 - Inplace Sync
Short description: The table or database can be SYNC'd without moving data, because the data is stored directly on cloud storage, specified either via a mount or a cloud storage URL (not DBFS).
How: Run the SYNC command on the table or schema. If the table (or source database) is 'managed', first set this Spark setting in your session or in the interactive cluster configuration: `spark.databricks.sync.command.enableManagedTable=true`.
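A table-level sketch of the above, with placeholder names; the Spark setting is the one quoted in the finding:

```sql
-- Required only when the source table is a managed Hive Metastore table.
SET spark.databricks.sync.command.enableManagedTable = true;

-- Preview, then apply, the in-place metadata sync for a single table.
SYNC TABLE main.finance.transactions FROM hive_metastore.finance.transactions DRY RUN;
SYNC TABLE main.finance.transactions FROM hive_metastore.finance.transactions;
```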

### AF202 - Asset Replication Required
We found that the table or database needs to have its data copied into a Unity Catalog managed location or table.
Recommendation: Perform a 'deep clone' operation on the table to copy the files:
```sql
CREATE TABLE [IF NOT EXISTS] table_name
  [SHALLOW | DEEP] CLONE source_table_name [TBLPROPERTIES clause] [LOCATION path]
```
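A concrete usage example of the clone syntax above; the target catalog, source table and path are placeholders:

```sql
-- Copy a Delta table's data and metadata into a Unity Catalog external location.
CREATE TABLE IF NOT EXISTS main.sales.orders
  DEEP CLONE hive_metastore.sales.orders
  LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/sales/orders';
```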

### AF203 - Data in DBFS Root
A table or schema refers to a location in the DBFS root rather than a cloud storage location.
The data must be moved from DBFS to a cloud storage location or into Unity Catalog managed storage.

### AF204 - Data is in DBFS Mount
A table or schema refers to a location in a DBFS mount rather than a direct cloud storage location.
Mounts are not supported in Unity Catalog, so the mount's source location must be de-referenced and the table/schema objects mapped to a UC external location.

### AF210 - Non-DELTA format: CSV
Unity Catalog does not support managed CSV tables. Recommend converting the table to DELTA format or migrating it to an external table.
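A hedged CTAS sketch for converting a CSV-backed table to Delta; all names are placeholders:

```sql
-- Rewrite the CSV-backed Hive Metastore table as a Delta table in Unity Catalog.
CREATE TABLE main.raw.events
AS SELECT * FROM hive_metastore.raw.events_csv;
```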

### AF211 - Non-DELTA format: DELTA
This was a known [issue](https://github.com/databrickslabs/ucx/issues/788) in the UCX assessment job. The bug should be fixed as of release `0.10.0`.

### AF212 - Non-DELTA format: [PARQUET|JDBC|ORC|XML|JSON|HIVE|deltaSharing|com.databricks.spark.csv|...]
Unity Catalog managed tables only support the DELTA format.
Recommend converting the table to Delta Lake format, or migrating it to an external table.
For `deltaSharing`, use Databricks-to-Databricks Delta Sharing if the provider is also on Databricks.
HIVE type tables are not supported.
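For Parquet-backed tables specifically, `CONVERT TO DELTA` can upgrade the format in place before migration; a sketch with a placeholder table name (a `PARTITIONED BY` clause would also be needed if the table is partitioned):

```sql
-- Convert an existing Parquet table to Delta in place.
CONVERT TO DELTA hive_metastore.raw.parquet_events;
```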

For JDBC data sources:

Problem (on shared clusters):
Accessing third-party databases other than MySQL, PostgreSQL, Amazon Redshift, Snowflake, Microsoft SQL Server, Azure Synapse (SQL Data Warehouse) and Google BigQuery requires additional permissions on a shared cluster if the user is not a workspace admin. This is because the drivers do not guarantee user isolation; for example, a driver may write data from multiple users to a widely accessible temp directory.

Workaround:
Granting ANY FILE permissions will allow users to access untrusted databases. Note that ANY FILE still enforces ACLs on any tables or external (storage) locations governed by Unity Catalog.
Upgrade the DBR runtime to 13.3 LTS or higher to avoid cluster-level firewall restrictions.
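A hedged example of the workaround grant; the group name is a placeholder:

```sql
-- Allow a group to use JDBC drivers that require ANY FILE on shared clusters.
GRANT SELECT ON ANY FILE TO `jdbc_users`;
```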

### AF221 - Unsupported Storage Type: [adl:// | wasb:// | wasbs://]
ADLS Gen 2 (`abfss://`) is the only Azure-native storage type supported. Use a DEEP CLONE process to copy the table data:
```sql
CREATE TABLE [IF NOT EXISTS] table_name
  [SHALLOW | DEEP] CLONE source_table_name [TBLPROPERTIES clause] [LOCATION path]
```

# Common Terms
- **UC**
Abbreviation for Unity Catalog.
- **DELTA**
DELTA refers to the table format for Delta Lake tables.
- **CTAS**
Abbreviation for *Create Table As Select*, a method of copying table data from one source to another. The CREATE statement can include USING and LOCATION keywords, while the SELECT portion can cast columns to other data types.
- **DEEP CLONE**
Shorthand for `CREATE TABLE <target_table> DEEP CLONE <source_table>`, which only works for DELTA formatted tables.
- **EXTERNAL LOCATION**
[EXTERNAL LOCATION](https://docs.databricks.com/en/connect/unity-catalog/external-locations.html#create-an-external-location) is a UC object type describing a URL to a cloud storage bucket + folder or storage account + container and folder.
- **STORAGE CREDENTIAL**
[STORAGE CREDENTIAL](https://docs.databricks.com/en/sql/language-manual/sql-ref-storage-credentials.html) is a UC object encapsulating the credentials necessary to access cloud storage.