|
61 | 61 | [GCP](https://docs.gcp.databricks.com/tables/managed.html) |
62 | 62 | within that schema (formerly known as a database). |
63 | 63 |
|
64 | | - <Note> |
65 | | - Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), and tables might cause isolated issues with the connector. It is |
66 | | - recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, and tables. |
67 | | - </Note> |
| 64 | + You can have the connector attempt to create a table for you automatically at run time. To do this, in the connector settings as described later in this article, |
| 65 | + do one of the following: |
| 66 | + |
| 67 | + - Specify the name of the table that you want the connector to attempt to create within the specified catalog and schema (formerly known as a database). |
| 68 | + - Leave the table name blank. The connector will attempt to create a table within the specified catalog and schema (formerly known as a database). |
| 69 | + For the [Unstructured UI](/ui/overview) and [Unstructured API](/api-reference/overview), the table is named `u<short-workflow-id>`. |
| 70 | + For the [Unstructured Ingest CLI and Ingest Python library](/ingestion/overview), the table is named `unstructuredautocreated`. |
| 71 | + |
| 72 | + The connector will attempt to create the table on behalf of the related Databricks workspace user or Databricks managed service principal that is referenced in the connector settings, as described later in these requirements. |
| 73 | + If successful, the table's owner is set as the related Databricks workspace user or Databricks managed service principal. The owner will have all Unity Catalog |
| 74 | + privileges on the table by default. No other Databricks workspace users or Databricks managed service principals will have any privileges on the table by default. |
| 75 | + |
| 76 | + <Warning> |
| 77 | + If the table's parent schema (formerly known as a database) is not owned by the same Databricks workspace user or Databricks managed service principal that is |
| 78 | + referenced in the connector settings, then you should grant the new table's owner the `CREATE TABLE` privilege on that parent schema (formerly known as a database) |
| 79 | + before the connector attempts to create the table. Otherwise, table creation could fail. |
| 80 | + </Warning> |
| 81 | + |
| 82 | + <Note> |
| 83 | + Using dashes (`-`) in the names of catalogs, schemas (formerly known as databases), and tables might cause isolated issues with the connector. It is |
| 84 | + recommended to use underscores (`_`) instead of dashes in the names of catalogs, schemas, and tables. |
| 85 | + </Note> |
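
The naming recommendation in the note above can be sketched as a small pre-flight check. This is a minimal illustration only: the helper names `has_dashes` and `to_safe_identifier` are hypothetical and are not part of the connector or of Databricks.

```python
def has_dashes(*names: str) -> bool:
    """Return True if any catalog, schema, or table name contains a dash."""
    return any("-" in name for name in names)


def to_safe_identifier(name: str) -> str:
    """Replace dashes with underscores, following the recommendation above."""
    return name.replace("-", "_")


# Hypothetical names, used only for illustration.
catalog, schema, table = "my-catalog", "my_schema", "my-table"
if has_dashes(catalog, schema, table):
    catalog, schema, table = (to_safe_identifier(n) for n in (catalog, schema, table))
```

Renaming is only safe for objects that you are about to create; an existing catalog, schema, or table must be referenced by its actual name.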
68 | 86 |
|
69 | 87 | The following video shows how to create a catalog, schema (formerly known as a database), and a table in Unity Catalog if you do not already have them available, and set privileges for someone other than their owner to use them: |
70 | 88 |
|
|
78 | 96 | allowfullscreen |
79 | 97 | ></iframe> |
80 | 98 |
|
81 | | - This table must contain the following column names and their data types: |
| 99 | + If you want to use an existing table or create one yourself beforehand, the table must contain, at minimum, the following columns and data types:
82 | 100 |
|
83 | 101 | ```text |
84 | | - CREATE TABLE IF NOT EXISTS `<catalog_name>`.`<schema_name>`.elements ( |
| 102 | + CREATE TABLE IF NOT EXISTS <catalog_name>.<schema_name>.<table_name> ( |
85 | 103 | id STRING NOT NULL PRIMARY KEY, |
86 | | - record_id STRING, |
87 | | - element_id STRING, |
| 104 | + record_id STRING NOT NULL, |
| 105 | + element_id STRING NOT NULL, |
88 | 106 | text STRING, |
89 | 107 | embeddings ARRAY<FLOAT>, |
90 | 108 | type STRING, |
91 | | - date_created TIMESTAMP, |
92 | | - date_modified TIMESTAMP, |
93 | | - date_processed TIMESTAMP, |
94 | | - permissions_data STRING, |
95 | | - filesize_bytes FLOAT, |
96 | | - url STRING, |
97 | | - version STRING, |
98 | | - record_locator STRING, |
99 | | - category_depth DOUBLE, |
100 | | - parent_id STRING, |
101 | | - attached_filename STRING, |
102 | | - filetype STRING, |
103 | | - last_modified TIMESTAMP, |
104 | | - file_directory STRING, |
105 | | - filename STRING, |
106 | | - languages ARRAY<STRING>, |
107 | | - page_number STRING, |
108 | | - links STRING, |
109 | | - page_name STRING, |
110 | | - link_urls STRING, |
111 | | - link_texts STRING, |
112 | | - sent_from STRING, |
113 | | - sent_to STRING, |
114 | | - subject STRING, |
115 | | - section STRING, |
116 | | - header_footer_type STRING, |
117 | | - emphasized_text_contents STRING, |
118 | | - emphasized_text_tags STRING, |
119 | | - text_as_html STRING, |
120 | | - regex_metadata STRING, |
121 | | - detection_class_prob FLOAT, |
122 | | - is_continuation BOOLEAN, |
123 | | - orig_elements STRING, |
124 | | - coordinates_points STRING, |
125 | | - coordinates_system STRING, |
126 | | - coordinates_layout_width FLOAT, |
127 | | - coordinates_layout_height FLOAT, |
128 | | - partitioner_type STRING, |
129 | | - image_mime_type STRING, |
130 | | - image_base64 STRING |
| 109 | + metadata VARIANT |
131 | 110 | ); |
132 | 111 | ``` |
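
To check an existing table against the minimum schema on the added (`+`) side of the statement above, a comparison along the following lines could be used. This is a sketch under stated assumptions: the helper name `find_schema_problems` is hypothetical, the caller is assumed to have already fetched the table's column names and types (for example from `DESCRIBE TABLE` output), and nullability and the primary key are not checked.

```python
from typing import Dict

# Minimum columns and types, copied from the added side of the CREATE TABLE statement.
REQUIRED_COLUMNS: Dict[str, str] = {
    "id": "STRING",
    "record_id": "STRING",
    "element_id": "STRING",
    "text": "STRING",
    "embeddings": "ARRAY<FLOAT>",
    "type": "STRING",
    "metadata": "VARIANT",
}


def find_schema_problems(existing: Dict[str, str]) -> Dict[str, str]:
    """Map each required column to a problem description, if it has one.

    `existing` maps column names to type strings; the comparison ignores
    case and spaces. An empty result means the minimum columns are present.
    """
    normalized = {
        name.lower(): dtype.upper().replace(" ", "")
        for name, dtype in existing.items()
    }
    problems: Dict[str, str] = {}
    for column, expected in REQUIRED_COLUMNS.items():
        actual = normalized.get(column)
        if actual is None:
            problems[column] = "missing"
        elif actual != expected:
            problems[column] = f"expected {expected}, found {actual}"
    return problems
```

Extra columns in the existing table are ignored, because the requirement is only that these columns exist at minimum.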
133 | 112 |
|
|
208 | 187 | ></iframe> |
209 | 188 |
|
210 | 189 | - The Databricks workspace user or Databricks managed service principal must have the following _minimum_ set of permissions and privileges to write to an |
211 | | - existing volume or table in Unity Catalog: |
| 190 | + existing volume or table in Unity Catalog. If that Databricks workspace user or Databricks managed service principal owns the volume or table,
| 191 | + they have all necessary permissions and privileges by default. If someone else is the owner, the following permissions and privileges must be
| 192 | + explicitly granted to them before using the connector:
212 | 193 |
|
213 | 194 | - To use an all-purpose cluster for access, `Can Restart` permission on that cluster. Learn how to check and set cluster permissions for |
214 | 195 | [AWS](https://docs.databricks.com/compute/clusters-manage.html#compute-permissions), |
|
233 | 214 |
|
234 | 215 | - `USE CATALOG` on the table's parent catalog in Unity Catalog. |
235 | 216 | - `USE SCHEMA` on the table's parent schema (formerly known as a database) in Unity Catalog.
236 | | - - `MODIFY` and `SELECT` on the table. |
| 217 | + - To create a new table, `CREATE TABLE` on the table's parent schema (formerly known as a database) in Unity Catalog. |
| 218 | + - If the table already exists, `MODIFY` and `SELECT` on the table. |
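
The table privileges listed above could be granted with Unity Catalog `GRANT` statements. The following sketch only builds the SQL text; it does not run anything. The helper name `build_table_grants` and the example principal, catalog, schema, and table names are hypothetical, and the statements should be reviewed by the owner of the objects before being executed in a Databricks SQL editor or notebook.

```python
from typing import List, Optional


def build_table_grants(
    principal: str,
    catalog: str,
    schema: str,
    table: Optional[str] = None,
) -> List[str]:
    """Build GRANT statements for the minimum table privileges listed above.

    Pass `table` when the table already exists (MODIFY and SELECT on it).
    Omit it when the connector should create the table (CREATE TABLE on the schema).
    """
    grantee = f"`{principal}`"  # principals are quoted with backticks
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {grantee};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {grantee};",
    ]
    if table is None:
        statements.append(
            f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {grantee};"
        )
    else:
        statements.append(
            f"GRANT MODIFY, SELECT ON TABLE {catalog}.{schema}.{table} TO {grantee};"
        )
    return statements


# Hypothetical values, for illustration only.
for statement in build_table_grants("someone@example.com", "my_catalog", "my_schema"):
    print(statement)
```

For a Databricks managed service principal, the grantee is typically identified by its application ID rather than an email address; confirm this against your workspace before running the statements.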
237 | 219 |
|
238 | 220 | Learn how to check and set Unity Catalog privileges for |
239 | 221 | [AWS](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html#show-grant-and-revoke-privileges), |
|