Commit 932e120

Platform: Databricks Volumes connectors (#327)
1 parent e7e0cfb commit 932e120

5 files changed: +85 -36 lines changed

mint.json

Lines changed: 2 additions & 0 deletions
@@ -449,6 +449,7 @@
       "pages": [
         "platform/sources/overview",
         "platform/sources/azure-blob-storage",
+        "platform/sources/databricks-volumes",
         "platform/sources/google-cloud",
         "platform/sources/s3",
         "platform/sources/sharepoint"
@@ -460,6 +461,7 @@
         "platform/destinations/overview",
         "platform/destinations/astradb",
         "platform/destinations/azure-cognitive-search",
+        "platform/destinations/databricks-volumes",
         "platform/destinations/delta-table",
         "platform/destinations/google-cloud",
         "platform/destinations/milvus",

platform/destinations/databricks.mdx renamed to platform/destinations/databricks-volumes.mdx

Lines changed: 5 additions & 5 deletions
@@ -6,21 +6,21 @@ Send processed data from Unstructured to Databricks Volumes.

 You'll need:

-import DatabricksPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';
+import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';

-<DatabricksPrerequisites />
+<DatabricksVolumesPrerequisites />

 To create the destination connector:

 1. On the sidebar, click **Connectors**.
 2. Click **Destinations**.
 3. Click **Add new**.
 4. Give the connector some unique **Name**.
-5. In the **Provider** area, click **Databricks**.
+5. In the **Provider** area, click **Databricks Volumes**.
 6. Click **Continue**.
 7. Follow the on-screen instructions to fill in the fields as described later on this page.
 8. Click **Save and Test**.

-import DatabricksFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';
+import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';

-<DatabricksFields />
+<DatabricksVolumesFields />
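Before wiring this destination connector into a workflow, it can help to confirm that the service principal behind it can actually write to the target volume (roughly what **Save and Test** checks). The following is an illustrative sketch only, not part of this commit: it assumes the Databricks SDK for Python (`pip install databricks-sdk`) and OAuth M2M credentials, and every angle-bracketed value is a placeholder.

```python
import io

from databricks.sdk import WorkspaceClient

# Authenticate as the Databricks-managed service principal (placeholder values).
w = WorkspaceClient(
    host="https://<workspace-host>",
    client_id="<application-id>",
    client_secret="<oauth-secret>",
)

# Write a small marker file into the destination volume to confirm write access.
w.files.upload(
    "/Volumes/<catalog>/<schema>/<volume>/unstructured-write-test.txt",
    io.BytesIO(b"write test"),
    overwrite=True,
)
```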
platform/sources/databricks-volumes.mdx

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+---
+title: Databricks Volumes
+---
+
+Ingest your files into Unstructured from Databricks Volumes.
+
+You'll need:
+
+import DatabricksVolumesPrerequisites from '/snippets/general-shared-text/databricks-volumes.mdx';
+
+<DatabricksVolumesPrerequisites />
+
+To create the source connector:
+
+1. On the sidebar, click **Connectors**.
+2. Click **Sources**.
+3. Click **Add new**.
+4. Give the connector some unique **Name**.
+5. In the **Provider** area, click **Databricks Volumes**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
+
+import DatabricksVolumesFields from '/snippets/general-shared-text/databricks-volumes-platform.mdx';
+
+<DatabricksVolumesFields />

snippets/general-shared-text/databricks-volumes-platform.mdx

Lines changed: 12 additions & 22 deletions
@@ -2,35 +2,25 @@ Fill in the following fields:

 - **Name** (_required_): A unique name for this connector.
 - **Host** (_required_): The Databricks workspace host URL.
-- **Cluster ID**: The Databricks cluster ID.
 - **Catalog** (_required_): The name of the catalog to use.
 - **Schema**: The name of the associated schema. If not specified, **default** is used.
 - **Volume** (_required_): The name of the associated volume.
 - **Volume Path**: Any optional path to access within the volume.
-- **Overwrite**: Check this box if existing data should be overwritten.
-- **Encoding**: Any encoding to be applied to the data in the volume. If not specified, **utf-8** is used.
+- **Client ID** (_required_): The application ID value for the Databricks-managed service principal that has access to the volume.
+- **Client Secret** (_required_): The associated OAuth secret value for the Databricks-managed service principal that has access to the volume.

-Also fill in the following fields based on your authentication type, depending on your cloud provider:
+To learn how to create a Databricks-managed service principal, get its application ID, and generate an associated OAuth secret,
+see the documentation for
+[AWS](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html),
+[Azure](https://learn.microsoft.com/databricks/dev-tools/auth/oauth-m2m),
+or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/oauth-m2m.html).

-- For Databricks personal access token authentication (AWS, Azure, and GCP):
+For Azure, only Databricks-managed service principals are supported. Microsoft Entra ID-managed service principals are not supported.

-  - **Token**: The Databricks personal access token value.
-
-- For username and password (basic) authentication (AWS only):
-
-  - **Username**: The Databricks username value.
-  - **Password**: The associated Databricks password value.
-
-The following authentication types are currently not supported:
-
-- OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP).
-- OAuth user-to-machine (U2M) authentication (AWS, Azure, and GCP).
-- Azure managed identities (MSI) authentication (Azure only).
-- Microsoft Entra ID service principal authentication (Azure only).
-- Azure CLI authentication (Azure only).
-- Microsoft Entra ID user authentication (Azure only).
-- Google Cloud Platform credentials authentication (GCP only).
-- Google Cloud Platform ID authentication (GCP only).
+To learn how to grant a Databricks-managed service principal access to a volume, see the documentation for
+[AWS](https://docs.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume),
+[Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands#change-permissions-on-a-volume),
+or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume).
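A quick way to sanity-check the **Host**, **Client ID**, and **Client Secret** values before saving the connector is to authenticate as the service principal and list the target volume path. This is a minimal sketch only, not part of the commit or the Platform UI: it assumes a recent Databricks SDK for Python release that exposes the Files API, and all angle-bracketed values are placeholders for the connector fields described above.

```python
from databricks.sdk import WorkspaceClient

# OAuth M2M authentication with the values entered in the connector fields.
w = WorkspaceClient(
    host="https://<workspace-host>",   # Host
    client_id="<application-id>",      # Client ID (the service principal's application ID)
    client_secret="<oauth-secret>",    # Client Secret
)

# Unity Catalog volumes are addressed as /Volumes/<catalog>/<schema>/<volume>[/<volume-path>].
volume_path = "/Volumes/<catalog>/<schema>/<volume>"

# If this listing succeeds, the service principal can at least read the volume.
for entry in w.files.list_directory_contents(volume_path):
    print(entry.path)
```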

snippets/general-shared-text/databricks-volumes.mdx

Lines changed: 40 additions & 9 deletions
@@ -10,6 +10,11 @@ allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
 allowfullscreen
 ></iframe>

+The preceding video shows how to use Databricks personal access tokens (PATs), which are supported only for [Unstructured Ingest](/ingestion/overview).
+
+To learn how to use Databricks-managed service principals, which are supported by both the [Unstructured Platform](/platform/overview) and Unstructured Ingest,
+see the additional videos later on this page.
+
 - The Databricks workspace URL. Get the workspace URL for
   [AWS](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids),
   [Azure](https://learn.microsoft.com/azure/databricks/workspace/workspace-details#workspace-instance-names-urls-and-ids),
@@ -21,17 +26,39 @@ allowfullscreen
   - Azure: `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`
   - GCP: `https://<workspace-id>.<random-number>.gcp.databricks.com`

-- The Databricks compute resource's ID. Get the compute resource ID for
-  [AWS](https://docs.databricks.com/integrations/compute-details.html),
-  [Azure](https://learn.microsoft.com/azure/databricks/integrations/compute-details),
-  or [GCP](https://docs.gcp.databricks.com/integrations/compute-details.html).
-
 - The Databricks authentication details. For more information, see the documentation for
   [AWS](https://docs.databricks.com/dev-tools/auth/index.html),
   [Azure](https://learn.microsoft.com/azure/databricks/dev-tools/auth/),
   or [GCP](https://docs.gcp.databricks.com/dev-tools/auth/index.html).

-  More specifically, you will need:
+  The following videos show how to create a Databricks-managed service principal and then grant it access to a Databricks volume:
+
+  <iframe
+    width="560"
+    height="315"
+    src="https://www.youtube.com/embed/wBmqv5DaA1E"
+    title="YouTube video player"
+    frameborder="0"
+    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+    allowfullscreen
+  ></iframe>
+
+  <iframe
+    width="560"
+    height="315"
+    src="https://www.youtube.com/embed/DykQRxgh2aQ"
+    title="YouTube video player"
+    frameborder="0"
+    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
+    allowfullscreen
+  ></iframe>
+
+  For the [Unstructured Platform](/platform/overview), only the following Databricks authentication type is supported:
+
+  - For OAuth machine-to-machine (M2M) authentication (AWS, Azure, and GCP): The client ID and OAuth secret values for the corresponding service principal.
+    Note that for Azure, only Databricks-managed service principals are supported. Microsoft Entra ID-managed service principals are not supported.
+
+  For [Unstructured Ingest](/ingestion/overview), the following Databricks authentication types are supported:

   - For Databricks personal access token authentication (AWS, Azure, and GCP): The personal access token's value.
   - For username and password (basic) authentication (AWS only): The user's name and password values.
@@ -44,6 +71,10 @@ allowfullscreen
   - For Google Cloud Platform credentials authentication (GCP only): The local path to the corresponding Google Cloud service account's credentials file.
   - For Google Cloud Platform ID authentication (GCP only): The Google Cloud service account's email address.

-- The Databricks catalog name for the Volume. Get the catalog name for [AWS](https://docs.databricks.com/catalogs/manage-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/manage-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/manage-catalog.html).
-- The Databricks schema name for the Volume. Get the schema name for [AWS](https://docs.databricks.com/schemas/manage-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/manage-schema), or [GCP](https://docs.gcp.databricks.com/schemas/manage-schema.html).
-- The Databricks Volume name, and optionally any path in that Volume that you want to access directly. Get the Volume information for [AWS](https://docs.databricks.com/files/volumes.html), [Azure](https://learn.microsoft.com/azure/databricks/files/volumes), or [GCP](https://docs.gcp.databricks.com/files/volumes.html).
+- The Databricks catalog name for the volume. Get the catalog name for [AWS](https://docs.databricks.com/catalogs/manage-catalog.html), [Azure](https://learn.microsoft.com/azure/databricks/catalogs/manage-catalog), or [GCP](https://docs.gcp.databricks.com/catalogs/manage-catalog.html).
+- The Databricks schema name for the volume. Get the schema name for [AWS](https://docs.databricks.com/schemas/manage-schema.html), [Azure](https://learn.microsoft.com/azure/databricks/schemas/manage-schema), or [GCP](https://docs.gcp.databricks.com/schemas/manage-schema.html).
+- The Databricks volume name, and optionally any path in that volume that you want to access directly. Get the volume information for [AWS](https://docs.databricks.com/files/volumes.html), [Azure](https://learn.microsoft.com/azure/databricks/files/volumes), or [GCP](https://docs.gcp.databricks.com/files/volumes.html).
+- Make sure that the target user or service principal has access to the target volume. To learn more, see the documentation for
+  [AWS](https://docs.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume),
+  [Azure](https://learn.microsoft.com/azure/databricks/volumes/utility-commands#change-permissions-on-a-volume),
+  or [GCP](https://docs.gcp.databricks.com/volumes/utility-commands.html#change-permissions-on-a-volume).
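The last bullet above points to the per-cloud documentation for changing volume permissions. As one illustration only (not part of this commit), the same grant can also be applied with the Databricks SDK for Python. This is a hedged sketch: the catalog, schema, and volume names are placeholders, and it assumes you run it with credentials that are allowed to manage grants on that volume.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    PermissionsChange,
    Privilege,
    SecurableType,
)

# Authenticates with your own (owner/admin) credentials picked up from the environment.
w = WorkspaceClient()

# Grant the service principal (identified by its application ID) read and write
# access to the volume <catalog>.<schema>.<volume>.
w.grants.update(
    securable_type=SecurableType.VOLUME,
    full_name="<catalog>.<schema>.<volume>",
    changes=[
        PermissionsChange(
            principal="<service-principal-application-id>",
            add=[Privilege.READ_VOLUME, Privilege.WRITE_VOLUME],
        )
    ],
)
```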
