Commit ec4073d

Google Cloud Storage source and destinations connectors: bucket access and key file instructions, add to Platform (#326)
1 parent 6a59a42 commit ec4073d

File tree

9 files changed

+63
-43
lines changed

mint.json

Lines changed: 2 additions & 0 deletions
@@ -449,6 +449,7 @@
 "pages": [
 "platform/sources/overview",
 "platform/sources/azure-blob-storage",
+"platform/sources/google-cloud",
 "platform/sources/s3",
 "platform/sources/sharepoint"
 ]
@@ -460,6 +461,7 @@
 "platform/destinations/astradb",
 "platform/destinations/azure-cognitive-search",
 "platform/destinations/delta-table",
+"platform/destinations/google-cloud",
 "platform/destinations/milvus",
 "platform/destinations/mongodb",
 "platform/destinations/pinecone",

platform/connectors.mdx

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,7 @@ The Unstructured Platform supports connecting to the following source and destin
 ## Sources
 
 - [Azure](/platform/sources/azure-blob-storage)
+- [Google Cloud Storage](/platform/sources/google-cloud)
 - [S3](/platform/sources/s3)
 - [SharePoint](/platform/sources/sharepoint)
 
@@ -25,6 +26,7 @@ If your source is not listed here, you might still be able to connect Unstructur
 - [Astra DB](/platform/destinations/astradb)
 - [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
 - [Delta Table](/platform/destinations/delta-table)
+- [Google Cloud Storage](/platform/destinations/google-cloud)
 - [Milvus](/platform/destinations/milvus)
 - [MongoDB](/platform/destinations/mongodb)
 - [Pinecone](/platform/destinations/pinecone)

platform/destinations/google-cloud-storage.mdx

Lines changed: 0 additions & 35 deletions
This file was deleted.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+---
+title: Google Cloud Storage
+---
+
+Send processed data from Unstructured to Google Cloud Storage.
+
+You'll need:
+
+import GCSPrerequisites from '/snippets/general-shared-text/gcs.mdx';
+
+<GCSPrerequisites />
+
+To create the destination connector:
+
+1. On the sidebar, click **Connectors**.
+2. Click **Destinations**.
+3. Click **Add new**.
+4. Give the connector some unique **Name**.
+5. In the **Provider** area, click **Google GCS**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
+
+import GCSFields from '/snippets/general-shared-text/gcs-platform.mdx';
+
+<GCSFields />

platform/destinations/overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ To create a destination connector:
 - [Astra DB](/platform/destinations/astradb)
 - [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
 - [Delta Table](/platform/destinations/delta-table)
+- [Google Cloud Storage](/platform/destinations/google-cloud)
 - [Milvus](/platform/destinations/milvus)
 - [MongoDB](/platform/destinations/mongodb)
 - [Pinecone](/platform/destinations/pinecone)

platform/sources/google-cloud.mdx

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ To create the source connector:
 2. Click **Sources**.
 3. Click **Add new**.
 4. Give the connector some unique **Name**.
-5. In the **Provider** area, click **Google Cloud Storage**.
+5. In the **Provider** area, click **Google GCS**.
 6. Click **Continue**.
 7. Follow the on-screen instructions to fill in the fields as described later on this page.
 8. Click **Save and Test**.

platform/sources/overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ To create a source connector:
 7. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:
 
 - [Azure](/platform/sources/azure-blob-storage)
+- [Google Cloud Storage](/platform/sources/google-cloud)
 - [S3](/platform/sources/s3)
 - [SharePoint](/platform/source/sharepoint)

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,6 @@
 Fill in the following fields:
 
 - **Name** (_required_): A unique name for this connector.
-- **Remote URL** (_required_): The `gs://` URL for the Google Cloud Storage bucket and path.
-- **Service Account Key** (_required_): The JSON content of a Google Cloud service account key that has the necessary permissions to access the specified bucket. The service account key must have at least the **Storage Object Viewer** role to ensure proper access permissions. [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating). [Assign a role](https://cloud.google.com/storage/docs/access-control/iam).
-- **Uncompress archives**: Check this box if the files to be ingested are compressed and require uncompression.
-- **Recursive processing**: Check this box to ingest data recursively from subfolders within the bucket.
+- **Bucket URI** (_required_): The URI for the Google Cloud Storage bucket and any target folder path within the bucket. This URI takes the format `gs://<bucket-name>[/folder-name]`.
+- **Service Account Key** (_required_): The contents of a service account key file, expressed as a single string without line breaks, for a Google Cloud service account that has the required access permissions to the bucket.
+- **Recursive**: Check this box to ingest data recursively from any subfolders, starting from the path specified by **Bucket URI**.
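
As a rough illustration of the `gs://<bucket-name>[/folder-name]` format that the **Bucket URI** field above expects, the following Python sketch splits such a URI into a bucket name and an optional folder path. The function name `parse_bucket_uri` is hypothetical; it is not part of Unstructured or of any Google Cloud library, and the validation shown is an assumption about what a reasonable pre-check might look like.

```python
from typing import Tuple


def parse_bucket_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket_name, folder_path).

    Hypothetical helper for illustration only. folder_path is an empty
    string when the URI names just the bucket.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Bucket URI must start with {prefix!r}: {uri!r}")

    # Everything after the scheme is "<bucket-name>[/folder-name]".
    remainder = uri[len(prefix):]
    bucket, _, folder = remainder.partition("/")
    if not bucket:
        raise ValueError(f"Bucket URI is missing a bucket name: {uri!r}")

    # Drop leading and trailing slashes so "gs://b/folder/" and
    # "gs://b/folder" yield the same folder path.
    return bucket, folder.strip("/")
```

For example, `parse_bucket_uri("gs://my-bucket/my-folder")` returns `("my-bucket", "my-folder")`, and a URI with no folder part returns an empty folder path.
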
Lines changed: 27 additions & 3 deletions
@@ -1,6 +1,30 @@
 The Google Cloud Storage prerequisites:
 
-- A Google Cloud Storage bucket URL, beginning with `gs://`.
-- A Google Cloud service account key for Google Cloud Storage. The service account key must have at least the **Storage Object Viewer** role to ensure proper access permissions. [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating). [Assign a role](https://cloud.google.com/storage/docs/access-control/iam).
+- A Google Cloud service account. [Create a service account](https://cloud.google.com/iam/docs/service-accounts-create#console).
+- A service account key for the service account. See [Create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating) in
+[Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete).
 
-[Learn more](https://cloud.google.com/storage/docs).
+To ensure maximum compatibility across Unstructured service offerings, you should give the service account key information to Unstructured as
+a single-line string that contains the contents of the downloaded service account key file (and not the service account key file itself).
+To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt.
+In this command, replace `<path-to-downloaded-key-file>` with the path to the service account key file that you downloaded by following the preceding instructions.
+
+- For macOS or Linux:
+```text
+tr -d '\n' < <path-to-downloaded-key-file>
+```
+- For Windows:
+```text
+(Get-Content -Path "<path-to-downloaded-key-file>" -Raw).Replace("`r`n", "").Replace("`n", "")
+```
+
+- The URI for a Google Cloud Storage bucket. This URI consists of the target bucket name, plus any target folder within the bucket, expressed as `gs://<bucket-name>[/folder-name]`. [Create a bucket](https://cloud.google.com/storage/docs/creating-buckets#console).
+
+This bucket must have, at minimum, one of the following roles applied to the target Google Cloud service account:
+
+- `Storage Object Viewer` for bucket read access.
+- `Storage Object Creator` for bucket write access.
+- The `Storage Object Admin` role provides read and write access, plus access to additional bucket operations.
+
+To apply one of these roles to a service account for a bucket, see [Add a principal to a bucket-level policy](https://cloud.google.com/storage/docs/access-control/using-iam-permissions#bucket-add)
+in [Set and manage IAM policies on buckets](https://cloud.google.com/storage/docs/access-control/using-iam-permissions).
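
Where neither `tr` nor PowerShell is available, the single-line key string described in the prerequisites above could also be produced with a short script. The following is a minimal Python sketch that mirrors the two shell commands by removing line breaks from the key file's contents. The helper names `flatten_key_text` and `flatten_key_file` are hypothetical, and the JSON validity check is an added assumption rather than something the documentation requires.

```python
import json
from pathlib import Path


def flatten_key_text(text: str) -> str:
    """Remove Windows and Unix line breaks, as the shell commands do."""
    return text.replace("\r\n", "").replace("\n", "")


def flatten_key_file(path: str) -> str:
    """Read a downloaded service account key file and return its
    contents as a single-line string.

    Hypothetical helper for illustration only.
    """
    flat = flatten_key_text(Path(path).read_text(encoding="utf-8"))
    # Assumption: key files are JSON, so parsing the flattened string
    # is a cheap sanity check. json.loads raises a ValueError subclass
    # if the result is not valid JSON.
    json.loads(flat)
    return flat
```

Calling `print(flatten_key_file("<path-to-downloaded-key-file>"))` prints a string suitable for copying into the **Service Account Key** field. Line breaks inside the private key are stored in the file as escaped `\n` sequences within a JSON string, so removing the literal line breaks between fields leaves the key material unchanged.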
