
Commit 430db53

S3 source connectors: how to work with user-defined metadata (#696)
1 parent e7aa618 commit 430db53

7 files changed (+166, -64 lines changed)


api-reference/workflow/destinations/s3.mdx

Lines changed: 20 additions & 0 deletions
@@ -14,6 +14,26 @@ import s3Prerequisites from '/snippets/general-shared-text/s3.mdx';
 
 <s3Prerequisites />
 
+## Add an access policy to an existing bucket
+
+import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
+
+<S3BucketPolicy />
+
+## Create a bucket with AWS CloudFormation
+
+import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.mdx';
+
+<S3BucketCloudFormation />
+
+## Create a bucket with the AWS CLI
+
+import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
+
+<S3BucketCLI />
+
+## Create the destination connector
+
 To create an S3 destination connector, see the following examples.
 
 import s3SDK from '/snippets/destination_connectors/s3_sdk.mdx';

api-reference/workflow/sources/s3.mdx

Lines changed: 26 additions & 0 deletions
@@ -14,6 +14,32 @@ import S3Prerequisites from '/snippets/general-shared-text/s3.mdx';
 
 <S3Prerequisites />
 
+## Add an access policy to an existing bucket
+
+import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
+
+<S3BucketPolicy />
+
+## Create a bucket with AWS CloudFormation
+
+import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.mdx';
+
+<S3BucketCloudFormation />
+
+## Create a bucket with the AWS CLI
+
+import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
+
+<S3BucketCLI />
+
+## Work with user-defined metadata
+
+import S3Metadata from '/snippets/general-shared-text/s3-metadata.mdx';
+
+<S3Metadata />
+
+## Create the source connector
+
 To create an S3 source connector, see the following examples.
 
 import S3SDK from '/snippets/source_connectors/s3_sdk.mdx';

open-source/ingestion/destination-connectors/s3.mdx

Lines changed: 18 additions & 16 deletions
@@ -10,22 +10,6 @@ import SharedS3 from '/snippets/dc-shared-text/s3-cli-api.mdx';
 
 <SharedS3 />
 
-Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.
-
-This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.
-
-import S3APISh from '/snippets/destination_connectors/s3.sh.mdx';
-import S3APIPyV2 from '/snippets/destination_connectors/s3.v2.py.mdx';
-
-<CodeGroup>
-<S3APISh />
-<S3APIPyV2 />
-</CodeGroup>
-
-import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
-
-<SharedPartitionByAPIOSS/>
-
 ## Add an access policy to an existing bucket
 
 import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
@@ -43,3 +27,21 @@ import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
 import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.mdx';
 
 <S3BucketCloudFormation />
+
+## Create a pipeline that uses S3 as the destination
+
+Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.
+
+This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.
+
+import S3APISh from '/snippets/destination_connectors/s3.sh.mdx';
+import S3APIPyV2 from '/snippets/destination_connectors/s3.v2.py.mdx';
+
+<CodeGroup>
+<S3APISh />
+<S3APIPyV2 />
+</CodeGroup>
+
+import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
+
+<SharedPartitionByAPIOSS/>

open-source/ingestion/source-connectors/s3.mdx

Lines changed: 26 additions & 17 deletions
@@ -10,6 +10,32 @@ import SharedContentS3 from '/snippets/sc-shared-text/s3-cli-api.mdx';
 
 <SharedContentS3/>
 
+## Add an access policy to an existing bucket
+
+import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
+
+<S3BucketPolicy />
+
+## Create a bucket with the AWS CLI
+
+import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
+
+<S3BucketCLI />
+
+## Create a bucket with AWS CloudFormation
+
+import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.mdx';
+
+<S3BucketCloudFormation />
+
+## Work with user-defined metadata
+
+import S3Metadata from '/snippets/general-shared-text/s3-metadata.mdx';
+
+<S3Metadata />
+
+## Create a pipeline that uses S3 as the source
+
 Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported.
 
 <iframe
@@ -38,20 +64,3 @@ import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/parti
 
 <SharedPartitionByAPIOSS/>
 
-## Add an access policy to an existing bucket
-
-import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
-
-<S3BucketPolicy />
-
-## Create a bucket with the AWS CLI
-
-import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
-
-<S3BucketCLI />
-
-## Create a bucket with AWS CloudFormation
-
-import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.mdx';
-
-<S3BucketCloudFormation />
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+User-defined metadata in S3 is metadata that you can choose to set when you upload a file to an S3 bucket.
+User-defined metadata is specified in S3 as a set of name-value pairs. Each name in these pairs begins with `x-amz-meta-` and is
+followed by a unique suffix.
+
+For more information about how to add or replace user-defined metadata for a file in S3, see the following:
+
+- [Working with object metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html)
+- [User-defined metadata](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html#UserMetadata)
+- [Editing object metadata in the Amazon S3 console](https://docs.aws.amazon.com/AmazonS3/latest/userguide/add-object-metadata.html)
+
+Unstructured outputs any user-defined metadata that it finds for a file into the `metadata.data_source.record_locator.metadata` field of
+the document elements' output for the corresponding file. For example, if Unstructured processes a file that has the user-defined metadata
+name `x-amz-meta-mymetadata` set to the value `myvalue`, Unstructured outputs the following into the `metadata.data_source.record_locator.metadata` field of
+the document elements' output for the corresponding file:
+
+```json
+[
+  {
+    "type": "...",
+    "element_id": "...",
+    "text": "...",
+    "metadata": {
+      "data_source": {
+        "record_locator": {
+          "metadata": {
+            "mymetadata": "myvalue"
+          }
+        }
+      }
+    }
+  }
+]
+```
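The metadata snippet above describes where user-defined metadata ends up in Unstructured's output. As a minimal sketch, assuming you have loaded Unstructured's JSON output for a file into a Python list of element dictionaries, the following code reads `metadata.data_source.record_locator.metadata` back out of each element and mirrors the name mapping shown in the example (`x-amz-meta-mymetadata` reported as `mymetadata`). This is not Unstructured's implementation: the helpers `strip_user_meta_prefix` and `user_metadata_from_elements` are hypothetical names, and the sample element is placeholder data.

```python
import json
from typing import Any

# The prefix that S3 uses for user-defined metadata names, as described above.
S3_USER_META_PREFIX = "x-amz-meta-"


def strip_user_meta_prefix(headers: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: keep only user-defined metadata entries and drop
    the `x-amz-meta-` prefix, so `x-amz-meta-mymetadata` becomes `mymetadata`.
    Names are lowercased first because HTTP header names are case-insensitive."""
    result: dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(S3_USER_META_PREFIX):
            result[lowered[len(S3_USER_META_PREFIX):]] = value
    return result


def user_metadata_from_elements(elements: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Hypothetical helper: return `metadata.data_source.record_locator.metadata`
    for each element, or an empty dict when any level of that path is missing."""
    found: list[dict[str, str]] = []
    for element in elements:
        metadata = element.get("metadata") or {}
        data_source = metadata.get("data_source") or {}
        record_locator = data_source.get("record_locator") or {}
        found.append(record_locator.get("metadata") or {})
    return found


if __name__ == "__main__":
    # Placeholder output shaped like the JSON example above.
    elements = json.loads(
        '[{"type": "...", "element_id": "...", "text": "...",'
        ' "metadata": {"data_source": {"record_locator":'
        ' {"metadata": {"mymetadata": "myvalue"}}}}}]'
    )
    print(user_metadata_from_elements(elements))  # prints [{'mymetadata': 'myvalue'}]
    print(strip_user_meta_prefix({"x-amz-meta-mymetadata": "myvalue", "Content-Type": "text/plain"}))
    # prints {'mymetadata': 'myvalue'}
```

Using `or {}` at each level, rather than indexing directly, lets elements that lack any part of this path yield an empty dictionary instead of raising `KeyError` or failing on a `null` value.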

ui/destinations/s3.mdx

Lines changed: 19 additions & 15 deletions
@@ -14,21 +14,6 @@ import S3Prerequisites from '/snippets/general-shared-text/s3.mdx';
 
 <S3Prerequisites />
 
-To create the destination connector:
-
-1. On the sidebar, click **Connectors**.
-2. Click **Destinations**.
-3. Cick **New** or **Create Connector**.
-4. Give the connector some unique **Name**.
-5. In the **Provider** area, click **Amazon S3**.
-6. Click **Continue**.
-7. Follow the on-screen instructions to fill in the fields as described later on this page.
-8. Click **Save and Test**.
-
-import S3Fields from '/snippets/general-shared-text/s3-platform.mdx';
-
-<S3Fields />
-
 ## Add an access policy to an existing bucket
 
 import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
@@ -47,3 +32,22 @@ import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
 
 <S3BucketCLI />
 
+## Create the destination connector
+
+To create the destination connector:
+
+1. On the sidebar, click **Connectors**.
+2. Click **Destinations**.
+3. Click **New** or **Create Connector**.
+4. Give the connector some unique **Name**.
+5. In the **Provider** area, click **Amazon S3**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
+
+import S3Fields from '/snippets/general-shared-text/s3-platform.mdx';
+
+<S3Fields />
+
+
+

ui/sources/s3.mdx

Lines changed: 24 additions & 16 deletions
@@ -14,21 +14,6 @@ import S3Prerequisites from '/snippets/general-shared-text/s3.mdx';
 
 <S3Prerequisites />
 
-To create the source connector:
-
-1. On the sidebar, click **Connectors**.
-2. Click **Sources**.
-3. Cick **New** or **Create Connector**.
-4. Give the connector some unique **Name**.
-5. In the **Provider** area, click **Amazon S3**.
-6. Click **Continue**.
-7. Follow the on-screen instructions to fill in the fields as described later on this page.
-8. Click **Save and Test**.
-
-import S3Fields from '/snippets/general-shared-text/s3-platform.mdx';
-
-<S3Fields />
-
 ## Add an access policy to an existing bucket
 
 import S3BucketPolicy from '/snippets/general-shared-text/s3-bucket-policy.mdx';
@@ -45,4 +30,27 @@ import S3BucketCloudFormation from '/snippets/general-shared-text/s3-cf-setup.md
 
 import S3BucketCLI from '/snippets/general-shared-text/s3-cli-setup.mdx';
 
-<S3BucketCLI />
+<S3BucketCLI />
+
+## Work with user-defined metadata
+
+import S3Metadata from '/snippets/general-shared-text/s3-metadata.mdx';
+
+<S3Metadata />
+
+## Create the source connector
+
+To create the source connector:
+
+1. On the sidebar, click **Connectors**.
+2. Click **Sources**.
+3. Click **New** or **Create Connector**.
+4. Give the connector some unique **Name**.
+5. In the **Provider** area, click **Amazon S3**.
+6. Click **Continue**.
+7. Follow the on-screen instructions to fill in the fields as described later on this page.
+8. Click **Save and Test**.
+
+import S3Fields from '/snippets/general-shared-text/s3-platform.mdx';
+
+<S3Fields />
