Commit cad0692

Amazon S3 source connector: workflow run trigger (#681)

1 parent f913684 commit cad0692

File tree

2 files changed: +177 −0 lines changed

docs.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -272,6 +272,7 @@
 {
   "group": "Tool demos",
   "pages": [
+    "examplecode/tools/s3-events",
     "examplecode/tools/google-drive-events",
     "examplecode/tools/gcs-events",
     "examplecode/tools/jq",
```

examplecode/tools/s3-events.mdx

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
---
title: Amazon S3 event triggers
---

You can use Amazon S3 events, such as adding new files to—or updating existing files within—S3 buckets, to automatically run Unstructured ETL+ workflows
that rely on those buckets as sources. This enables a no-touch approach: Unstructured automatically processes new and updated files in your S3 buckets as soon as they arrive.

This example shows how to automate this process by adding an [AWS Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/concepts-basics.html#gettingstarted-concepts-function) to your AWS account. The function runs
whenever a file is added to—or updated within—the specified S3 bucket, and calls the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to automatically run the
corresponding Unstructured ETL+ workflow in your Unstructured account.
<Note>
  This example uses a custom AWS Lambda function that you create and maintain.
  Any issues with file detection, timing, or function invocation are likely related to your custom function
  rather than to Unstructured. If you get unexpected results or no results, check your custom
  function's Amazon CloudWatch logs first for any informational and error messages.
</Note>

## Requirements

import GetStartedSimpleApiOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx'

To use this example, you will need the following:

- An Unstructured account, and an Unstructured API key for your account, as follows:

  <GetStartedSimpleApiOnly />

- The Unstructured Workflow Endpoint URL for your account, as follows:

  1. In the Unstructured UI, click **API Keys** on the sidebar.<br/>
  2. Note the value of the **Unstructured Workflow Endpoint** field.

- An S3 source connector in your Unstructured account. [Learn how](/ui/sources/s3).
- Any available [destination connector](/ui/destinations/overview) in your Unstructured account.
- A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).
## Step 1: Create the Lambda function

1. Sign in to the AWS Management Console for your account.
2. Browse to and open the **Lambda** console.
3. On the sidebar, click **Functions**.
4. Click **Create function**.
5. Select **Author from scratch**.
6. For **Function name**, enter a name for your function, such as `RunUnstructuredWorkflow`.
7. For **Runtime**, select **Node.js 22.x**.
8. For **Architecture**, select **x86_64**.
9. Under **Permissions**, expand **Change default execution role**, and make sure **Create a new role with basic Lambda permissions** is selected.
10. Click **Create function**. After the function is created, the function's code and configuration settings page appears.

## Step 2: Add code to the function

1. With the function's code and configuration settings page open from the previous step, click the **Code** tab.
2. In the **Code source** tile, replace the contents of the `index.mjs` file with the following code.

If the `index.mjs` file is not visible, do the following:

1. Show the **Explorer**: on the sidebar, click **Explorer**.
2. In the **Explorer** pane, expand the function name.
3. Click to open the **index.mjs** file.

Here is the code for the `index.mjs` file:

```javascript
import https from 'https';

export const handler = async (event) => {
  const apiUrl = process.env.UNSTRUCTURED_API_URL;
  const apiKey = process.env.UNSTRUCTURED_API_KEY;

  if (!apiUrl || !apiKey) {
    throw new Error('Missing UNSTRUCTURED_API_URL or UNSTRUCTURED_API_KEY environment variable, or both.');
  }

  const url = new URL(apiUrl);

  // Build the POST request to the Unstructured Workflow Endpoint.
  const options = {
    hostname: url.hostname,
    path: url.pathname,
    method: 'POST',
    headers: {
      'accept': 'application/json',
      'unstructured-api-key': apiKey
    }
  };

  // Wrap https.request in a Promise so the handler can await it.
  const postRequest = () => new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      let responseBody = '';
      res.on('data', (chunk) => { responseBody += chunk; });
      res.on('end', () => {
        resolve({ statusCode: res.statusCode, body: responseBody });
      });
    });
    req.on('error', reject);
    req.end();
  });

  try {
    const response = await postRequest();
    console.log(`POST status: ${response.statusCode}, body: ${response.body}`);
  } catch (error) {
    console.error('Error posting to endpoint:', error);
  }

  return {
    statusCode: 200,
    body: JSON.stringify('Lambda executed successfully')
  };
};
```
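Note that the handler above ignores the incoming S3 event entirely; it simply runs the workflow. If you also want your CloudWatch logs to show which upload triggered each run, you can pull the bucket name and object key out of the event record. A minimal sketch (the sample event below is a placeholder in the documented S3 notification shape; the bucket and key values are made up):

```javascript
// Sketch: extract the bucket and object key from an S3 event record,
// e.g. to log which upload triggered the workflow run.
// The sample event is a placeholder in the documented S3 notification shape.
const sampleEvent = {
  Records: [{
    s3: {
      bucket: { name: 'my-bucket' },
      object: { key: 'input/monthly+report.pdf' }
    }
  }]
};

const record = sampleEvent.Records[0];
const bucket = record.s3.bucket.name;
// S3 URL-encodes object keys and encodes spaces as '+'.
const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
console.log(`Triggered by s3://${bucket}/${key}`);
```

Inside the handler, the same lines would read from the `event` parameter instead of `sampleEvent`.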

3. In **Explorer**, expand **Deploy (Undeployed Changes)**.
4. Click **Deploy**.
5. Click the **Configuration** tab.
6. On the sidebar, click **Environment variables**.
7. Click **Edit**.
8. Click **Add environment variable**.
9. For **Key**, enter `UNSTRUCTURED_API_URL`.
10. For **Value**, enter `<unstructured-api-url>/workflows/<workflow-id>/run`. Replace the following placeholders:

- Replace `<unstructured-api-url>` with your Unstructured Workflow Endpoint value.
- Replace `<workflow-id>` with the ID of your Unstructured workflow.

The **Value** should now look similar to the following:

```text
https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run
```
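The function splits this value with Node's `URL` class: the hostname and path portions become the HTTPS request options. A quick sketch of that parsing, using the placeholder workflow ID from the example value above:

```javascript
// Sketch: how the handler derives its request options from UNSTRUCTURED_API_URL.
// The workflow ID below is the placeholder from the example value above.
const apiUrl = 'https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run';
const url = new URL(apiUrl);

console.log(url.hostname); // platform.unstructuredapp.io
console.log(url.pathname); // /api/v1/workflows/11111111-1111-1111-1111-111111111111/run
```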

11. Click **Add environment variable** again.
12. For **Key**, enter `UNSTRUCTURED_API_KEY`.
13. For **Value**, enter your Unstructured API key value.
14. Click **Save**.

## Step 3: Create the function trigger

1. Browse to and open the S3 console.
2. Browse to and open the S3 bucket that corresponds to your S3 source connector. The bucket's settings page appears.
3. Click the **Properties** tab.
4. In the **Event notifications** tile, click **Create event notification**.
5. In the **General configuration** tile, enter a name for your event notification, such as `UnstructuredWorkflowNotification`.
6. (Optional) For **Prefix**, enter a prefix to limit the Lambda function's scope to only matching object keys. For example, to limit the scope to only
the `input/` folder within the S3 bucket, enter `input/`.

<Warning>
  AWS does not recommend reading from and writing to the same S3 bucket, because of the possibility of accidentally running Lambda functions in loops.
  However, if you must read from and write to the same S3 bucket, AWS strongly recommends specifying a **Prefix** value. [Learn more](https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html).
</Warning>

7. (Optional) For **Suffix**, enter a file extension to limit the Lambda function's scope to only files with that extension. For example, to limit the scope to only
files with the `.pdf` extension, enter `.pdf`.
8. In the **Event types** tile, check the box titled **All object create events (s3:ObjectCreated:\*)**.
9. In the **Destination** tile, select **Lambda function** and **Choose from your Lambda functions**.
10. In the **Lambda function** tile, select the Lambda function that you created earlier in Step 1.
11. Click **Save changes**.
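For reference, the steps above produce an event notification configuration on the bucket. An equivalent configuration in the shape used by the S3 `NotificationConfiguration` API might look like the following sketch (the function ARN, prefix, and suffix values are placeholders; adjust them to match your function and filters, and omit the `Filter` block if you skipped the optional steps):

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "UnstructuredWorkflowNotification",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111111111111:function:RunUnstructuredWorkflow",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            { "Name": "prefix", "Value": "input/" },
            { "Name": "suffix", "Value": ".pdf" }
          ]
        }
      }
    }
  ]
}
```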

## Step 4: Trigger the function

1. With the S3 bucket's settings page open from the previous step, click the **Objects** tab.
2. If you specified a **Prefix** value earlier in Step 3, click to open the folder that corresponds to that prefix.
3. Click **Upload**, and then follow the on-screen instructions to upload a file to the bucket's root (or, if you opened a prefix folder, to that folder).

## Step 5: View the trigger results

1. In the Unstructured user interface for your account, click **Jobs** on the sidebar.
2. In the list of jobs, click the newly running job for your workflow.
3. After the job status shows **Finished**, go to your destination location to see the results.

## Step 6 (Optional): Delete the trigger

1. To stop the function from being triggered whenever files are added to or updated within the S3 bucket, browse to and open the S3 console.
2. Browse to and open the bucket that corresponds to your S3 source connector. The bucket's settings page appears.
3. Click the **Properties** tab.
4. In the **Event notifications** tile, check the box next to the name of the event notification that you added earlier in Step 3.
5. Click **Delete**, and then click **Confirm**.
