---
title: Amazon S3 event triggers
---

You can use Amazon S3 events, such as adding new files to or updating existing files within S3 buckets, to automatically run Unstructured ETL+ workflows
that rely on those buckets as sources. This enables a no-touch approach: Unstructured automatically processes files in those S3 buckets as they are added or updated.

This example shows how to automate this process by adding an [AWS Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/concepts-basics.html#gettingstarted-concepts-function) to your AWS account. This function runs
whenever a new file is added to, or an existing file is updated within, the specified S3 bucket. The function then calls the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to automatically run the
corresponding Unstructured ETL+ workflow in your Unstructured account.

<Note>
  This example uses a custom AWS Lambda function that you create and maintain.
  Any issues with file detection, timing, or function invocation could be related to your custom function
  rather than to Unstructured. If you get unexpected results, or no results at all, check your custom
  function's Amazon CloudWatch logs first for any informational and error messages.
</Note>

## Requirements

import GetStartedSimpleApiOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx'

To use this example, you will need the following:

- An Unstructured account, and an Unstructured API key for your account, as follows:

  <GetStartedSimpleApiOnly />

- The Unstructured Workflow Endpoint URL for your account, as follows:

    1. In the Unstructured UI, click **API Keys** on the sidebar.<br/>
    2. Note the value of the **Unstructured Workflow Endpoint** field.

- An S3 source connector in your Unstructured account. [Learn how](/ui/sources/s3).
- An available [destination connector](/ui/destinations/overview) in your Unstructured account.
- A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).

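Before you wire up the Lambda function, you can optionally confirm that your Unstructured API key, Workflow Endpoint URL, and workflow ID work together by calling the workflow run endpoint from a local Node.js script (version 18 or later, which provides a global `fetch`). This is a minimal sketch; the endpoint URL shown is a placeholder that you must replace with your own Workflow Endpoint value and workflow ID.

```javascript
// check-run.mjs - run locally with: node check-run.mjs
// Assumes the UNSTRUCTURED_API_KEY environment variable is set in your local shell.
const apiUrl = 'https://platform.unstructuredapp.io/api/v1/workflows/<workflow-id>/run';
const apiKey = process.env.UNSTRUCTURED_API_KEY;

// This is the same POST request that the Lambda function sends later in this example.
const response = await fetch(apiUrl, {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'unstructured-api-key': apiKey
  }
});

console.log(`Status: ${response.status}`);
console.log(await response.text());
```

If the request succeeds, a new job for the workflow appears on the **Jobs** page in the Unstructured UI, as described in Step 5 later in this example.
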
## Step 1: Create the Lambda function

1. Sign in to the AWS Management Console for your account.
2. Browse to and open the **Lambda** console.
3. On the sidebar, click **Functions**.
4. Click **Create function**.
5. Select **Author from scratch**.
6. For **Function name**, enter a name for your function, such as `RunUnstructuredWorkflow`.
7. For **Runtime**, select **Node.js 22.x**.
8. For **Architecture**, select **x86_64**.
9. Under **Permissions**, expand **Change default execution role**, and make sure **Create a new role with basic Lambda permissions** is selected.
10. Click **Create function**. After the function is created, the function's code and configuration settings page appears.

## Step 2: Add code to the function

1. With the function's code and configuration settings page open from the previous step, click the **Code** tab.
2. In the **Code source** tile, replace the contents of the `index.mjs` file with the following code.

    If the `index.mjs` file is not visible, do the following:

    1. Show the **Explorer**: on the sidebar, click **Explorer**.
    2. In the **Explorer** pane, expand the function name.
    3. Click to open the **index.mjs** file.

    Here is the code for the `index.mjs` file. (An optional variation that also logs which S3 object triggered the function is sketched after this procedure.)

    ```javascript
    // This handler ignores the details of the incoming S3 event and simply starts
    // the specified Unstructured workflow each time the function is invoked.
    import https from 'https';

    export const handler = async (event) => {
      // Both values are set as Lambda environment variables later in this step.
      const apiUrl = process.env.UNSTRUCTURED_API_URL;
      const apiKey = process.env.UNSTRUCTURED_API_KEY;

      if (!apiUrl || !apiKey) {
        throw new Error('Missing UNSTRUCTURED_API_URL or UNSTRUCTURED_API_KEY environment variable or both.');
      }

      const url = new URL(apiUrl);

      // POST to the workflow's run endpoint, authenticating with the API key header.
      const options = {
        hostname: url.hostname,
        path: url.pathname,
        method: 'POST',
        headers: {
          'accept': 'application/json',
          'unstructured-api-key': apiKey
        }
      };

      const postRequest = () => new Promise((resolve, reject) => {
        const req = https.request(options, (res) => {
          let responseBody = '';
          res.on('data', (chunk) => { responseBody += chunk; });
          res.on('end', () => {
            resolve({ statusCode: res.statusCode, body: responseBody });
          });
        });
        req.on('error', reject);
        req.end();
      });

      try {
        // Log the endpoint's response so that it appears in the function's CloudWatch logs.
        const response = await postRequest();
        console.log(`POST status: ${response.statusCode}, body: ${response.body}`);
      } catch (error) {
        console.error('Error posting to endpoint:', error);
      }

      return {
        statusCode: 200,
        body: JSON.stringify('Lambda executed successfully')
      };
    };
    ```

3. In **Explorer**, expand **Deploy (Undeployed Changes)**.
4. Click **Deploy**.
5. Click the **Configuration** tab.
6. On the sidebar, click **Environment variables**.
7. Click **Edit**.
8. Click **Add environment variable**.
9. For **Key**, enter `UNSTRUCTURED_API_URL`.
10. For **Value**, enter `<unstructured-api-url>/workflows/<workflow-id>/run`. Replace the following placeholders:

    - Replace `<unstructured-api-url>` with your Unstructured Workflow Endpoint value.
    - Replace `<workflow-id>` with the ID of your Unstructured workflow.

    The **Value** should now look similar to the following:

    ```text
    https://platform.unstructuredapp.io/api/v1/workflows/11111111-1111-1111-1111-111111111111/run
    ```

11. Click **Add environment variable** again.
12. For **Key**, enter `UNSTRUCTURED_API_KEY`.
13. For **Value**, enter your Unstructured API key value.
14. Click **Save**.

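The handler above does not use the incoming S3 event at all. If you want your function's CloudWatch logs to show which object triggered each workflow run, you can optionally read the bucket name and object key from the event's `Records` array before sending the request. The following sketch shows one way to do this; the helper name is illustrative and not part of the original example, and it relies on the standard shape of the S3 event notifications that Lambda receives.

```javascript
// Optional helper you could add to index.mjs and call from the handler,
// for example: logTriggeringObjects(event);
export const logTriggeringObjects = (event) => {
  for (const record of event.Records ?? []) {
    const bucket = record.s3?.bucket?.name;
    // Object keys in S3 event notifications are URL-encoded, with spaces encoded as '+'.
    const key = decodeURIComponent((record.s3?.object?.key ?? '').replace(/\+/g, ' '));
    console.log(`Triggered by s3://${bucket}/${key}`);
  }
};
```
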
## Step 3: Create the function trigger

1. Browse to and open the S3 console.
2. Browse to and open the S3 bucket that corresponds to your S3 source connector. The bucket's settings page appears.
3. Click the **Properties** tab.
4. In the **Event notifications** tile, click **Create event notification**.
5. In the **General configuration** tile, enter a name for your event notification, such as `UnstructuredWorkflowNotification`.
6. (Optional) For **Prefix**, enter a prefix to limit the Lambda function's scope to only objects with that prefix. For example, to limit the scope to only
    the `input/` folder within the S3 bucket, enter `input/`.

    <Warning>
      AWS does not recommend reading from and writing to the same S3 bucket, because of the possibility of accidentally running Lambda functions in loops.
      However, if you must read from and write to the same S3 bucket, AWS strongly recommends specifying a **Prefix** value. [Learn more](https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html).
    </Warning>

7. (Optional) For **Suffix**, enter a file extension to limit the Lambda function's scope to only objects with that extension. For example, to limit the scope to only
    files with the `.pdf` extension, enter `.pdf`. (An optional code-level check that complements the **Prefix** and **Suffix** filters is sketched after this procedure.)
8. In the **Event types** tile, check the box titled **All object create events (s3:ObjectCreated:\*)**.
9. In the **Destination** tile, select **Lambda function** and **Choose from your Lambda functions**.
10. In the **Lambda function** tile, select the Lambda function that you created earlier in Step 1.
11. Click **Save changes**.

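The **Prefix** and **Suffix** filters are applied by Amazon S3 before your function is invoked, so they are the primary way to limit which objects trigger the workflow. As an optional extra safeguard, for example in case the notification configuration changes later, you can also check the incoming object keys inside the handler and skip the workflow run when they do not match. This is a sketch only; the helper name and the example prefix and suffix values are illustrative.

```javascript
// Optional guard you could add to index.mjs: returns true only if at least one
// record in the S3 event matches the expected prefix and suffix.
// Adjust 'input/' and '.pdf' to match your own bucket layout.
export const shouldRunWorkflow = (event, prefix = 'input/', suffix = '.pdf') =>
  (event.Records ?? []).some((record) => {
    const key = decodeURIComponent((record.s3?.object?.key ?? '').replace(/\+/g, ' '));
    return key.startsWith(prefix) && key.toLowerCase().endsWith(suffix);
  });
```

In the handler, you could then return early, skipping the POST request, whenever `shouldRunWorkflow(event)` returns `false`.
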
## Step 4: Trigger the function

1. With the S3 bucket's settings page open from the previous step, click the **Objects** tab.
2. If you specified a **Prefix** value earlier in Step 3, click to open the folder that corresponds to your **Prefix** value.
3. Click **Upload**, and then follow the on-screen instructions to upload a file to the bucket's root, or to the folder that corresponds to your **Prefix** value if you opened that folder in the previous step.

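If you prefer to upload the test file from a script instead of through the console, the following sketch uses the AWS SDK for JavaScript (v3), which you would need to install separately as `@aws-sdk/client-s3`. The region, bucket name, object key, and local file path are placeholders for your own values, and your local AWS credentials must allow `s3:PutObject` on the bucket.

```javascript
// upload-test-file.mjs - run locally with: node upload-test-file.mjs
import { readFileSync } from 'fs';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const client = new S3Client({ region: 'us-east-1' }); // Use your bucket's region.

// Creating this object raises the s3:ObjectCreated event that invokes the Lambda function.
await client.send(new PutObjectCommand({
  Bucket: 'your-source-bucket',       // The bucket behind your S3 source connector.
  Key: 'input/sample.pdf',            // Match any Prefix and Suffix values from Step 3.
  Body: readFileSync('./sample.pdf')  // A local file to upload.
}));

console.log('Test file uploaded.');
```
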
## Step 5: View the trigger results

1. In the Unstructured user interface for your account, click **Jobs** on the sidebar.
2. In the list of jobs, click the newly running job for your workflow.
3. After the job status shows **Finished**, go to your destination location to see the results.

## Step 6 (Optional): Delete the trigger

1. To stop the function from being triggered automatically whenever you add new files to, or update existing files within, the S3 bucket, browse to and open the S3 console.
2. Browse to and open the bucket that corresponds to your S3 source connector. The bucket's settings page appears.
3. Click the **Properties** tab.
4. In the **Event notifications** tile, check the box next to the name of the event notification that you added earlier in Step 3.
5. Click **Delete**, and then click **Confirm**.