| 1 | +--- |
| 2 | +title: Google Cloud Storage event triggers |
| 3 | +--- |
| 4 | + |
| 5 | +You can use Google Cloud Storage events, such as adding new files to—or updating existing files within—Google Cloud Storage buckets, to automatically run Unstructured ETL+ workflows |
| 6 | +that rely on those buckets as sources. This lets Unstructured process new and updated files in Google Cloud Storage automatically, without manual intervention. |
| 7 | + |
| 8 | +This example shows how to automate this process by adding a custom [Google Apps Script](https://developers.google.com/apps-script) project in your Google account. This project runs |
| 9 | +a script on a regular time interval. This script automatically checks for new or updated files within the specified Google Cloud Storage bucket. If the script |
| 10 | +detects at least one new or updated file, it then calls the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to automatically run the |
| 11 | +corresponding Unstructured ETL+ workflow in your Unstructured account. |
| 12 | + |
| 13 | +<Note> |
| 14 | + This example uses a custom Google Apps Script that you create and maintain. |
| 15 | + Any issues with file detection, timing, or script execution could be related to your custom script |
| 16 | + rather than to Unstructured. If you are getting unexpected or no results, be sure to check your custom |
| 17 | + script's execution logs first for any informational and error messages. |
| 18 | +</Note> |
| 19 | + |
| 20 | +## Requirements |
| 21 | + |
| 22 | +import GetStartedSimpleApiOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx' |
| 23 | + |
| 24 | +To use this example, you will need the following: |
| 25 | + |
| 26 | +- An Unstructured account, and an Unstructured API key for your account, as follows: |
| 27 | + |
| 28 | + <GetStartedSimpleApiOnly /> |
| 29 | + |
| 30 | +- The Unstructured Workflow Endpoint URL for your account, as follows: |
| 31 | + |
| 32 | + 1. In the Unstructured UI, click **API Keys** on the sidebar.<br/> |
| 33 | + 2. Note the value of the **Unstructured Workflow Endpoint** field. |
| 34 | + |
| 35 | +- A Google Cloud Storage source connector in your Unstructured account. [Learn how](/ui/sources/google-cloud). |
| 36 | +- A [destination connector](/ui/destinations/overview) in your Unstructured account. |
| 37 | +- A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows). |
| 38 | +- An OAuth 2.0 client ID and client secret to call the Google API, as follows: |
| 39 | + |
| 40 | + 1. Sign in to your [Google Cloud account](https://cloud.google.com). |
| 41 | + 2. Go to the [Google Cloud APIs dashboard](https://console.cloud.google.com/apis/dashboard). |
| 42 | + 3. Click **+ Enable APIs and services**. |
| 43 | + 4. In the **Search for APIs & Services** box, enter `Cloud Storage API`. |
| 44 | + 5. In the list of search results, click **Cloud Storage API**. |
| 45 | + 6. Make sure that **API Enabled** is shown. If not, click **Enable**. |
| 46 | + 7. Go to your [Google Cloud console welcome page](https://console.cloud.google.com/welcome). |
| 47 | + 8. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`. |
| 48 | + 9. Click **Credentials (APIs & Services)**. |
| 49 | + 10. Click **+ Create credentials > OAuth client ID**. |
| 50 | + 11. For **Application type**, select **Web application**. |
| 51 | + 12. (Optional) Enter some non-default **Name** for this OAuth 2.0 client to be shown in the list of created clients in your Google Cloud Console. |
| 52 | + 13. Click **Create**. |
| 53 | + 14. After the OAuth client is created, click **Download JSON** to save the client ID and client secret values to a JSON file on your local |
| 54 | + machine. Store this JSON file in a secure location. |
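The client ID and client secret that the script needs later can be read out of the downloaded JSON file. The following is a minimal sketch, not part of the original procedure. It assumes the file wraps the values in a top-level `web` object (with `installed` checked as a fallback); `extractOAuthClient` is a hypothetical helper name, and the sample values are made up.

```javascript
// Extract the client ID and client secret from the text of a downloaded
// OAuth client JSON file. The "web" wrapper key is an assumption about the
// file layout for Web application clients; "installed" is a fallback.
function extractOAuthClient(jsonText) {
  const parsed = JSON.parse(jsonText);
  const client = parsed.web || parsed.installed;
  if (!client || !client.client_id || !client.client_secret) {
    throw new Error('No client_id and client_secret found in the JSON file.');
  }
  return { clientId: client.client_id, clientSecret: client.client_secret };
}

// Usage with made-up values.
const sample = JSON.stringify({
  web: {
    client_id: 'example-id.apps.googleusercontent.com',
    client_secret: 'example-secret'
  }
});
console.log(extractOAuthClient(sample).clientId);
```

If the helper throws, open the file and check which top-level key it actually uses before copying the values by hand.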
| 55 | + |
| 56 | +## Step 1: Create the Google Apps Script project |
| 57 | + |
| 58 | +1. Go to [http://script.google.com/](http://script.google.com/). |
| 59 | +2. Click **+ New project**. |
| 60 | +3. Click the new project's default name (such as **Untitled project**), and change it to something more descriptive, such as **Unstructured Scripts for GCS**. |
| 61 | + |
| 62 | +## Step 2: Add the script |
| 63 | + |
| 64 | +1. With the project still open, on the sidebar, click the **< >** (**Editor**) icon. |
| 65 | +2. In the **Files** tab, click **Code.gs**. |
| 66 | +3. Replace the contents of the `Code.gs` file with the following code: |
| 67 | + |
| 68 | + ```javascript |
| 69 | + // Configure the OAuth2 service. |
| 70 | + function getOAuthService() { |
| 71 | + return OAuth2.createService('GCS') |
| 72 | + .setAuthorizationBaseUrl('https://accounts.google.com/o/oauth2/auth') |
| 73 | + .setTokenUrl('https://oauth2.googleapis.com/token') |
| 74 | + .setClientId(CLIENT_ID) |
| 75 | + .setClientSecret(CLIENT_SECRET) |
| 76 | + .setCallbackFunction('authCallback') |
| 77 | + .setPropertyStore(PropertiesService.getUserProperties()) |
| 78 | + .setScope(OAUTH_SCOPE).setParam('access_type', 'offline'); // Offline access asks Google for a refresh token. |
| 79 | + } |
| 80 | + |
| 81 | + // OAuth2 callback function. |
| 82 | + function authCallback(request) { |
| 83 | + const service = getOAuthService(); |
| 84 | + const isAuthorized = service.handleCallback(request); |
| 85 | + return HtmlService.createHtmlOutput(isAuthorized ? 'Success!' : 'Denied'); |
| 86 | + } |
| 87 | + |
| 88 | + // Get a valid access token (refreshes automatically if expired). |
| 89 | + function getAccessToken() { |
| 90 | + const service = getOAuthService(); |
| 91 | + if (!service.hasAccess()) { |
| 92 | + const authorizationUrl = service.getAuthorizationUrl(); |
| 93 | + Logger.log('Open the following URL and re-run the script: %s', authorizationUrl); |
| 94 | + throw new Error('Authorization required. Open the URL in the log.'); |
| 95 | + } |
| 96 | + return service.getAccessToken(); |
| 97 | + } |
| 98 | + |
| 99 | + // Main function: checks for new or updated files in the bucket. |
| 100 | + function checkForNewOrUpdatedGCSFiles() { |
| 101 | + const thresholdMillis = 5 * 60 * 1000; // 5 minutes. |
| 102 | + const now = new Date(); |
| 103 | + |
| 104 | + // Get (and refresh if needed) the access token. |
| 105 | + const accessToken = getAccessToken(); |
| 106 | + |
| 107 | + // List objects in the bucket. |
| 108 | + const apiUrl = `https://storage.googleapis.com/storage/v1/b/${BUCKET_PATH}/o`; |
| 109 | + const response = UrlFetchApp.fetch(apiUrl, { |
| 110 | + method: 'get', |
| 111 | + headers: { |
| 112 | + 'Authorization': 'Bearer ' + accessToken, |
| 113 | + 'Accept': 'application/json' |
| 114 | + } |
| 115 | + }); |
| 116 | + const data = JSON.parse(response.getContentText()); |
| 117 | + const files = data.items || []; |
| 118 | + |
| 119 | + for (let i = 0; i < files.length; i++) { |
| 120 | + const file = files[i]; |
| 121 | + const fileName = file.name; |
| 122 | + const created = new Date(file.timeCreated); |
| 123 | + const updated = new Date(file.updated); |
| 124 | + |
| 125 | + const millisSinceCreated = now - created; |
| 126 | + const createdWithinThreshold = millisSinceCreated < thresholdMillis; |
| 127 | + const millisSinceUpdated = now - updated; |
| 128 | + const updatedWithinThreshold = millisSinceUpdated < thresholdMillis; |
| 129 | + |
| 130 | + console.log('File Name: ' + fileName); |
| 131 | + console.log('Created: ' + created); |
| 132 | + console.log('Last updated: ' + updated); |
| 133 | + console.log('Milliseconds since created: ' + millisSinceCreated); |
| 134 | + console.log('Milliseconds since last updated: ' + millisSinceUpdated); |
| 135 | + console.log('Created within threshold of ' + thresholdMillis + ' milliseconds? ' + createdWithinThreshold); |
| 136 | + console.log('Updated within threshold of ' + thresholdMillis + ' milliseconds? ' + updatedWithinThreshold); |
| 137 | + console.log('-----'); |
| 138 | + |
| 139 | + if (createdWithinThreshold || updatedWithinThreshold) { |
| 140 | + // Trigger your workflow. |
| 141 | + UrlFetchApp.fetch(UNSTRUCTURED_API_URL, { |
| 142 | + method: 'post', |
| 143 | + headers: { |
| 144 | + 'accept': 'application/json', |
| 145 | + 'unstructured-api-key': UNSTRUCTURED_API_KEY |
| 146 | + } |
| 147 | + }); |
| 148 | + console.log('At least one file created or updated within threshold of ' + thresholdMillis + ' milliseconds.'); |
| 149 | + console.log('Unstructured workflow request sent to ' + UNSTRUCTURED_API_URL); |
| 150 | + return; |
| 151 | + } |
| 152 | + } |
| 153 | + console.log('No files created or updated within threshold of ' + thresholdMillis + ' milliseconds. No Unstructured workflow request sent.'); |
| 154 | + } |
| 155 | + |
| 156 | + ``` |
| 157 | + |
| 158 | +4. Click the **Save project to Drive** button. |
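The detection rule inside `checkForNewOrUpdatedGCSFiles` can be tried outside Apps Script. The sketch below isolates it as a pure function run against a hand-written list shaped like the `items` array in a Cloud Storage list response. `findRecentFiles` is a hypothetical name, and the sample objects and timestamps are invented for illustration.

```javascript
// Return the names of objects created or updated less than
// thresholdMillis before `now`. Uses the same comparison as the script.
function findRecentFiles(items, now, thresholdMillis) {
  return items
    .filter((item) => {
      const millisSinceCreated = now - new Date(item.timeCreated);
      const millisSinceUpdated = now - new Date(item.updated);
      return millisSinceCreated < thresholdMillis || millisSinceUpdated < thresholdMillis;
    })
    .map((item) => item.name);
}

// Invented sample data: one stale object, one new object, one edited object.
const now = new Date('2025-01-01T12:00:00Z');
const items = [
  { name: 'old.pdf', timeCreated: '2025-01-01T10:00:00Z', updated: '2025-01-01T10:00:00Z' },
  { name: 'new.pdf', timeCreated: '2025-01-01T11:58:00Z', updated: '2025-01-01T11:58:00Z' },
  { name: 'edited.pdf', timeCreated: '2024-12-01T09:00:00Z', updated: '2025-01-01T11:57:30Z' }
];

console.log(findRecentFiles(items, now, 5 * 60 * 1000)); // [ 'new.pdf', 'edited.pdf' ]
```

The script itself stops at the first match because one workflow run processes the whole bucket. The full list returned here is only useful for debugging which files would have matched.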
| 159 | + |
| 160 | +## Step 3: Customize the script for your workflow |
| 161 | + |
| 162 | +1. With the project still open, on the **Files** tab, click the **Add a file** button, and then click **Script**. |
| 163 | +2. Name the new file `Constants`. The `.gs` extension is added automatically. |
| 164 | +3. Replace the contents of the `Constants.gs` file with the following code: |
| 165 | + |
| 166 | + ```javascript |
| 167 | + const BUCKET_PATH = '<bucket-path>'; |
| 168 | + const UNSTRUCTURED_API_URL = '<unstructured-api-url>' + '/workflows/<workflow-id>/run'; |
| 169 | + const UNSTRUCTURED_API_KEY = '<unstructured-api-key>'; |
| 170 | + const CLIENT_ID = '<client-id>'; |
| 171 | + const CLIENT_SECRET = '<client-secret>'; |
| 172 | + const OAUTH_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'; // Or .read_write or .full_control |
| 173 | + ``` |
| 174 | + |
| 175 | + Replace the following placeholders: |
| 176 | + |
| 177 | + - Replace `<bucket-path>` with the path to your Google Cloud Storage bucket. This is the same path that you specified |
| 178 | + when you created your Google Cloud Storage source connector in your Unstructured account. Do not include the `gs://` prefix here. |
| 179 | + - Replace `<unstructured-api-url>` with the Unstructured Workflow Endpoint URL value that you noted in the requirements. |
| 180 | + - Replace `<workflow-id>` with the ID of your Unstructured workflow. |
| 181 | + - Replace `<unstructured-api-key>` with your Unstructured API key value. |
| 182 | + - Replace `<client-id>` with your OAuth 2.0 client ID value. |
| 183 | + - Replace `<client-secret>` with your OAuth 2.0 client secret value. |
| 184 | + |
| 185 | +4. Click the disk (**Save project to Drive**) icon. |
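The script inserts `BUCKET_PATH` directly into the Cloud Storage list URL, which works when the value is a bare bucket name. The sketch below shows one way to handle a path that also names a folder, by passing the folder as the list request's `prefix` query parameter, and to assemble the workflow run URL from its parts. The helper names are hypothetical, the example endpoint is a placeholder, and treating the first path segment as the bucket is an assumption about how your path is written.

```javascript
// Split 'bucket/folder/sub' (with or without a leading gs://) into
// a bucket name and an object-name prefix.
function splitBucketPath(bucketPath) {
  const trimmed = bucketPath.replace(/^gs:\/\//, '').replace(/^\/+|\/+$/g, '');
  const slash = trimmed.indexOf('/');
  if (slash === -1) {
    return { bucket: trimmed, prefix: '' };
  }
  return { bucket: trimmed.slice(0, slash), prefix: trimmed.slice(slash + 1) + '/' };
}

// Build the object-listing URL, adding ?prefix= only when a folder is given.
function buildListUrl(bucketPath) {
  const { bucket, prefix } = splitBucketPath(bucketPath);
  const base = 'https://storage.googleapis.com/storage/v1/b/' + encodeURIComponent(bucket) + '/o';
  return prefix ? base + '?prefix=' + encodeURIComponent(prefix) : base;
}

// Build the workflow run URL, tolerating a trailing slash on the endpoint.
function buildRunUrl(workflowEndpoint, workflowId) {
  return workflowEndpoint.replace(/\/+$/, '') + '/workflows/' + encodeURIComponent(workflowId) + '/run';
}

console.log(buildListUrl('my-bucket/incoming'));
console.log(buildRunUrl('https://example.invalid/api/v1/', 'my-workflow-id'));
```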
| 186 | + |
| 187 | +## Step 4: Generate an initial OAuth 2.0 access token |
| 188 | + |
| 189 | +1. On the sidebar, click the gear (**Project Settings**) icon. |
| 190 | +2. In the **IDs** area, next to **Script ID**, click **Copy** to copy the script's ID value to your system's clipboard. |
| 191 | +3. In a separate tab in your web browser, open your [Google Cloud Console welcome page](https://console.cloud.google.com/welcome). |
| 192 | +4. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`. |
| 193 | +5. Click **Credentials (APIs & Services)**. |
| 194 | +6. In the **OAuth 2.0 client IDs** list, click the link for the client ID that you created earlier in the requirements. |
| 195 | +7. Under **Authorized redirect URIs**, click **Add URI**. |
| 196 | +8. In the **URIs 1** box, enter `https://script.google.com/macros/d/<script-id>/usercallback`, replacing `<script-id>` with the script's ID value that you copied earlier. |
| 197 | +9. Click **Save**. |
| 198 | +10. On the original tab in your web browser, with the Google Apps Script project still open to the **Constants.gs** file, on the sidebar, next to **Libraries**, click the **+** (**Add a library**) icon. |
| 199 | +11. For **Script ID**, enter `1B7FSrk5Zi6L1rSxxTDgDEUsPzlukDsi4KGuTMorsTQHhGBzBkMun4iDF`, and then click **Look up**. |
| 200 | +12. For **Version**, make sure the largest number is selected. |
| 201 | +13. Click **Add**. |
| 202 | +14. In the sidebar, click the **Code.gs** file to open it. |
| 203 | +15. In the file's top navigation bar, select **getAccessToken**. |
| 204 | +16. Click the **Run** icon. |
| 205 | +17. In the **Execution log** area, next to the message `Open the following URL and re-run the script`, copy the entire URL into |
| 206 | + a separate tab in your web browser and then browse to that URL. |
| 207 | +18. When prompted, click **Review permissions**, and follow the on-screen instructions to grant the necessary permissions. |
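A mistyped redirect URI is a common reason for the authorization page to reject the request. As a small sanity check, the following sketch builds the URI from the script ID using the pattern given in this step. `buildRedirectUri` is a hypothetical helper, the script ID shown is made up, and the placeholder check is only an illustrative guard.

```javascript
// Build the authorized redirect URI for an Apps Script project.
// Rejects empty values and values that still look like a placeholder.
function buildRedirectUri(scriptId) {
  if (!scriptId || /[<>\s]/.test(scriptId)) {
    throw new Error('Provide the real script ID, not a placeholder.');
  }
  return 'https://script.google.com/macros/d/' + scriptId + '/usercallback';
}

// Usage with a made-up script ID.
console.log(buildRedirectUri('EXAMPLE_SCRIPT_ID_123'));
```

Compare the printed value character by character with the entry under **Authorized redirect URIs** if authorization fails.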
| 208 | + |
| 209 | +## Step 5: Create the script trigger |
| 210 | + |
| 211 | +1. On the original tab in your web browser, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon. |
| 212 | +2. Click the **+ Add Trigger** button. |
| 213 | +3. Set the following values: |
| 214 | + |
| 215 | + - For **Choose which function to run**, select `checkForNewOrUpdatedGCSFiles`. |
| 216 | + - For **Choose which deployment should run**, select **Head**. |
| 217 | + - For **Select event source**, select **Time-driven**. |
| 218 | + - For **Select type of time based trigger**, select **Minutes timer**. |
| 219 | + - For **Select minute interval**, select **Every 5 minutes**. |
| 220 | + |
| 221 | + <Note> |
| 222 | + If you change **Minutes timer** or **Every 5 minutes** to a different interval, also change the number `5` in the following |
| 223 | + line of code in the `checkForNewOrUpdatedGCSFiles` function to the number of minutes in the interval that you |
| 224 | + selected: |
| 225 | + |
| 226 | + ```javascript |
| 227 | + const thresholdMillis = 5 * 60 * 1000; |
| 228 | + ``` |
| 229 | + </Note> |
| 230 | + |
| 231 | + - For **Failure notification settings**, select an interval such as immediately, hourly, or daily. |
| 232 | + |
| 233 | +4. Click **Save**. |
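The same trigger can also be created from code rather than through the **Triggers** page. This is a sketch under stated assumptions: the helper names are hypothetical, and `ScriptApp` is passed in as an argument so that the functions can be exercised outside Apps Script. It also keeps the detection threshold tied to the trigger interval, which is the relationship the note above describes.

```javascript
// Minute intervals that Apps Script's everyMinutes() accepts.
const ALLOWED_MINUTE_INTERVALS = [1, 5, 10, 15, 30];

// Convert a trigger interval in minutes to the matching detection threshold.
function thresholdMillisFor(intervalMinutes) {
  if (!ALLOWED_MINUTE_INTERVALS.includes(intervalMinutes)) {
    throw new Error('Interval must be one of: ' + ALLOWED_MINUTE_INTERVALS.join(', '));
  }
  return intervalMinutes * 60 * 1000;
}

// Create a time-driven trigger for the named function.
// In Apps Script, pass the global ScriptApp as the first argument.
function createPollingTrigger(scriptApp, functionName, intervalMinutes) {
  thresholdMillisFor(intervalMinutes); // Validates the interval before creating anything.
  return scriptApp
    .newTrigger(functionName)
    .timeBased()
    .everyMinutes(intervalMinutes)
    .create();
}

console.log(thresholdMillisFor(5)); // 300000
```

Inside the Apps Script editor, a call such as `createPollingTrigger(ScriptApp, 'checkForNewOrUpdatedGCSFiles', 5)` would be run once. Running it again would add a second trigger.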
| 234 | + |
| 235 | +## Step 6: View trigger results |
| 236 | + |
| 237 | +1. With the Google Apps Script project still open, on the sidebar, click the three lines (**Executions**) icon. |
| 238 | +2. As soon as the first script execution completes, a corresponding message should appear in the **Executions** list. If the **Status** column shows |
| 239 | + **Completed**, continue to the next step. |
| 240 | + |
| 241 | + If the **Status** column shows **Failed**, expand the message to |
| 242 | + get any details about the failure. Fix the failure, and then wait for the next script execution to complete. |
| 243 | + |
| 244 | +3. When the **Status** column shows **Completed**, go to your Unstructured account's user interface and click **Jobs** on the sidebar to see whether a new job |
| 245 | + is running for that workflow. |
| 246 | + |
| 247 | + If no new job is running for that workflow, then add at least one new file to the Google Cloud Storage bucket, or update at least one existing file within it, |
| 248 | + no more than 5 minutes before the next script execution. After that script execution completes, check the **Jobs** list again. |
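One case that can look like a missed trigger is a large bucket. The Cloud Storage list response is paginated, and the script as written reads only the first page, so files on later pages are never inspected. The sketch below follows `nextPageToken` until every page has been read. `listAllObjects` and `fetchPage` are hypothetical names, and the fake pages stand in for real responses.

```javascript
// Collect the items from every page of a paginated list response.
// fetchPage(pageToken) must return the parsed JSON for one page.
// In Apps Script it would call UrlFetchApp.fetch with the list URL plus
// a pageToken query parameter, then JSON.parse the response text.
function listAllObjects(fetchPage) {
  const all = [];
  let pageToken = '';
  do {
    const page = fetchPage(pageToken);
    all.push(...(page.items || []));
    pageToken = page.nextPageToken || '';
  } while (pageToken);
  return all;
}

// Usage with two fake pages.
const fakePages = {
  '': { items: [{ name: 'a.pdf' }], nextPageToken: 'page-2' },
  'page-2': { items: [{ name: 'b.pdf' }] }
};
console.log(listAllObjects((token) => fakePages[token]).map((o) => o.name)); // [ 'a.pdf', 'b.pdf' ]
```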
| 249 | + |
| 250 | +## Step 7 (Optional): Delete the trigger |
| 251 | + |
| 252 | +1. To stop the script from automatically executing on a regular basis, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon. |
| 253 | +2. Rest your mouse pointer on the trigger you created in Step 5. |
| 254 | +3. Click the ellipsis (three dots) icon, and then click **Delete trigger**. |
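The trigger can also be removed from code. The following sketch deletes every project trigger whose handler is the polling function. `deleteTriggersFor` is a hypothetical helper, and `ScriptApp` is passed in as an argument so that the function can be tested with a stand-in object.

```javascript
// Delete all project triggers that run the named function.
// Returns the number of triggers deleted.
// In Apps Script, pass the global ScriptApp as the first argument.
function deleteTriggersFor(scriptApp, functionName) {
  let deleted = 0;
  scriptApp.getProjectTriggers().forEach((trigger) => {
    if (trigger.getHandlerFunction() === functionName) {
      scriptApp.deleteTrigger(trigger);
      deleted++;
    }
  });
  return deleted;
}
```

In the Apps Script editor, a call such as `deleteTriggersFor(ScriptApp, 'checkForNewOrUpdatedGCSFiles')` would remove the trigger created in Step 5 and leave triggers for other functions in place.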
| 255 | + |
| 256 | + |