Commit f913684

Google Cloud Storage source connector: workflow run trigger (#682)
1 parent 6c5d352 commit f913684

2 files changed

+257
-0
lines changed

docs.json

Lines changed: 1 addition & 0 deletions
@@ -273,6 +273,7 @@
 "group": "Tool demos",
 "pages": [
 "examplecode/tools/google-drive-events",
+"examplecode/tools/gcs-events",
 "examplecode/tools/jq",
 "examplecode/tools/firecrawl",
 "examplecode/tools/langflow",

examplecode/tools/gcs-events.mdx

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
1+
---
2+
title: Google Cloud Storage event triggers
3+
---
4+
5+
You can use Google Cloud Storage events, such as adding new files to—or updating existing files within—Google Cloud Storage buckets, to automatically run Unstructured ETL+ workflows
6+
that rely on those buckets as sources. This enables a no-touch approach to having Unstructured automatically process new and updated files in Google Cloud Storage as they are added or updated.
7+
8+
This example shows how to automate this process by adding a custom [Google Apps Script](https://developers.google.com/apps-script) project in your Google account. This project runs
9+
a script on a regular time interval. This script automatically checks for new or updated files within the specified Google Cloud Storage bucket. If the script
10+
detects at least one new or updated file, it then calls the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to automatically run the
11+
specified corresponding Unstructured ETL+ workflow in your Unstructured account.
12+
13+
<Note>
14+
This example uses a custom Google Apps Script that you create and maintain.
15+
Any issues with file detection, timing, or script execution could be related to your custom script,
16+
rather than with Unstructured. If you are getting unexpected or no results, be sure to check your custom
17+
script's execution logs first for any informational and error messages.
18+
</Note>
19+
20+
## Requirements
21+
22+
import GetStartedSimpleApiOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx'
23+
24+
To use this example, you will need the following:
25+
26+
- An Unstructured account, and an Unstructured API key for your account, as follows:
27+
28+
<GetStartedSimpleApiOnly />
29+
30+
- The Unstructured Workflow Endpoint URL for your account, as follows:
31+
32+
1. In the Unstructured UI, click **API Keys** on the sidebar.<br/>
33+
2. Note the value of the **Unstructured Workflow Endpoint** field.
34+
35+
- A Google Cloud Storage source connector in your Unstructured account. [Learn how](/ui/sources/google-cloud).
36+
- A [destination connector](/ui/destinations/overview) in your Unstructured account.
37+
- A workflow that uses the preceding source and destination connectors. [Learn how](/ui/workflows).
38+
- An OAuth 2.0 client ID and client secret to call the Google API, as follows:
39+
40+
1. Sign in to your [Google Cloud account](https://cloud.google.com).
41+
2. Go to the [Google Cloud APIs dashboard](https://console.cloud.google.com/apis/dashboard).
42+
3. Click **+ Enable APIs and services**.
43+
4. In the **Search for APIs & Services** box, enter `Cloud Storage API`.
44+
5. In the list of search results, click **Cloud Storage API**.
45+
6. Make sure that **API Enabled** is shown. If not, click **Enable**.
46+
7. Go to your [Google Cloud console welcome page](https://console.cloud.google.com/welcome).
47+
8. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`.
48+
9. Click **Credentials (APIs & Services)**.
49+
10. Click **+ Create credentials > OAuth client ID**.
50+
11. For **Application type**, select **Web application**.
51+
12. (Optional) Enter some non-default **Name** for this OAuth 2.0 client to be shown in the list of created clients in your Google Cloud Console.
52+
13. Click **Create**.
53+
14. After the OAuth client is created, click **Download JSON** to save the client ID and client secret values to a JSON file on your local
54+
machine. Store this JSON file in a secure location.
55+
56+
## Step 1: Create the Google Apps Script project
57+
58+
1. Go to [http://script.google.com/](http://script.google.com/).
59+
2. Click **+ New project**.
60+
3. Click the new project's default name (such as **Untitled project**), and change it to something more descriptive, such as **Unstructured Scripts for GCS**.
61+
62+
## Step 2: Add the script
63+
64+
1. With the project still open, on the sidebar, click the **< >** (**Editor**) icon.
65+
2. In the **Files** tab, click **Code.gs**.
66+
3. Replace the contents of the `Code.gs` file with the following code:
67+
68+
```javascript
// Configure the OAuth2 service.
function getOAuthService() {
  return OAuth2.createService('GCS')
    .setAuthorizationBaseUrl('https://accounts.google.com/o/oauth2/auth')
    .setTokenUrl('https://oauth2.googleapis.com/token')
    .setClientId(CLIENT_ID)
    .setClientSecret(CLIENT_SECRET)
    .setCallbackFunction('authCallback')
    .setPropertyStore(PropertiesService.getUserProperties())
    .setScope(OAUTH_SCOPE)
    // Google-specific parameters: request a refresh token so that the
    // access token can be renewed after it expires.
    .setParam('access_type', 'offline')
    .setParam('prompt', 'consent');
}

// OAuth2 callback function.
function authCallback(request) {
  const service = getOAuthService();
  const isAuthorized = service.handleCallback(request);
  return HtmlService.createHtmlOutput(isAuthorized ? 'Success!' : 'Denied');
}

// Get a valid access token (refreshes automatically if expired).
function getAccessToken() {
  const service = getOAuthService();
  if (!service.hasAccess()) {
    const authorizationUrl = service.getAuthorizationUrl();
    Logger.log('Open the following URL and re-run the script: %s', authorizationUrl);
    throw new Error('Authorization required. Open the URL in the log.');
  }
  return service.getAccessToken();
}

// Main function: checks for new or updated files in the bucket.
function checkForNewOrUpdatedGCSFiles() {
  const thresholdMillis = 5 * 60 * 1000; // 5 minutes.
  const now = new Date();

  // Get (and refresh if needed) the access token.
  const accessToken = getAccessToken();

  // List objects in the bucket.
  const apiUrl = `https://storage.googleapis.com/storage/v1/b/${BUCKET_PATH}/o`;
  const response = UrlFetchApp.fetch(apiUrl, {
    method: 'get',
    headers: {
      'Authorization': 'Bearer ' + accessToken,
      'Accept': 'application/json'
    }
  });
  const data = JSON.parse(response.getContentText());
  const files = data.items || [];

  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const fileName = file.name;
    const created = new Date(file.timeCreated);
    const updated = new Date(file.updated);

    const millisSinceCreated = now - created;
    const createdWithinThreshold = millisSinceCreated < thresholdMillis;
    const millisSinceUpdated = now - updated;
    const updatedWithinThreshold = millisSinceUpdated < thresholdMillis;

    console.log('File Name: ' + fileName);
    console.log('Created: ' + created);
    console.log('Last updated: ' + updated);
    console.log('Milliseconds since created: ' + millisSinceCreated);
    console.log('Milliseconds since last updated: ' + millisSinceUpdated);
    console.log('Created within threshold of ' + thresholdMillis + ' milliseconds? ' + createdWithinThreshold);
    console.log('Updated within threshold of ' + thresholdMillis + ' milliseconds? ' + updatedWithinThreshold);
    console.log('-----');

    if (createdWithinThreshold || updatedWithinThreshold) {
      // Trigger your workflow once, then stop checking the remaining files.
      UrlFetchApp.fetch(UNSTRUCTURED_API_URL, {
        method: 'post',
        headers: {
          'accept': 'application/json',
          'unstructured-api-key': UNSTRUCTURED_API_KEY
        }
      });
      console.log('At least one file created or updated within threshold of ' + thresholdMillis + ' milliseconds.');
      console.log('Unstructured workflow request sent to ' + UNSTRUCTURED_API_URL);
      return;
    }
  }
  console.log('No files created or updated within threshold of ' + thresholdMillis + ' milliseconds. No Unstructured workflow request sent.');
}
```
157+
158+
4. Click the **Save project to Drive** button.
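
The script above reads a single page of results from the list request. The Cloud Storage JSON API returns at most 1,000 objects per page by default and includes a `nextPageToken` field when more remain; that token is sent back as the `pageToken` query parameter. For larger buckets, paging could be sketched as follows. This is an illustration only, not part of the committed script: `listAllObjects` and its `fetchPage` callback are hypothetical names, and `fetchPage` is assumed to perform the HTTP request and return the parsed JSON body.

```javascript
// Hypothetical helper (not part of the original script): collects every
// object from a paginated Cloud Storage "objects list" response.
// `fetchPage(pageToken)` is an assumed callback that performs the request
// for one page and returns the parsed JSON body.
function listAllObjects(fetchPage) {
  const all = [];
  let pageToken; // undefined for the first page
  do {
    const page = fetchPage(pageToken);
    (page.items || []).forEach(function (item) {
      all.push(item);
    });
    pageToken = page.nextPageToken; // absent on the last page
  } while (pageToken);
  return all;
}
```

In Apps Script, `fetchPage` could wrap the existing `UrlFetchApp.fetch` call and append `pageToken=` plus the URL-encoded token to `apiUrl` when a token is present.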
159+
160+
## Step 3: Customize the script for your workflow
161+
162+
1. With the project still open, on the **Files** tab, click the **Add a file** button, and then click **Script**.
163+
2. Name the new file `Constants`. The `.gs` extension is added automatically.
164+
3. Replace the contents of the `Constants.gs` file with the following code:
165+
166+
```javascript
const BUCKET_PATH = '<bucket-path>';
const UNSTRUCTURED_API_URL = '<unstructured-api-url>' + '/workflows/<workflow-id>/run';
const UNSTRUCTURED_API_KEY = '<unstructured-api-key>';
const CLIENT_ID = '<client-id>';
const CLIENT_SECRET = '<client-secret>';
const OAUTH_SCOPE = 'https://www.googleapis.com/auth/devstorage.read_only'; // Or .read_write or .full_control
```
174+
175+
Replace the following placeholders:
176+
177+
- Replace `<bucket-path>` with the path to your Google Cloud Storage bucket. This is the same path that you specified
178+
when you created your Google Cloud Storage source connector in your Unstructured account. Do not include the `gs://` prefix here.
179+
- Replace `<unstructured-api-url>` with your Unstructured Workflow Endpoint URL value (the **Unstructured Workflow Endpoint** field that you noted in the requirements).
180+
- Replace `<workflow-id>` with the ID of your Unstructured workflow.
181+
- Replace `<unstructured-api-key>` with your Unstructured API key value.
182+
- Replace `<client-id>` with your OAuth 2.0 client ID value.
183+
- Replace `<client-secret>` with your OAuth 2.0 client secret value.
184+
185+
4. Click the disk (**Save project to Drive**) icon.
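
The list request in `Code.gs` places `BUCKET_PATH` directly into the bucket segment of the URL, which fits a bare bucket name. If your connector path also includes a folder (for example, `my-bucket/reports`), one possible approach is to split the value into a bucket name and an object prefix and pass the prefix through the list API's `prefix` query parameter. The following is a sketch under that assumption; `buildListUrl` is a hypothetical helper that is not part of the committed script.

```javascript
// Hypothetical helper (not part of the original script): builds the
// Cloud Storage "objects list" URL from a path such as 'my-bucket' or
// 'my-bucket/reports'. A leading 'gs://' is tolerated and removed.
function buildListUrl(bucketPath) {
  const trimmed = bucketPath
    .replace(/^gs:\/\//, '')     // drop an accidental gs:// prefix
    .replace(/^\/+|\/+$/g, '');  // drop leading and trailing slashes
  const slash = trimmed.indexOf('/');
  const bucket = slash === -1 ? trimmed : trimmed.slice(0, slash);
  // Everything after the bucket name is treated as a folder prefix.
  const prefix = slash === -1 ? '' : trimmed.slice(slash + 1) + '/';
  let url = 'https://storage.googleapis.com/storage/v1/b/' +
    encodeURIComponent(bucket) + '/o';
  if (prefix) {
    url += '?prefix=' + encodeURIComponent(prefix);
  }
  return url;
}
```

With this helper, `apiUrl` in `checkForNewOrUpdatedGCSFiles` could be set to `buildListUrl(BUCKET_PATH)` in place of the template string.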
186+
187+
## Step 4: Generate an initial OAuth 2.0 access token
188+
189+
1. On the sidebar, click the gear (**Project Settings**) icon.
190+
2. In the **IDs** area, next to **Script ID**, click **Copy** to copy the script's ID value to your system's clipboard.
191+
3. In a separate tab in your web browser, open your [Google Cloud Console welcome page](https://console.cloud.google.com/welcome).
192+
4. In the **Search (/) for resources, docs, products, and more** box, enter `Credentials`.
193+
5. Click **Credentials (APIs & Services)**.
194+
6. In the **OAuth 2.0 client IDs** list, click the link for the client ID that you created earlier in the requirements.
195+
7. Under **Authorized redirect URIs**, click **Add URI**.
196+
8. In the **URIs 1** box, enter `https://script.google.com/macros/d/<script-id>/usercallback`, replacing `<script-id>` with the script's ID value that you copied earlier.
197+
9. Click **Save**.
198+
10. On the original tab in your web browser, with the Google Apps Script project still open to the **Constants.gs** file, on the sidebar, next to **Libraries**, click the **+** (**Add a library**) icon.
199+
11. For **Script ID**, enter `1B7FSrk5Zi6L1rSxxTDgDEUsPzlukDsi4KGuTMorsTQHhGBzBkMun4iDF`, and then click **Look up**.
200+
12. For **Version**, make sure the largest number is selected.
201+
13. Click **Add**.
202+
14. In the sidebar, click the **Code.gs** file to open it.
203+
15. In the file's top navigation bar, select **getAccessToken**.
204+
16. Click the **Run** icon.
205+
17. In the **Execution log** area, next to the message `Open the following URL and re-run the script`, copy the entire URL into
206+
a separate tab in your web browser and then browse to that URL.
207+
18. When prompted, click **Review permissions**, and follow the on-screen instructions to grant the necessary permissions.
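
The redirect URI entered in the Google Cloud Console must match the script's ID exactly, or the authorization step fails with a redirect mismatch. As a small sanity check, the URI pattern from the steps above can be expressed as a function. `buildRedirectUri` is a hypothetical helper for illustration and is not part of the committed script.

```javascript
// Hypothetical helper (not part of the original script): builds the
// OAuth 2.0 redirect URI for an Apps Script project, following the
// pattern https://script.google.com/macros/d/<script-id>/usercallback.
function buildRedirectUri(scriptId) {
  if (typeof scriptId !== 'string' || scriptId.trim() === '') {
    throw new Error('A non-empty script ID is required.');
  }
  return 'https://script.google.com/macros/d/' +
    encodeURIComponent(scriptId.trim()) + '/usercallback';
}
```

Comparing this value with the entry under **Authorized redirect URIs** can help catch copy-and-paste errors such as stray whitespace around the script ID.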
208+
209+
## Step 5: Create the script trigger
210+
211+
1. On the original tab in your web browser, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon.
212+
2. Click the **+ Add Trigger** button.
213+
3. Set the following values:
214+
215+
- For **Choose which function to run**, select `checkForNewOrUpdatedGCSFiles`.
216+
- For **Choose which deployment should run**, select **Head**.
217+
- For **Select event source**, select **Time-driven**.
218+
- For **Select type of time based trigger**, select **Minutes timer**.
219+
- For **Select minute interval**, select **Every 5 minutes**.
220+
221+
<Note>
222+
If you change **Minutes timer** or **Every 5 minutes** to a different interval, you should also go back and change the number `5` in the following
223+
line of code in the `checkForNewOrUpdatedGCSFiles` function. Change the number `5` to the number of minutes that correspond to the alternate interval you
224+
selected:
225+
226+
```javascript
const thresholdMillis = 5 * 60 * 1000;
```
229+
</Note>
230+
231+
- For **Failure notification settings**, select an interval such as immediately, hourly, or daily.
232+
233+
4. Click **Save**.
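
The detection threshold and the trigger interval need to stay in step: a file counts as new or updated only if its timestamp falls within the threshold at the moment the script runs. The relationship can be sketched with two small functions. Both names are hypothetical and not part of the committed script; the comparison mirrors the `millisSinceCreated < thresholdMillis` check in `Code.gs`.

```javascript
// Hypothetical helpers (not part of the original script).

// Converts a trigger interval in minutes to the threshold in milliseconds.
function thresholdMillisForInterval(intervalMinutes) {
  if (!Number.isFinite(intervalMinutes) || intervalMinutes <= 0) {
    throw new RangeError('intervalMinutes must be a positive number.');
  }
  return intervalMinutes * 60 * 1000;
}

// Returns true if `timestamp` (an ISO 8601 string such as the
// `timeCreated` or `updated` field of an object) is less than
// `thresholdMillis` older than `now`.
function isWithinThreshold(timestamp, now, thresholdMillis) {
  const elapsed = now.getTime() - new Date(timestamp).getTime();
  return elapsed < thresholdMillis;
}
```

For example, with a 10-minute trigger, `thresholdMillisForInterval(10)` gives the value to use for `thresholdMillis`.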
234+
235+
## Step 6: View trigger results
236+
237+
1. With the Google Apps Script project still open, on the sidebar, click the three lines (**Executions**) icon.
238+
2. As soon as the first script execution completes, you should see a corresponding message appear in the **Executions** list. If the **Status** column shows
239+
**Completed**, then keep going with this procedure.
240+
241+
If the **Status** column shows **Failed**, expand the message to
242+
get any details about the failure. Fix the failure, and then wait for the next script execution to complete.
243+
244+
3. When the **Status** column shows **Completed**, go to your Unstructured account's user interface and click **Jobs** on the sidebar to see if a new job
245+
is running for that workflow.
246+
247+
If no new job is running for that workflow, then add at least one new file to&mdash;or update at least one existing file within&mdash;the Google Cloud Storage bucket,
248+
no more than 5 minutes before the next script execution, so that the change falls inside the script's detection threshold. After that execution completes, check the **Jobs** list again.
249+
250+
## Step 7 (Optional): Delete the trigger
251+
252+
1. To stop the script from automatically executing on a regular basis, with the Google Apps Script project still open, on the sidebar, click the alarm clock (**Triggers**) icon.
253+
2. Rest your mouse pointer on the trigger you created in Step 5.
254+
3. Click the ellipsis (three dots) icon, and then click **Delete trigger**.
255+
256+
