In this tutorial, you will learn how to build a data lake of website interaction events (clickstream data) using Pipelines.

Data lakes are a way to store large volumes of raw data in an object storage service such as [R2](/r2). You can run queries over a data lake to analyze the raw events and generate product insights.

For this tutorial, you will build a landing page for an e-commerce website. Users can click on the website to view products or add them to the cart. As the user clicks on the page, events are sent to a pipeline. These events are "client-side": they are sent directly from the user's browser to your pipeline. Your pipeline automatically batches the ingested data, builds output files, and delivers them to an [R2 bucket](/r2) to build your data lake.
## Prerequisites

Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.
</Details>

## 1. Create a new Workers project

You will create a new Worker project that will use [Static Assets](/workers/static-assets/) to serve the HTML file. While you can use any front-end framework, this tutorial uses plain HTML and JavaScript to keep things simple. If you are interested in learning how to build and deploy a web application on Workers with Static Assets, you can refer to the [Frameworks](/workers/frameworks/) documentation.

Navigate to the `e-commerce-pipelines-client-side` directory:

```sh
cd e-commerce-pipelines-client-side
```

## 2. Create the website frontend

Using [Workers Static Assets](/workers/static-assets/), you can serve the frontend of your application from your Worker. To use Static Assets, you need to add the required bindings to your `wrangler.toml` file.

<WranglerConfig>

Next, create a `public` directory and add an `index.html` file.


The above code does the following:

- Adds a button to add a product to the cart.
198
198
- Contains a `handleClick` function to handle the click events. This function logs the action and the product ID. In the next steps, you will create a pipeline and add the logic to send the click events to this pipeline.
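The logging-only handler described above can be sketched as follows. This is an illustrative standalone version, not the exact code from the tutorial's `index.html`:

```javascript
// Illustrative sketch of the handleClick function described above; the
// tutorial's index.html contains the authoritative version. At this stage the
// handler only records the interaction. A later step replaces the log with a
// POST request to the pipeline.
function handleClick(action, productId) {
  const event = {
    action, // for example "product_view" or "add_to_cart"
    productId,
    timestamp: new Date().toISOString(),
  };
  console.log("click event:", event);
  return event; // returned only to make the sketch easy to exercise
}
```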

## 3. Create an R2 bucket

We'll create a new R2 bucket to use as the sink for our pipeline. Create a new R2 bucket named `clickstream-bucket` using the [Wrangler CLI](/workers/wrangler/). Open a terminal window, and run the following command:

```sh
npx wrangler r2 bucket create clickstream-bucket
```

## 4. Create a pipeline
You need to create a new pipeline and connect it to your R2 bucket.
Create a new pipeline `clickstream-pipeline-client` using the [Wrangler CLI](/workers/wrangler/). Open a terminal window, and run the following command:
Replace `<BUCKET_NAME>` with the name of your R2 bucket.
When you run the command, you will be prompted to authorize Cloudflare Workers Pipelines to create R2 API tokens on your behalf. These tokens are required by your Pipeline. Your Pipeline uses these tokens when loading data into your bucket. You can approve the request through the browser link which will open automatically.
:::note
These flags are useful for testing, but we recommend keeping the default settings.
:::
```output
✅ Successfully created Pipeline "clickstream-pipeline-client" with ID <PIPELINE_ID>
```

Make a note of the URL of the pipeline. You will use this URL to send the clickstream data from the client side.

## 5. Generate clickstream data

You need to send clickstream data like the `timestamp`, `user_id`, `session_id`, and `device_info` to your pipeline. You can generate this data on the client side. Add the following function in the `<script>` tag in your `public/index.html`. This function gets the device information:
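The tutorial's implementation is omitted from this excerpt. As a rough, hypothetical sketch (field names chosen to match the sample output shown later in this tutorial), such a helper might look like:

```javascript
// Hypothetical sketch only; the tutorial ships its own extractDeviceInfo.
// Derives coarse browser, OS, and device fields from a user-agent string.
function extractDeviceInfo(userAgent) {
  let browser = "Unknown";
  if (userAgent.includes("Firefox")) browser = "Firefox";
  else if (userAgent.includes("Chrome")) browser = "Chrome";
  else if (userAgent.includes("Safari")) browser = "Safari";

  let os = "Unknown";
  if (userAgent.includes("Windows")) os = "Windows";
  else if (userAgent.includes("Mac OS")) os = "macOS";
  else if (userAgent.includes("Linux")) os = "Linux";

  const device = userAgent.includes("Mobile") ? "Mobile" : "Desktop";
  return { browser, os, device, userAgent };
}
```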
256
260
@@ -306,7 +310,7 @@ function extractDeviceInfo(userAgent) {
306
310
}
307
311
```
308
312
309
-
## 5. Send clickstream data to your pipeline
313
+
## 6. Send clickstream data to your pipeline
310
314
311
315
You will send the clickstream data to the pipeline from the client side. To do that, update the `handleClick` function to make a `POST` request to the pipeline URL with the data. Replace `<PIPELINE_URL>` with the URL of your pipeline.
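As a hedged sketch of that update (the pipeline URL, session ID, and user ID below are placeholders; the pipeline's HTTP endpoint ingests a JSON array of records via POST):

```javascript
// Hedged sketch; PIPELINE_URL, session_id, and user_id are placeholders.
const PIPELINE_URL = "<PIPELINE_URL>";

// Build one click event in the shape used throughout this tutorial.
function buildClickEvent(action, productId) {
  return {
    timestamp: new Date().toISOString(),
    session_id: "1234567890abcdef",
    user_id: "user123",
    event_data: {
      event_type: action,
      product_id: productId,
      timestamp: new Date().toISOString(),
    },
  };
}

// Send the event to the pipeline as a JSON array of records.
async function handleClick(action, productId) {
  await fetch(PIPELINE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([buildClickEvent(action, productId)]),
  });
}
```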
312
316
@@ -368,19 +372,17 @@ npm run dev
368
372
369
373
However, no data gets sent to the pipeline. Inspect the browser console and you will see a [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) error. In the next step, you will update the CORS settings to allow the client-side JavaScript to send data to the pipeline.

## 7. Update CORS settings

By default, the HTTP ingestion endpoint for your pipeline does not allow cross-origin requests. You need to update the CORS settings to allow the client-side JavaScript to send data to the pipeline. To update the CORS settings, execute the following command:
Now when you run the development server locally, and open the website in a browser, clickstream data will be successfully sent to the pipeline. You can learn more about the CORS settings in the [Specifying CORS settings](/pipelines/build-with-pipelines/http/#specifying-cors-settings) documentation.

## 8. Deploy the application

To deploy the application, run the following command:
We now need to update the pipeline's CORS settings again. This time, we'll include the URL of our newly deployed application. Run the command below, and replace `<URL>` with the URL of the application.
Now, you can access the application at the deployed URL. When you click on the `View Details` or `Add to Cart` button, the clickstream data will be sent to your pipeline.

## 9. View the data in R2

You can view the data in the R2 bucket. If you are not signed in to the Cloudflare dashboard, sign in and navigate to the [R2 overview](https://dash.cloudflare.com/?to=/:account/r2/overview) page.
Open the bucket you configured for your pipeline in Step 3. You will see files representing the clickstream data. These files are newline-delimited JSON files. Each row in a file represents one click event. Download one of the files and open it in your preferred text editor to see the output:

```json
{"timestamp":"2025-04-06T16:24:33.978Z","session_id":"1234567890abcdef","user_id":"user333","event_data":{"event_id":467,"event_type":"product_view","page_url":"https://<URL>.workers.dev/","timestamp":"2025-04-06T16:24:33.978Z","product_id":6},"device_info":{"browser":"Chrome","os":"Linux","device":"Mobile","userAgent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Mobile Safari/537.36"},"referrer":""}
```
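Because each line of a delivered file is a standalone JSON object, you can inspect a downloaded file with a few lines of JavaScript. This is an optional convenience and not part of the tutorial's code:

```javascript
// Parse a newline-delimited JSON file downloaded from the R2 bucket.
// Each non-empty line is one click event.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Count events of a given type, for example "product_view".
function countEventType(events, type) {
  return events.filter((e) => e.event_data && e.event_data.event_type === type)
    .length;
}
```

For ad-hoc counts this is enough; a full query engine, as shown in the optional step of this tutorial, scales better.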

## 10. Optional: Connect a query engine to your R2 bucket and query the data

Once you have collected the raw events in R2, you might want to query them to answer questions such as "how many `product_view` events occurred?". You can connect a query engine, such as MotherDuck, to your R2 bucket.

You can connect the bucket to MotherDuck in several ways, which you can learn about from the [MotherDuck documentation](https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2/). In this tutorial, you will connect the bucket to MotherDuck using the MotherDuck dashboard.

### Connect your bucket to MotherDuck

Before connecting the bucket to MotherDuck, you need to obtain the Access Key ID and Secret Access Key for the R2 bucket. You can find the instructions to obtain the keys in the [R2 API documentation](/r2/api/tokens/).

1. Log in to the MotherDuck dashboard and select your profile.
2. Navigate to the **Secrets** page.
3. Select the **Add Secret** button and enter the following information:
   - **Secret Name**: `Clickstream pipeline`
   - **Secret Type**: `Cloudflare R2`
   - **Access Key ID**: `ACCESS_KEY_ID` (replace with your Access Key ID)
   - **Secret Access Key**: `SECRET_ACCESS_KEY` (replace with your Secret Access Key)
4. Select the **Add Secret** button to save the secret.

### Query the data

In this step, you will query the data stored in the R2 bucket using MotherDuck.

1. Navigate back to the MotherDuck dashboard and select the **+** icon to add a new Notebook.
2. Select the **Add Cell** button to add a new cell to the notebook.
3. In the cell, enter the following query and select the **Run** button to execute it:

   ```sql
   SELECT count(*) FROM read_json_auto('r2://clickstream-bucket/**/*');
   ```

The query returns a count of all the events received.

## Conclusion

You have successfully created a pipeline and used it to send clickstream data from the client. Through this tutorial, you have gained hands-on experience in:

1. Creating a Workers project using Static Assets
2. Generating and capturing clickstream data
3. Setting up a pipeline to ingest data into R2
4. Deploying the application to Workers
5. Using MotherDuck to query the data

You can find the source code of the application in the [GitHub repository](https://github.com/harshil1712/e-commerce-pipelines-client-side).