Commit 5a60392

Improved data lake tutorial
1 parent 73d7457 commit 5a60392

File tree

1 file changed

+71
-34
lines changed
  • src/content/docs/pipelines/tutorials/send-data-from-client


src/content/docs/pipelines/tutorials/send-data-from-client/index.mdx

@@ -1,9 +1,9 @@
11
---
2-
updated: 2025-03-03
2+
updated: 2025-04-06
33
difficulty: Intermediate
44
content_type: 📝 Tutorial
55
pcx_content_type: tutorial
6-
title: Sending Clickstream data from client-side to Pipelines
6+
title: Create a data lake of clickstream data
77
products:
88
- R2
99
- Workers
@@ -15,14 +15,15 @@ languages:
1515

1616
import { Render, PackageManagers, Details, WranglerConfig } from "~/components";
1717

18-
In this tutorial, you will learn how to ingest clickstream data to a R2 bucket using Pipelines. You will send this data from the client-side, that means you will make a call to the Pipelines URL directly from the client-side JavaScript code.
18+
In this tutorial, you will learn how to build a data lake of website interaction events (clickstream data), using Pipelines.
1919

20-
For this tutorial, you will build a landing page of an e-commerce website. The page will list the products available for sale. A user can click on the view button to view the product details or click on the add to cart button to add the product to their cart. The focus of this tutorial is to show how to ingest the data to R2 using Pipelines from the client-side. Hence, the landing page will be a simple HTML page with no actual e-commerce functionality.
20+
Data lakes are a way to store large volumes of raw data in an object storage service such as [R2](/r2). You can run queries over a data lake to analyze the raw events and generate product insights.
21+
22+
For this tutorial, you will build a landing page for an e-commerce website. Users can click on the website to view products or add them to the cart. As the user clicks on the page, events will be sent to a pipeline. These events are "client-side": they are sent directly from the user's browser to your pipeline. Your pipeline will automatically batch the ingested data, build output files, and deliver them to an [R2 bucket](/r2) to build your data lake.
2123

2224
## Prerequisites
2325

24-
1. Create a [R2 bucket](/r2/buckets/create-buckets/) in your Cloudflare account.
25-
2. Install [`Node.js`](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).
26+
1. Install [`Node.js`](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).
2627

2728
<Details header="Node.js version manager">
2829
Use a Node version manager like [Volta](https://volta.sh/) or
@@ -31,7 +32,7 @@ For this tutorial, you will build a landing page of an e-commerce website. The p
3132
later in this guide, requires a Node version of `16.17.0` or later.
3233
</Details>
3334

34-
## 1. Create a new project
35+
## 1. Create a new Workers project
3536

3637
You will create a new Worker project that will use [Static Assets](/workers/static-assets/) to serve the HTML file. While you can use any front-end framework, this tutorial uses plain HTML and JavaScript to keep things simple. If you are interested in learning how to build and deploy a web application on Workers with Static Assets, you can refer to the [Frameworks](/workers/frameworks/) documentation.
3738

@@ -59,9 +60,9 @@ Navigate to the `e-commerce-pipelines-client-side` directory:
5960
cd e-commerce-pipelines-client-side
6061
```
6162

62-
## 2. Create the front-end
63+
## 2. Create the website frontend
6364

64-
Using Static Assets, you can serve the frontend of your application from your Worker. To use Static Assets, you need to add the required bindings to your `wrangler.toml` file.
65+
Using [Workers Static Assets](/workers/static-assets/), you can serve the frontend of your application from your Worker. To use Static Assets, you need to add the required bindings to your `wrangler.toml` file.
6566

6667
<WranglerConfig>
6768

@@ -185,7 +186,6 @@ Next, create a `public` directory and add an `index.html` file. The `index.html`
185186
</body>
186187

187188
</html>
188-
189189
```
190190
</details>
191191

@@ -197,18 +197,22 @@ The above code does the following:
197197
- Adds a button to add a product to the cart.
198198
- Contains a `handleClick` function to handle the click events. This function logs the action and the product ID. In the next steps, you will create a pipeline and add the logic to send the click events to this pipeline.
199199

200-
## 3. Create a pipeline
200+
## 3. Create an R2 bucket
201+
We'll create a new R2 bucket to use as the sink for our pipeline. Create a bucket named `clickstream-bucket` using the [Wrangler CLI](/workers/wrangler/). Open a terminal window and run the following command:
202+
203+
```sh
204+
npx wrangler r2 bucket create clickstream-bucket
205+
```
201206

207+
## 4. Create a pipeline
202208
You need to create a new pipeline and connect it to your R2 bucket.
203209

204-
Create a new pipeline `clickstream-pipeline-client` using the [Wrangler CLI](/workers/wrangler/):
210+
Create a new pipeline `clickstream-pipeline-client` using the [Wrangler CLI](/workers/wrangler/). Open a terminal window, and run the following command:
205211

206212
```sh
207-
npx wrangler pipelines create clickstream-pipeline-client --r2-bucket <BUCKET_NAME> --compression none --batch-max-seconds 5
213+
npx wrangler pipelines create clickstream-pipeline-client --r2-bucket clickstream-bucket --compression none --batch-max-seconds 5
208214
```
209215

210-
Replace `<BUCKET_NAME>` with the name of your R2 bucket.
211-
212216
When you run the command, you will be prompted to authorize Cloudflare Workers Pipelines to create R2 API tokens on your behalf. These tokens are required by your Pipeline. Your Pipeline uses these tokens when loading data into your bucket. You can approve the request through the browser link which will open automatically.
213217

214218
:::note
@@ -220,13 +224,13 @@ These flags are useful for testing, but we recommend keeping the default setting
220224
:::
221225

222226
```output
223-
✅ Successfully created Pipeline "clickstream-pipeline-client" with ID 0a10c577652949718bc014f4efxea241
227+
✅ Successfully created Pipeline "clickstream-pipeline-client" with ID <PIPELINE_ID>
224228
225-
Id: 0a10c577652949718bc014f4efxea241
229+
Id: <PIPELINE_ID>
226230
Name: clickstream-pipeline-client
227231
Sources:
228232
HTTP:
229-
Endpoint: https://0a10c577652949718bc014f4efxea241.pipelines.cloudflare.com
233+
Endpoint: https://<PIPELINE_ID>.pipelines.cloudflare.com
230234
Authentication: off
231235
Format: JSON
232236
Worker:
@@ -245,12 +249,12 @@ Destination:
245249
246250
Send data to your Pipeline's HTTP endpoint:
247251
248-
curl "https://0a10c577652949718bc014f4efxea241.pipelines.cloudflare.com" -d '[{"foo": "bar"}]'
252+
curl "https://<PIPELINE_ID>.pipelines.cloudflare.com" -d '[{"foo": "bar"}]'
249253
```
250254

251255
Make a note of the URL of the pipeline. You will use this URL to send the clickstream data from the client-side.
252256

253-
## 4. Generate clickstream data
257+
## 5. Generate clickstream data
254258

255259
You need to send clickstream data like the `timestamp`, `user_id`, `session_id`, and `device_info` to your pipeline. You can generate this data on the client side. Add the following function in the `<script>` tag in your `public/index.html`. This function gets the device information:
256260

@@ -306,7 +310,7 @@ function extractDeviceInfo(userAgent) {
306310
}
307311
```
308312

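This diff hunk elides most of the `extractDeviceInfo` body. As a rough, illustrative sketch only (real user-agent detection is far more involved), a version consistent with the sample output shown later in the tutorial could look like:

```javascript
// Illustrative sketch of a user-agent parser; a production parser
// would handle many more browsers, operating systems, and devices.
function extractDeviceInfo(userAgent) {
  // Check Chrome before Safari: Chrome UAs also contain "Safari".
  let browser = "Unknown";
  if (/Chrome/.test(userAgent)) browser = "Chrome";
  else if (/Firefox/.test(userAgent)) browser = "Firefox";
  else if (/Safari/.test(userAgent)) browser = "Safari";

  let os = "Unknown";
  if (/Windows/.test(userAgent)) os = "Windows";
  else if (/Mac OS/.test(userAgent)) os = "macOS";
  else if (/Linux/.test(userAgent)) os = "Linux";

  const device = /Mobile/.test(userAgent) ? "Mobile" : "Desktop";
  return { browser, os, device, userAgent };
}
```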
309-
## 5. Send clickstream data to your pipeline
313+
## 6. Send clickstream data to your pipeline
310314

311315
You will send the clickstream data to the pipeline from the client side. To do that, update the `handleClick` function to make a `POST` request to the pipeline URL with the data. Replace `<PIPELINE_URL>` with the URL of your pipeline.
312316

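The updated `handleClick` body is not shown in this hunk. A minimal sketch of the described change might look like the following; the `buildClickEvent` helper is illustrative (not from the tutorial), and `<PIPELINE_URL>` is a placeholder for your pipeline's HTTP endpoint:

```javascript
// Illustrative sketch only; helper names are not from the tutorial.
// Build one clickstream event for a click on a product.
function buildClickEvent(action, productId) {
  return {
    timestamp: new Date().toISOString(),
    event_data: { event_type: action, product_id: productId },
  };
}

// The pipeline's HTTP endpoint accepts a JSON array of events.
async function handleClick(action, productId) {
  const event = buildClickEvent(action, productId);
  await fetch("<PIPELINE_URL>", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([event]),
  });
}
```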
@@ -368,19 +372,17 @@ npm run dev
368372

369373
However, no data gets sent to the pipeline. Inspect the browser console to view the error message. The error message you see is for [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS). In the next step, you will update the CORS settings to allow the client-side JavaScript to send data to the pipeline.
370374

371-
## 6. Update CORS settings
375+
## 7. Update CORS settings
372376

373-
By default, the Pipelines endpoint does not allow cross-origin requests. You need to update the CORS settings to allow the client-side JavaScript to send data to the pipeline. To update the CORS settings, execute the following command:
377+
By default, the HTTP ingestion endpoint for your pipeline does not allow cross-origin requests. You need to update the CORS settings to allow the client-side JavaScript to send data to the pipeline. To update the CORS settings, execute the following command:
374378

375379
```sh
376380
npx wrangler pipelines update clickstream-pipeline-client --cors-origins http://localhost:8787
377381
```
378382

379-
Now when you run the development server and open the application in the browser, you will see the clickstream data being sent to the pipeline when you click on the `View Details` or `Add to Cart` button. You can also see the data in the R2 bucket.
383+
Now, when you run the development server locally and open the website in a browser, clickstream data will be sent to the pipeline successfully. You can learn more about the CORS settings in the [Specifying CORS settings](/pipelines/build-with-pipelines/http/#specifying-cors-settings) documentation.
380384

381-
You can learn more about the CORS settings in the [Specifying CORS settings](/pipelines/build-with-pipelines/http/#specifying-cors-settings) documentation.
382-
383-
## 7. Deploy the application
385+
## 8. Deploy the application
384386

385387
To deploy the application, run the following command:
386388

@@ -406,17 +408,17 @@ Deployed e-commerce-pipelines-client-side triggers (7.60 sec)
406408
Current Version ID: <VERSION_ID>
407409
```
408410

409-
We now need to update the pipeline's CORS settings to include the URL of our newly deployed application. Run the command below, and replace `<URL>` with the URL of the application.
411+
We now need to update the pipeline's CORS settings again. This time, we'll include the URL of our newly deployed application. Run the command below, and replace `<URL>` with the URL of the application.
410412

411413
```sh
412414
npx wrangler pipelines update clickstream-pipeline-client --cors-origins http://localhost:8787 https://<URL>.workers.dev
413415
```
414416

415417
Now, you can access the application at the deployed URL. When you click on the `View Details` or `Add to Cart` button, the clickstream data will be sent to your pipeline.
416418

417-
## 8. View the data in R2
419+
## 9. View the data in R2
418420

419-
You can view the data in the R2 bucket. If you are not signed in to the Cloudflare dashboard, sign in and navigate to the R2 overview page.
421+
You can view the data in the R2 bucket. If you are not signed in to the Cloudflare dashboard, sign in and navigate to the [R2 overview](https://dash.cloudflare.com/?to=/:account/r2/overview) page.
420422

421423
Open the bucket you configured for your pipeline in Step 3. You can see files representing the clickstream data. These files are newline-delimited JSON files. Each row in a file represents one click event. Download one of the files, and open it in your preferred text editor to see the output:
422424

@@ -428,15 +430,50 @@ Open the bucket you configured for your pipeline in Step 3. You can see files, r
428430
{"timestamp":"2025-04-06T16:24:33.978Z","session_id":"1234567890abcdef","user_id":"user333","event_data":{"event_id":467,"event_type":"product_view","page_url":"https://<URL>.workers.dev/","timestamp":"2025-04-06T16:24:33.978Z","product_id":6},"device_info":{"browser":"Chrome","os":"Linux","device":"Mobile","userAgent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Mobile Safari/537.36"},"referrer":""}
429431
```
430432

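Because each output file is newline-delimited JSON, inspecting a downloaded file programmatically is straightforward. As a small sketch (a hypothetical helper, not part of the tutorial), you could parse a file's text into an array of event objects:

```javascript
// Sketch: turn the NDJSON text of a downloaded output file into
// an array of event objects, skipping any blank lines.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```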
433+
## 10. Optional: Connect a query engine to your R2 bucket and query the data
434+
Once you have collected the raw events in R2, you might want to query them to answer questions such as "how many `product_view` events occurred?". You can connect a query engine, such as MotherDuck, to your R2 bucket.
435+
436+
You can connect the bucket to MotherDuck in several ways, which you can learn about from the [MotherDuck documentation](https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2/). In this tutorial, you will connect the bucket to MotherDuck using the MotherDuck dashboard.
437+
438+
### Connect your bucket to MotherDuck
439+
440+
Before connecting the bucket to MotherDuck, you need to obtain the Access Key ID and Secret Access Key for the R2 bucket. You can find the instructions to obtain the keys in the [R2 API documentation](/r2/api/tokens/).
441+
443+
444+
1. Log in to the MotherDuck dashboard and select your profile.
445+
2. Navigate to the **Secrets** page.
446+
3. Select the **Add Secret** button and enter the following information:
447+
448+
- **Secret Name**: `Clickstream pipeline`
449+
- **Secret Type**: `Cloudflare R2`
450+
- **Access Key ID**: `ACCESS_KEY_ID` (replace with the Access Key ID)
451+
- **Secret Access Key**: `SECRET_ACCESS_KEY` (replace with the Secret Access Key)
452+
453+
4. Select the **Add Secret** button to save the secret.
454+
455+
### Query the data
456+
In this step, you will query the data stored in the R2 bucket using MotherDuck.
457+
458+
1. Navigate back to the MotherDuck dashboard and select the **+** icon to add a new Notebook.
459+
2. Select the **Add Cell** button to add a new cell to the notebook.
460+
461+
3. In the cell, enter the following query and select the **Run** button to execute the query:
462+
463+
```sql
464+
SELECT count(*) FROM read_json_auto('r2://clickstream-bucket/**/*');
465+
```
466+
467+
The query will return a count of all the events received.
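For comparison, the same kind of question can be answered in plain JavaScript over events parsed from the downloaded files; `countEventsByType` is a hypothetical helper, not part of the tutorial:

```javascript
// Hypothetical helper: tally events by their event_data.event_type field.
function countEventsByType(events) {
  return events.reduce((counts, e) => {
    const type = (e.event_data && e.event_data.event_type) || "unknown";
    counts[type] = (counts[type] || 0) + 1;
    return counts;
  }, {});
}
```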
468+
431469
## Conclusion
432470

433471
You have successfully created a Pipeline and used it to send clickstream data from the client. Through this tutorial, you've gained hands-on experience in:
434472

435-
1. Creating a Workers project with a static frontend
473+
1. Creating a Workers project using Static Assets
436474
2. Generating and capturing clickstream data
437-
3. Setting up a Cloudflare Pipelines to ingest data into R2
475+
3. Setting up a pipeline to ingest data into R2
438476
4. Deploying the application to Workers
439-
440-
For your next steps, consider connecting your R2 bucket to MotherDuck to analyse the data. You can follow the instructions in the [Analyzing Clickstream Data with MotherDuck and Cloudflare R2](/pipelines/tutorials/query-data-with-motherduck#7-connect-the-r2-bucket-to-motherduck) tutorial to connect your R2 bucket to MotherDuck and analyse data.
477+
5. Using MotherDuck to query the data
441478

442479
You can find the source code of the application in the [GitHub repository](https://github.com/harshil1712/e-commerce-pipelines-client-side).
