Add MotherDuck tutorial #17430
Closed
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| --- | ||
| type: overview | ||
| pcx_content_type: navigation | ||
| title: Tutorials | ||
| hideChildren: true | ||
| sidebar: | ||
| order: 7 | ||
| --- | ||
|
|
||
| import { GlossaryTooltip, ListTutorials } from "~/components"; | ||
|
|
||
| View <GlossaryTooltip term="tutorial">tutorials</GlossaryTooltip> to help you get started with Pipelines. | ||
|
|
||
| <ListTutorials /> |
210 changes: 210 additions & 0 deletions
src/content/docs/pipelines/tutorials/query-data-with-motherduck/index.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,210 @@ | ||
| --- | ||
| updated: 2024-10-09 | ||
| difficulty: Intermediate | ||
| content_type: 📝 Tutorial | ||
| pcx_content_type: tutorial | ||
| title: Query R2 data with MotherDuck | ||
| products: | ||
| - R2 | ||
| tags: | ||
| - MotherDuck | ||
| languages: | ||
| - SQL | ||
| --- | ||
|
|
||
| import { Render, PackageManagers } from "~/components"; | ||
|
|
||
| In this tutorial, you will learn how to ingest clickstream data into an R2 bucket using Pipelines, connect the bucket to MotherDuck, and query the data with MotherDuck. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| 1. An [R2 bucket](/r2/buckets/create-buckets/) in your Cloudflare account. | ||
| 2. A [MotherDuck](https://motherduck.com/) account. | ||
|
|
||
| ## 1. Create a pipeline | ||
|
|
||
| To create a pipeline and connect it to your R2 bucket, you need the `Access Key ID` and the `Secret Access Key` of an R2 API token that can access the bucket. Follow the [R2 documentation](/r2/api/s3/tokens/) to create these keys and make a note of them. You will need them in this step and again in step 3. | ||
|
|
||
| Create a new pipeline `clickstream-pipeline` using the [Wrangler CLI](/workers/wrangler/): | ||
|
|
||
| ```sh | ||
| npx wrangler pipelines create clickstream-pipeline --r2 <BUCKET_NAME> --access-key-id <ACCESS_KEY_ID> --secret-access-key <SECRET_ACCESS_KEY> | ||
| ``` | ||
|
|
||
| Replace `<BUCKET_NAME>` with the name of your R2 bucket. Replace `<ACCESS_KEY_ID>` and `<SECRET_ACCESS_KEY>` with the keys you noted above. | ||
|
|
||
| ```output | ||
| 🌀 Authorizing R2 bucket <BUCKET_NAME> | ||
| 🌀 Creating pipeline named "clickstream-pipeline" | ||
| ✅ Successfully created pipeline "clickstream-pipeline" with id <PIPELINE_ID> | ||
| 🎉 You can now send data to your pipeline! | ||
| Example: curl "https://<PIPELINE_ID>.pipelines.cloudflare.com" -d '[{"foo": "bar"}]' | ||
| ``` | ||
|
|
||
| Make a note of the URL of your pipeline. You will need it in the next step. | ||
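Rather than copying the ID by hand, you could pull it out of the command output. The following is a minimal sketch, assuming the success line has the shape shown in the sample output above (the ID is the last word on the line). The helper names `extract_pipeline_id` and `pipeline_url` are hypothetical, and the output format of Wrangler may differ between versions.

```sh
# Hypothetical helper: read wrangler's output on stdin and print the pipeline ID.
# Assumes the success line ends with "with id <PIPELINE_ID>", as in the sample output.
extract_pipeline_id() {
  awk '/Successfully created pipeline/ { print $NF }'
}

# Hypothetical helper: build the ingestion URL for a given pipeline ID.
pipeline_url() {
  if [ -z "$1" ]; then
    echo "usage: pipeline_url <PIPELINE_ID>" >&2
    return 1
  fi
  printf 'https://%s.pipelines.cloudflare.com\n' "$1"
}
```

For example, you might save the output of the create command to a file, run `extract_pipeline_id < output.txt`, and pass the result to `pipeline_url` to get the URL for the next step.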
|
|
||
| ## 2. Ingest data to R2 | ||
|
|
||
| In this step, you will use `curl` to send the following JSON data through your pipeline to your R2 bucket: | ||
|
|
||
| <details> | ||
| <summary> | ||
| Click to view the JSON data | ||
|
||
| </summary> | ||
| ```json | ||
| [ | ||
| { | ||
| "session_id": "1234567890abcdef", | ||
| "user_id": "user123", | ||
| "timestamp": "2024-10-08T14:30:15.123Z", | ||
| "events": [ | ||
| { | ||
| "event_id": "evt001", | ||
| "event_type": "page_view", | ||
| "page_url": "https://example.com/products", | ||
| "timestamp": "2024-10-08T14:30:15.123Z", | ||
| "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36", | ||
| "ip_address": "192.168.1.1" | ||
| }, | ||
| { | ||
| "event_id": "evt002", | ||
| "event_type": "product_view", | ||
| "product_id": "prod456", | ||
| "page_url": "https://example.com/products/prod456", | ||
| "timestamp": "2024-10-08T14:31:20.456Z" | ||
| }, | ||
| { | ||
| "event_id": "evt003", | ||
| "event_type": "add_to_cart", | ||
| "product_id": "prod456", | ||
| "quantity": 1, | ||
| "page_url": "https://example.com/products/prod456", | ||
| "timestamp": "2024-10-08T14:32:05.789Z" | ||
| } | ||
| ], | ||
| "device_info": { | ||
| "device_type": "desktop", | ||
| "operating_system": "Windows 10", | ||
| "browser": "Chrome" | ||
| }, | ||
| "referrer": "https://google.com" | ||
| }, | ||
| { | ||
| "session_id": "abcdef1234567890", | ||
| "user_id": "user456", | ||
| "timestamp": "2024-10-08T15:45:30.987Z", | ||
| "events": [ | ||
| { | ||
| "event_id": "evt004", | ||
| "event_type": "page_view", | ||
| "page_url": "https://example.com/blog", | ||
| "timestamp": "2024-10-08T15:45:30.987Z", | ||
| "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1", | ||
| "ip_address": "203.0.113.1" | ||
| }, | ||
| { | ||
| "event_id": "evt005", | ||
| "event_type": "scroll", | ||
| "scroll_depth": "75%", | ||
| "page_url": "https://example.com/blog/article1", | ||
| "timestamp": "2024-10-08T15:47:12.345Z" | ||
| }, | ||
| { | ||
| "event_id": "evt006", | ||
| "event_type": "social_share", | ||
| "platform": "twitter", | ||
| "content_id": "article1", | ||
| "page_url": "https://example.com/blog/article1", | ||
| "timestamp": "2024-10-08T15:48:55.678Z" | ||
| } | ||
| ], | ||
| "device_info": { | ||
| "device_type": "mobile", | ||
| "operating_system": "iOS 14.4", | ||
| "browser": "Safari" | ||
| }, | ||
| "referrer": "https://t.co/abcd123" | ||
| }, | ||
| { | ||
| "session_id": "9876543210fedcba", | ||
| "user_id": "user789", | ||
| "timestamp": "2024-10-08T18:20:00.111Z", | ||
| "events": [ | ||
| { | ||
| "event_id": "evt007", | ||
| "event_type": "page_view", | ||
| "page_url": "https://example.com/login", | ||
| "timestamp": "2024-10-08T18:20:00.111Z", | ||
| "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36", | ||
| "ip_address": "198.51.100.1" | ||
| }, | ||
| { | ||
| "event_id": "evt008", | ||
| "event_type": "form_submission", | ||
| "form_id": "login-form", | ||
| "page_url": "https://example.com/login", | ||
| "timestamp": "2024-10-08T18:20:45.222Z" | ||
| }, | ||
| { | ||
| "event_id": "evt009", | ||
| "event_type": "page_view", | ||
| "page_url": "https://example.com/dashboard", | ||
| "timestamp": "2024-10-08T18:20:50.333Z" | ||
| }, | ||
| { | ||
| "event_id": "evt010", | ||
| "event_type": "feature_usage", | ||
| "feature_id": "data_export", | ||
| "page_url": "https://example.com/dashboard", | ||
| "timestamp": "2024-10-08T18:22:30.444Z" | ||
| } | ||
| ], | ||
| "device_info": { | ||
| "device_type": "desktop", | ||
| "operating_system": "macOS 10.15", | ||
| "browser": "Chrome" | ||
| }, | ||
| "referrer": "https://example.com/home" | ||
| } | ||
| ] | ||
| ``` | ||
| </details> | ||
|
|
||
| Run the following command to ingest the data to your R2 bucket using the pipeline you created in the previous step: | ||
|
|
||
| ```sh | ||
| curl -X POST 'https://<PIPELINE_ID>.pipelines.cloudflare.com' -d '<JSON_DATA>' | ||
| ``` | ||
|
|
||
| Replace `<PIPELINE_ID>` with the ID of the pipeline you created in the previous step. Also, replace `<JSON_DATA>` with the JSON data provided above. | ||
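Pasting a large JSON document into a quoted shell argument is error-prone, so one alternative is to save the data to a file and let `curl` read it. The following is a minimal sketch, assuming you saved the JSON above to a local file. The function name `send_clickstream` and the `DRY_RUN` variable are hypothetical conveniences, not part of Pipelines or Wrangler.

```sh
# Hypothetical helper: POST a JSON file to a pipeline.
# Usage: send_clickstream <PIPELINE_ID> <JSON_FILE>
send_clickstream() {
  pipeline_id="$1"
  json_file="$2"

  # Refuse to run without an ID or with a missing file.
  if [ -z "$pipeline_id" ] || [ ! -f "$json_file" ]; then
    echo "usage: send_clickstream <PIPELINE_ID> <JSON_FILE>" >&2
    return 1
  fi

  url="https://${pipeline_id}.pipelines.cloudflare.com"

  # With DRY_RUN=1, print the command instead of sending anything.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl --fail -X POST %s --data-binary @%s\n' "$url" "$json_file"
    return 0
  fi

  # --data-binary @file sends the file contents unmodified;
  # --fail makes curl exit non-zero on an HTTP error response.
  curl --fail -X POST "$url" --data-binary "@${json_file}"
}
```

For example, `DRY_RUN=1 send_clickstream <PIPELINE_ID> clickstream.json` prints the request that would be made, and the same call without `DRY_RUN` sends it.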
|
|
||
| ## 3. Connect the R2 bucket to MotherDuck | ||
|
|
||
| In this step, you will connect the R2 bucket to MotherDuck. There are several ways to do this, which are described in the [MotherDuck documentation](https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2/). This tutorial uses the MotherDuck dashboard. | ||
|
||
|
|
||
| Log in to the MotherDuck dashboard and select your profile. Go to the **Secrets** page, select **Add Secret**, and enter the following information: | ||
|
||
|
|
||
| - **Secret Name**: `Clickstream pipeline` | ||
| - **Secret Type**: `Cloudflare R2` | ||
| - **Access Key ID**: `ACCESS_KEY_ID` (replace with the Access Key ID you obtained in step 1) | ||
| - **Secret Access Key**: `SECRET_ACCESS_KEY` (replace with the Secret Access Key you obtained in step 1) | ||
|
|
||
| Select **Add Secret** to save the secret. | ||
|
||
|
|
||
| ## 4. Query the data | ||
|
|
||
| In this step, you will query the data stored in the R2 bucket using MotherDuck. Go back to the MotherDuck dashboard and select the **+** icon to add a new Notebook. Select **Add Cell** to add a new cell to the notebook. | ||
|
||
|
|
||
| In the cell, enter the following query and select **Run** to execute it: | ||
|
||
|
|
||
| ```sql | ||
| SELECT * FROM 'r2://<BUCKET_NAME>/<PATH_TO_FILE>'; | ||
| ``` | ||
|
|
||
| Replace `<BUCKET_NAME>` with the name of your R2 bucket. Replace `<PATH_TO_FILE>` with the path to the file that the pipeline wrote to the bucket in step 2. You can find the path by navigating to the object in the Cloudflare dashboard. | ||
|
|
||
| The query will return the data stored in the R2 bucket. | ||
|
|
||
| ## Conclusion | ||
|
|
||
| In this tutorial, you learned how to create a pipeline and ingest data into an R2 bucket. You also learned how to connect the bucket to MotherDuck and query the data stored in it. You can use this tutorial as a starting point for ingesting your own data into R2 and querying it with MotherDuck. | ||
|
||