| 1 | +--- |
| 2 | +pcx_content_type: concept |
| 3 | +title: Build a RAG from your website |
| 4 | +sidebar: |
| 5 | + order: 4 |
| 6 | +--- |
| 7 | + |
| 8 | +AutoRAG is designed to work out of the box with data in R2 buckets. But what if your content lives on a website or needs to be rendered dynamically? |
| 9 | + |
| 10 | +In this tutorial, we’ll walk through how to: |
| 11 | + |
| 12 | +1. Render your website using Cloudflare's Browser Rendering API |
| 13 | +2. Store the rendered HTML in R2 |
| 14 | +3. Connect it to AutoRAG for querying |
| 15 | + |
| 16 | +## Step 1. Create a Worker to fetch webpages and upload them to R2 |
| 17 | + |
| 18 | +We’ll create a Cloudflare Worker that uses Puppeteer to visit your URL, render it, and store the full HTML in your R2 bucket. If you already have an R2 bucket with the content you’d like to build a RAG for, you can skip this step. |
| 19 | + |
| 20 | +1. Create a new Worker project named `browser-r2-worker` by running: |
| 21 | + |
| 22 | +```bash |
| 23 | +npm create cloudflare@latest -- browser-r2-worker |
| 24 | +``` |
| 25 | + |
| 26 | +For setup, select the following options: |
| 27 | + |
| 28 | +- For _What would you like to start with_?, choose `Hello World Starter`. |
| 29 | +- For _Which template would you like to use_?, choose `Worker only`. |
| 30 | +- For _Which language do you want to use_?, choose `TypeScript`. |
| 31 | +- For _Do you want to use git for version control_?, choose `Yes`. |
| 32 | +- For _Do you want to deploy your application_?, choose `No` (we will be making some changes before deploying). |
| 33 | + |
| 34 | +2. Install `@cloudflare/puppeteer`, which allows you to control the Browser Rendering instance: |
| 35 | + |
| 36 | +```bash |
| 37 | +npm i @cloudflare/puppeteer |
| 38 | +``` |
| 39 | + |
| 40 | +3. Create a new R2 bucket named `html-bucket` by running: |
| 41 | + |
| 42 | +```bash |
| 43 | +npx wrangler r2 bucket create html-bucket |
| 44 | +``` |
| 45 | + |
| 46 | +4. Add the following configuration to your Wrangler configuration file so your Worker can use Browser Rendering and your new R2 bucket: |
| 47 | + |
| 48 | +```jsonc |
| 49 | +{ |
| 50 | + "compatibility_flags": ["nodejs_compat"], |
| 51 | + "browser": { |
| 52 | + "binding": "MY_BROWSER", |
| 53 | + }, |
| 54 | + "r2_buckets": [ |
| 55 | + { |
| 56 | + "binding": "HTML_BUCKET", |
| 57 | + "bucket_name": "html-bucket", |
| 58 | + }, |
| 59 | + ], |
| 60 | +} |
| 61 | +``` |
| 62 | + |
| 63 | +5. Replace the contents of `src/index.ts` with the following skeleton script: |
| 64 | + |
| 65 | +```typescript |
| 66 | +import puppeteer from "@cloudflare/puppeteer"; |
| 67 | + |
| 68 | +// Define our environment bindings |
| 69 | +interface Env { |
| 70 | + MY_BROWSER: any; |
| 71 | + HTML_BUCKET: R2Bucket; |
| 72 | +} |
| 73 | + |
| 74 | +// Define request body structure |
| 75 | +interface RequestBody { |
| 76 | + url: string; |
| 77 | +} |
| 78 | + |
| 79 | +export default { |
| 80 | + async fetch(request: Request, env: Env): Promise<Response> { |
| 81 | + // Only accept POST requests |
| 82 | + if (request.method !== "POST") { |
| 83 | + return new Response("Please send a POST request with a target URL", { |
| 84 | + status: 405, |
| 85 | + }); |
| 86 | + } |
| 87 | + |
| 88 | + // Get URL from request body |
| 89 | + const body = (await request.json()) as RequestBody; |
| 90 | + // Note: Only use this parser for websites you own |
| 91 | + const targetUrl = new URL(body.url); |
| 92 | + |
| 93 | + // Create the filename used as the R2 object key |
| 94 | + const key = targetUrl.hostname + "_" + Date.now() + ".html"; |
| 95 | + |
| 96 | + // Launch the browser; the finally block closes it even if rendering fails |
| 97 | + const browser = await puppeteer.launch(env.MY_BROWSER); |
| 98 | + try { |
| 99 | + const page = await browser.newPage(); |
| 100 | + // Navigate to the page, fetch its HTML, and store it in R2 |
| 101 | + await page.goto(targetUrl.href); |
| 102 | + const htmlPage = await page.content(); |
| 103 | + await env.HTML_BUCKET.put(key, htmlPage); |
| 104 | + } finally { |
| 105 | + await browser.close(); |
| 106 | + } |
| 107 | + |
| 108 | + // Return success response |
| 109 | + return new Response( |
| 110 | + JSON.stringify({ |
| 111 | + success: true, |
| 112 | + message: "Page rendered and stored successfully", |
| 113 | + key: key, |
| 114 | + }), |
| 115 | + { |
| 116 | + headers: { "Content-Type": "application/json" }, |
| 117 | + }, |
| 118 | + ); |
| 119 | + }, |
| 120 | +} satisfies ExportedHandler<Env>; |
| 121 | +``` |
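
The skeleton above passes `body.url` straight to `new URL()`, so a malformed body or URL surfaces as an unhandled exception. A minimal sketch of one way to validate the input first is shown below. `parseTargetUrl` is a hypothetical helper, not part of the tutorial's Worker, and restricting the scheme to `http:` and `https:` is an assumption about what you want to render.

```typescript
// Hypothetical helper (not part of the tutorial's Worker): returns a parsed URL
// when the request body contains a usable http(s) URL, otherwise null.
function parseTargetUrl(body: unknown): URL | null {
  // The body must be a JSON object with a string `url` property
  if (typeof body !== "object" || body === null) return null;
  const candidate = (body as { url?: unknown }).url;
  if (typeof candidate !== "string") return null;

  let parsed: URL;
  try {
    // new URL() throws a TypeError for strings that are not absolute URLs
    parsed = new URL(candidate);
  } catch {
    return null;
  }

  // Assumption: only web pages should be rendered, so other schemes are rejected
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return parsed;
}
```

In the Worker, you could call this helper on the parsed request body and return a `400` response when it yields `null`, before launching the browser.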
| 122 | + |
| 123 | +6. Once the code is ready, you can deploy it to your Cloudflare account by running: |
| 124 | + |
| 125 | +```bash |
| 126 | +npx wrangler deploy |
| 127 | +``` |
| 128 | + |
| 129 | +7. To test your Worker, send it a POST request with cURL. In this example, the Worker renders this tutorial page and uploads the resulting HTML to the `html-bucket` bucket: |
| 130 | + |
| 131 | +```bash |
| 132 | +curl -X POST https://browser-r2-worker.<YOUR_SUBDOMAIN>.workers.dev \ |
| 133 | +-H "Content-Type: application/json" \ |
| 134 | +-d '{"url": "https://developers.cloudflare.com/autorag/tutorial/brower-rendering-autorag-tutorial/"}' |
| 135 | +``` |
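
The Worker handles one URL per request, so indexing a whole site means sending one request per page. The sketch below shows one way to do that from a script. `submitPages` and the `FetchLike` type are hypothetical names introduced for illustration, and the code assumes the Worker responds with the JSON shape from the skeleton (a `key` field on success).

```typescript
// Minimal fetch-like signature so the sketch does not depend on a specific runtime's types.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: posts each page URL to the Worker in turn and collects
// the R2 object keys that the Worker reports back.
async function submitPages(
  workerUrl: string,
  pageUrls: string[],
  fetchImpl: FetchLike,
): Promise<string[]> {
  const keys: string[] = [];
  for (const pageUrl of pageUrls) {
    // Requests are sent one at a time, so only one browser session is active at once
    const res = await fetchImpl(workerUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: pageUrl }),
    });
    if (!res.ok) {
      throw new Error(`Worker returned ${res.status} for ${pageUrl}`);
    }
    // Assumes the response body matches the skeleton Worker's success payload
    const result = (await res.json()) as { key: string };
    keys.push(result.key);
  }
  return keys;
}
```

In a runtime with a global `fetch`, you could pass it as the third argument, for example `submitPages(workerUrl, pages, fetch)`.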
| 136 | + |
| 137 | +## Step 2. Create your AutoRAG and monitor the indexing |
| 138 | + |
| 139 | +Now that you have created your R2 bucket and filled it with the content you’d like to query, you are ready to create an AutoRAG instance: |
| 140 | + |
| 141 | +1. In your [Cloudflare Dashboard](https://dash.cloudflare.com/?to=/:account/ai/autorag), navigate to AI > AutoRAG. |
| 142 | +2. Select Create AutoRAG and complete the setup process: |
| 143 | + 1. Select the **R2 bucket** which contains your knowledge base, in this case, select the `html-bucket`. |
| 144 | + 2. Select an **embedding model** used to convert your data to vector representation. It is recommended to use the Default. |
| 145 | + 3. Select an **LLM** to use to generate your responses. It is recommended to use the Default. |
| 146 | + 4. Select or create an **AI Gateway** to monitor and control your model usage. |
| 147 | + 5. **Name** your AutoRAG `my-rag`. |
| 148 | + 6. Select or create a **Service API** token to grant AutoRAG access to create and access resources in your account. |
| 149 | +3. Select Create to spin up your AutoRAG. |
| 150 | + |
| 151 | +Once you’ve created your AutoRAG, it will automatically create a Vectorize database in your account and begin indexing the data. |
| 152 | + |
| 153 | +You can view the progress of your indexing job in the Overview page of your AutoRAG. |
| 154 | + |
| 155 | + |
| 156 | + |
| 157 | +## Step 3. Test and add to your application |
| 158 | + |
| 159 | +Once AutoRAG finishes indexing your content, you’re ready to start asking it questions. You can open up your AutoRAG instance, navigate to the Playground tab, and ask a question based on your uploaded content, like “What is AutoRAG?”. |
| 160 | + |
| 161 | +Once you’re happy with the results in the Playground, you can integrate AutoRAG directly into the application that you are building. If you are using a Worker to build your application, then you can use the AI binding to directly call your AutoRAG: |
| 162 | + |
| 163 | +```jsonc |
| 164 | +{ |
| 165 | + "ai": { |
| 166 | + "binding": "AI", |
| 167 | + }, |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +Then, query your AutoRAG instance from your Worker code by calling the `aiSearch()` method. |
| 172 | + |
| 173 | +```javascript |
| 174 | +const answer = await env.AI.autorag("my-rag").aiSearch({ |
| 175 | + query: "What is AutoRAG?", |
| 176 | +}); |
| 177 | +``` |
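
To show where that call might sit in an application, here is a minimal sketch of a function that reads a question from a `?q=` query parameter and returns the generated answer. `answerQuestion`, the `q` parameter, and the locally declared interfaces are hypothetical. The sketch assumes the binding exposes a lowercase `autorag()` method and that the `aiSearch()` result has a `response` string field, so check both against the binding types in your project.

```typescript
// Assumed minimal shape of the AI binding, declared locally so the sketch
// type-checks on its own. The real binding types may expose more fields.
interface AutoRagSearchResult {
  response: string;
}
interface AutoRagInstance {
  aiSearch(params: { query: string }): Promise<AutoRagSearchResult>;
}
interface QueryEnv {
  AI: { autorag(name: string): AutoRagInstance };
}

// Hypothetical helper: extracts the question from the request URL and asks
// the `my-rag` instance created in Step 2.
async function answerQuestion(
  env: QueryEnv,
  requestUrl: string,
): Promise<{ status: number; body: string }> {
  const question = new URL(requestUrl).searchParams.get("q");
  if (!question) {
    return { status: 400, body: "Missing ?q= query parameter" };
  }
  const answer = await env.AI.autorag("my-rag").aiSearch({ query: question });
  return { status: 200, body: answer.response };
}
```

Inside a Worker's `fetch` handler, you could call `answerQuestion(env, request.url)` and wrap the result in a `Response` with the returned status.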
| 178 | + |
| 179 | +For more information on how to add AutoRAG to your application, go to your AutoRAG instance and navigate to Use AutoRAG. |