7 changes: 7 additions & 0 deletions src/content/docs/browser-rendering/how-to/queues.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
pcx_content_type: navigation
title: Build a web crawler with Queues and Browser Rendering
external_link: /queues/tutorials/web-crawler-with-browser-rendering/
sidebar:
order: 2
---
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
pcx_content_type: navigation
title: Reference
external_link: /api/resources/browser_rendering/
sidebar:
order: 8
---
Original file line number Diff line number Diff line change
@@ -20,22 +20,18 @@ curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browse

## Advanced usage

Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`).
Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`). The opposite is also possible: allow only the requests that match `allowRequestPattern` or `allowResourceTypes`.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/content' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://cloudflare.com/",
"rejectResourceTypes": ["image"],
"rejectRequestPattern": ["/^.*\\.(css)"]
}
"rejectResourceTypes": ["image"],
"rejectRequestPattern": ["/^.*\\.(css)"]
}'

```

### Parameters

- `url` _(string)_ - The URL of the webpage to extract content from.
- `rejectResourceTypes` _(array)_ - Blocks specific resource types such as images, fonts from loading to improve performance.
- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching specified patterns such as CSS files.
Many more options exist, like setting HTTP headers using `setExtraHTTPHeaders`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.
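
As a rough illustration of the allow-list variant described above, the sketch below sends only `allowResourceTypes`. The resource type names `document` and `script`, and the `ACCOUNT_ID` and `API_TOKEN` environment variable names, are assumptions made for this example; check the endpoint reference for the accepted values.

```bash
# Hypothetical allow-list payload: only the HTML document and scripts may load.
# The resource type names "document" and "script" are assumptions in this sketch.
payload='{
  "url": "https://cloudflare.com/",
  "allowResourceTypes": ["document", "script"]
}'

# Send the request only when credentials are provided via environment variables.
# ACCOUNT_ID and API_TOKEN are placeholder names chosen for this sketch.
if [ -n "${ACCOUNT_ID:-}" ] && [ -n "${API_TOKEN:-}" ]; then
  curl -X POST "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/content" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$payload"
fi
```

Keeping the payload in a variable makes it easy to inspect before anything is sent.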
2 changes: 1 addition & 1 deletion src/content/docs/browser-rendering/rest-api/index.mdx
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
pcx_content_type: navigation
title: REST API
sidebar:
order: 2
order: 3
group:
badge: Beta
---
39 changes: 21 additions & 18 deletions src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx
Original file line number Diff line number Diff line change
@@ -19,7 +19,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
"url": "https://example.com/",
"addStyleTag": [
{ "content": "body { font-family: Arial; }" },
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
]
}' \
--output "output.pdf"
@@ -52,9 +52,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
--output "advanced-output.pdf"
```

## PDF with no images or CSS
## Blocking images and styles when generating a PDF

Use PDF with no images or CSS if you want to accelerate the scanning process and you do not need the images.
The options `rejectResourceTypes` and `rejectRequestPattern` can be used to block requests. The opposite is also possible: allow _only_ certain requests using `allowResourceTypes` and `allowRequestPattern`.

```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/pdf \
@@ -68,19 +68,22 @@ curl -X POST https://api.cloudflare.com/client/v4/accounts/<acccountID>/browser-
--output "cloudflare.pdf"
```

## Parameters
## Generate PDF from custom HTML

- `url` _(string)_ - The webpage URL to render as a PDF.
- `addStyleTag` _(array of objects)_ - Injects custom CSS before generating the PDF.
- `content` _(string)_ - Inline CSS styles.
- `url` _(string)_ - URL of an external stylesheet.
- `setExtraHTTPHeaders` _(object)_ - Adds custom HTTP headers when making the request.
- `X-Custom-Header` _(string)_ - Example of a custom header.
- `viewport` _(object)_ - Defines the browser viewport size.
- `width` _(number)_ - Viewport width in pixels.
- `height` _(number)_ - Viewport height in pixels.
- `gotoOptions` _(object)_ - Configures page navigation settings.
- `waitUntil` _(string)_ - Defines when the browser considers the page fully loaded.
- `timeout` _(number)_ - Maximum wait time before failing the request.
- `rejectResourceTypes` _(array)_ - Blocks specific resource types to improve rendering performance.
- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching certain patterns.
To generate a PDF from your own HTML, use the `html` option. The `addStyleTag` option adds custom styles.

```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/pdf \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"html": "<html><body>Advanced Snapshot</body></html>",
"addStyleTag": [
{ "content": "body { font-family: Arial; }" },
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
]
}' \
--output "invoice.pdf"
```

Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/pdf/methods/create/) for all available parameters.
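
One practical wrinkle with `--output`: curl writes whatever body comes back, so an error response could end up saved under a `.pdf` name. Below is a minimal sketch of a sanity check that relies only on the fact that PDF files begin with the bytes `%PDF-`. The helper name `is_pdf` is made up for this example and is not part of the API.

```bash
# Return success (exit status 0) when the file starts with the PDF magic bytes.
# is_pdf is a hypothetical helper name used only in this sketch.
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Example: check the file written by the request above, if it exists.
if [ -f invoice.pdf ]; then
  if is_pdf invoice.pdf; then
    echo "invoice.pdf looks like a PDF"
  else
    echo "invoice.pdf is not a PDF; inspect it for an error message" >&2
  fi
fi
```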
Original file line number Diff line number Diff line change
@@ -9,6 +9,8 @@ The `/scrape` endpoint extracts structured data from specific elements on a webp

## Basic usage

Go to `https://example.com` and extract metadata from all `h1` and `a` elements in the DOM.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/scrape' \
-H 'Authorization: Bearer <apiToken>' \
@@ -21,7 +23,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
{
"selector": "a"
}]
}
}'
```

### JSON response
@@ -64,11 +66,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
}
```

## Parameters

- `url` _(string)_ - The webpage to extract data from.
- `elements` _(object)_ - Defines the elements to extract from the page.
- `selectors` _(array of strings)_ - List of CSS selectors identifying elements to scrape (e.g., `"h1"`, `".article"`).
Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/scrape/methods/create/) for all available parameters.

### Response fields

Original file line number Diff line number Diff line change
@@ -21,22 +21,23 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
"omitBackground": true
}
}' \
--output "screenshot.webp"
--output "screenshot.png"
```

For more options to control the final screenshot, like `clip`, `captureBeyondViewport`, `fullPage` and others, check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/).

## Advanced usage

Navigate to `https://cloudflare.com/`, changing the page size and waiting until there are no active network connections or up to a maximum of `4500ms`. Then take a `fullPage` screenshot.
Navigate to `https://cloudflare.com/`, changing the page size (`viewport`) and waiting until there are no active network connections (`waitUntil`) or up to a maximum of `45000ms` (`timeout`). Then take a `fullPage` screenshot.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/screenshot' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://cloudflare.com/",
"url": "https://cnn.com/",
"screenshotOptions": {
"fullPage": true,
"omitBackground": true,
"fullPage": true
},
"viewport": {
"width": 1280,
@@ -47,19 +48,20 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
"timeout": 45000
}
}' \
--output "advanced-screenshot.webp"
--output "advanced-screenshot.png"
```

## Customize CSS and embed custom JavaScript

Instruct the browser to go to `https://example.com`, embed custom JavaScript (`addScriptTag`) and add extra styles (`addStyleTag`), both inline (`addStyleTag.content`) and by loading an external stylesheet (`addStyleTag.url`).

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/screenshot' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"addScriptTag": [

{ "content": "document.querySelector(`h1`).innerText = `Hello World!!!`" }
],
"addStyleTag": [
@@ -71,25 +73,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
}
]
}' \
--output "screenshot.webp"
--output "screenshot.png"
```

## Parameters

- `url` _(string)_ - The webpage URL to take a screenshot of.
- `html` _(string)_ - Instead of a URL, allows rendering custom HTML for the screenshot.
- `screenshotOptions` _(object)_ - Configures the screenshot format and quality.
- `omitBackground` _(boolean)_ - Removes the default white background when taking a screenshot.
- `viewport` _(object)_ - Sets the browser viewport dimensions for rendering.
- `width` _(number)_ - Viewport width in pixels.
- `height` _(number)_ - Viewport height in pixels.
- `gotoOptions` _(object)_ - Configures how and when the page is considered fully loaded.
- `waitUntil` _(string)_ - Defines when the browser considers navigation complete (`networkidle0`, `domcontentloaded`).
- `networkidle0` - Waits until there are no more than 0 network connections for at least 500 ms before taking a screenshot.
- `timeout` _(number)_ - Maximum wait time (in milliseconds) before navigation times out.
- `addScriptTag` _(array of objects)_ - Injects JavaScript code before taking a screenshot.
- `url` _(string)_ - Loads an external script file before rendering.
- `content` _(string)_ - Runs inline JavaScript before rendering.
- `addStyleTag` _(array of objects)_ - Injects CSS styles before rendering.
- `content` _(string)_ - Defines inline CSS rules.
- `url` _(string)_ - Loads external stylesheets before rendering.
Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.
34 changes: 10 additions & 24 deletions src/content/docs/browser-rendering/rest-api/snapshot.mdx
Original file line number Diff line number Diff line change
@@ -30,7 +30,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-

```json title="json response"
{
"status": true,
"success": true,
"result": {
"screenshot": "Base64EncodedScreenshotString",
"content": "<html>...</html>"
@@ -43,9 +43,10 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
This example uses the `html` property in the JSON payload to set the page HTML to `<html><body>Advanced Snapshot</body></html>`, then does the following:

1. Disables JavaScript.
2. Changes the page size `(viewport)`.
3. Waits up to `30000ms` or until the `DOMContentLoaded` event starts.
4. Returns the rendered HTML content and a base-64 encoded screenshot of the page.
2. Sets the screenshot to `fullPage`.
3. Changes the page size (`viewport`).
4. Waits up to `30000ms` or until the `DOMContentLoaded` event fires.
5. Returns the rendered HTML content and a base-64 encoded screenshot of the page.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/snapshot' \
@@ -54,6 +55,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
-d '{
"html": "<html><body>Advanced Snapshot</body></html>",
"setJavaScriptEnabled": false,
"screenshotOptions": {
"fullPage": true
},
"viewport": {
"width": 1200,
"height": 800
@@ -69,30 +73,12 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-

```json title="json response"
{
"status": true,
"errors": [],
"success": true,
"result": {
"screenshot": "AdvancedBase64Screenshot",
"content": "<html><body>Advanced Snapshot</body></html>"
}
}
```

## Parameters

- `url` _(string)_ - The URL of the page to snapshot.
- `html` _(string)_ - Allows passing custom HTML instead of a URL.
- `setJavaScriptEnabled` _(boolean)_ - Enables or disables JavaScript execution on the page.
- `viewport` \*(object)- Sets the rendering viewport dimensions.
- `width` _(number)_ - Width in pixels.
- `height` _(number)_ - Height in pixels.
- `gotoOptions` _(object)_ - Determines when the page is fully loaded.
- `waitUntil` _(string)_ - Defines the loading strategy (`domcontentloaded`, `networkidle2`).
- `timeout` _(number)_ - Timeout duration in milliseconds.
- `allowResourceTypes` _(array of strings)_ - Restricts the types of resources allowed to load.
- Example: [`document`, `script`] - Only allows HTML documents and scripts to load, preventing images, stylesheets, and other resources.

### Response fields

- `screenshot` _(string)_ - Base64-encoded image of the rendered page.
- `content` _(string)_ - Fully rendered HTML of the page.
Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.
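
Because the snapshot response carries the screenshot as a base64 string inside JSON, it has to be decoded before it can be viewed. Below is a minimal sketch, assuming the response shape shown in the JSON examples above and a `base64` command that accepts `-d`. The function name `decode_screenshot` is illustrative and not part of the API.

```bash
# Read a snapshot JSON response on stdin, pull out the "screenshot" field,
# and write the decoded image bytes to stdout.
# Assumption: the field value is a plain base64 string with no embedded quotes.
# A JSON-aware tool would be more robust than sed for real responses.
decode_screenshot() {
  sed -n 's/.*"screenshot": *"\([^"]*\)".*/\1/p' | base64 -d
}
```

Piping the output of the snapshot `curl` command into `decode_screenshot > snapshot.png` would write the image to disk. The image format is not stated here, so the `.png` extension is a guess.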