7 changes: 7 additions & 0 deletions src/content/docs/browser-rendering/how-to/queues.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
---
pcx_content_type: navigation
title: Build a web crawler with Queues and Browser Rendering
external_link: /queues/tutorials/web-crawler-with-browser-rendering/
sidebar:
order: 2
---
@@ -0,0 +1,7 @@
---
pcx_content_type: navigation
title: Reference
external_link: /api/resources/browser_rendering/
sidebar:
order: 8
---
Expand Up @@ -20,22 +20,18 @@ curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browse

## Advanced usage

Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`).
Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`). The opposite is also possible: allow only requests that match `allowRequestPattern` or `allowResourceTypes`.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/content' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://cloudflare.com/",
"rejectResourceTypes": ["image"],
"rejectRequestPattern": ["/^.*\\.(css)"]
"rejectResourceTypes": ["image"],
"rejectRequestPattern": ["/^.*\\.(css)"]
}'

```
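
The allow-list variant mentioned above could look roughly like the following. This is a minimal sketch, not a definitive recipe: the environment variable names `CF_ACCOUNT_ID` and `CF_API_TOKEN` are assumptions made for illustration, and the resource types `document` and `script` are only example values. The request is sent only when both variables are set; otherwise the payload is printed for inspection.

```bash
#!/usr/bin/env bash
# Sketch: allow-list request filtering for the /content endpoint.
# CF_ACCOUNT_ID and CF_API_TOKEN are hypothetical environment variable names.

# Only HTML documents and scripts are allowed to load.
payload='{
  "url": "https://cloudflare.com/",
  "allowResourceTypes": ["document", "script"]
}'

if [ -n "${CF_ACCOUNT_ID:-}" ] && [ -n "${CF_API_TOKEN:-}" ]; then
  curl -X POST "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/content" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$payload"
else
  # Without credentials, print the payload so it can be inspected.
  printf '%s\n' "$payload"
fi
```

Keeping the payload in a variable makes it easy to check before any request is made.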

### Parameters

- `url` _(string)_ - The URL of the webpage to extract content from.
- `rejectResourceTypes` _(array)_ - Blocks specific resource types such as images, fonts from loading to improve performance.
- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching specified patterns such as CSS files.
Many more options exist, such as setting HTTP headers with `setExtraHTTPHeaders`, setting `cookies`, and using `gotoOptions` to control page load behaviour. Check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.
2 changes: 1 addition & 1 deletion src/content/docs/browser-rendering/rest-api/index.mdx
Expand Up @@ -2,7 +2,7 @@
pcx_content_type: navigation
title: REST API
sidebar:
order: 2
order: 3
group:
badge: Beta
---
39 changes: 21 additions & 18 deletions src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx
Expand Up @@ -19,7 +19,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
"url": "https://example.com/",
"addStyleTag": [
{ "content": "body { font-family: Arial; }" },
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
]
}' \
--output "output.pdf"
Expand Down Expand Up @@ -52,9 +52,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
--output "advanced-output.pdf"
```

## PDF with no images or CSS
## Blocking images and styles when generating a PDF

Use PDF with no images or CSS if you want to accelerate the scanning process and you do not need the images.
Use the `rejectResourceTypes` and `rejectRequestPattern` options to block requests. The opposite is also possible: allow _only_ certain requests with `allowResourceTypes` and `allowRequestPattern`.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/pdf' \
Expand All @@ -68,19 +68,22 @@ curl -X POST https://api.cloudflare.com/client/v4/accounts/<acccountID>/browser-
--output "cloudflare.pdf"
```

## Parameters
## Generate PDF from custom HTML

- `url` _(string)_ - The webpage URL to render as a PDF.
- `addStyleTag` _(array of objects)_ - Injects custom CSS before generating the PDF.
- `content` _(string)_ - Inline CSS styles.
- `url` _(string)_ - URL of an external stylesheet.
- `setExtraHTTPHeaders` _(object)_ - Adds custom HTTP headers when making the request.
- `X-Custom-Header` _(string)_ - Example of a custom header.
- `viewport` _(object)_ - Defines the browser viewport size.
- `width` _(number)_ - Viewport width in pixels.
- `height` _(number)_ - Viewport height in pixels.
- `gotoOptions` _(object)_ - Configures page navigation settings.
- `waitUntil` _(string)_ - Defines when the browser considers the page fully loaded.
- `timeout` _(number)_ - Maximum wait time before failing the request.
- `rejectResourceTypes` _(array)_ - Blocks specific resource types to improve rendering performance.
- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching certain patterns.
To generate a PDF from your own HTML, pass it in the `html` option. Use the `addStyleTag` option to add custom styles.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/pdf' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"html": "<html><body>Advanced Snapshot</body></html>",
"addStyleTag": [
{ "content": "body { font-family: Arial; }" },
{ "url": "https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" }
]
}' \
--output "invoice.pdf"
```

Many more options exist, such as setting HTTP credentials with `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour. Check the endpoint [reference](/api/resources/browser_rendering/subresources/pdf/methods/create/) for all available parameters.
Expand Up @@ -9,6 +9,8 @@ The `/scrape` endpoint extracts structured data from specific elements on a webp

## Basic usage

Go to `https://example.com` and extract metadata from all `h1` and `a` elements in the DOM.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/scrape' \
-H 'Authorization: Bearer <apiToken>' \
Expand Down Expand Up @@ -64,11 +66,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
}
```

## Parameters

- `url` _(string)_ - The webpage to extract data from.
- `elements` _(object)_ - Defines the elements to extract from the page.
- `selectors` _(array of strings)_ - List of CSS selectors identifying elements to scrape (e.g., `"h1"`, `".article"`).
Many more options exist, such as setting HTTP credentials with `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour. Check the endpoint [reference](/api/resources/browser_rendering/subresources/scrape/methods/create/) for all available parameters.

### Response fields

Expand Up @@ -24,9 +24,11 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
--output "screenshot.webp"
```

For more options to control the final screenshot, like `clip`, `captureBeyondViewport`, `fullPage` and others, check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/).

## Advanced usage

Navigate to `https://cloudflare.com/`, changing the page size and waiting until there are no active network connections or up to a maximum of `4500ms`. Then take a `fullPage` screenshot.
Navigate to `https://cloudflare.com/`, changing the page size (`viewport`) and waiting until there are no active network connections (`waitUntil`) or up to a maximum of `4500ms` (`timeout`). Then take a `fullPage` screenshot.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/screenshot' \
Expand All @@ -36,7 +38,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
"url": "https://cloudflare.com/",
"screenshotOptions": {
"fullPage": true,
"omitBackground": true,
"omitBackground": true
},
"viewport": {
"width": 1280,
Expand All @@ -52,14 +54,15 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-

## Customize CSS and embed custom JavaScript

Instruct the browser to go to `https://example.com`, embed custom JavaScript (`addScriptTag`) and add extra styles (`addStyleTag`), both inline (`addStyleTag.content`) and by loading an external stylesheet (`addStyleTag.url`).

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/screenshot' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://example.com/",
"addScriptTag": [

{ "content": "document.querySelector(`h1`).innerText = `Hello World!!!`" }
],
"addStyleTag": [
Expand All @@ -74,22 +77,4 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
--output "screenshot.webp"
```

## Parameters

- `url` _(string)_ - The webpage URL to take a screenshot of.
- `html` _(string)_ - Instead of a URL, allows rendering custom HTML for the screenshot.
- `screenshotOptions` _(object)_ - Configures the screenshot format and quality.
- `omitBackground` _(boolean)_ - Removes the default white background when taking a screenshot.
- `viewport` _(object)_ - Sets the browser viewport dimensions for rendering.
- `width` _(number)_ - Viewport width in pixels.
- `height` _(number)_ - Viewport height in pixels.
- `gotoOptions` _(object)_ - Configures how and when the page is considered fully loaded.
- `waitUntil` _(string)_ - Defines when the browser considers navigation complete (`networkidle0`, `domcontentloaded`).
- `networkidle0` - Waits until there are no more than 0 network connections for at least 500 ms before taking a screenshot.
- `timeout` _(number)_ - Maximum wait time (in milliseconds) before navigation times out.
- `addScriptTag` _(array of objects)_ - Injects JavaScript code before taking a screenshot.
- `url` _(string)_ - Loads an external script file before rendering.
- `content` _(string)_ - Runs inline JavaScript before rendering.
- `addStyleTag` _(array of objects)_ - Injects CSS styles before rendering.
- `content` _(string)_ - Defines inline CSS rules.
- `url` _(string)_ - Loads external stylesheets before rendering.
Many more options exist, such as setting HTTP credentials with `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour. Check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.
34 changes: 10 additions & 24 deletions src/content/docs/browser-rendering/rest-api/snapshot.mdx
Expand Up @@ -30,7 +30,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-

```json title="json response"
{
"status": true,
"success": true,
"result": {
"screenshot": "Base64EncodedScreenshotString",
"content": "<html>...</html>"
Expand All @@ -43,9 +43,10 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
The `html` property in the JSON payload sets the page HTML to `<html><body>Advanced Snapshot</body></html>`. The request then does the following:

1. Disables JavaScript.
2. Changes the page size `(viewport)`.
3. Waits up to `30000ms` or until the `DOMContentLoaded` event starts.
4. Returns the rendered HTML content and a base-64 encoded screenshot of the page.
2. Sets the screenshot to `fullPage`.
3. Changes the page size (`viewport`).
4. Waits up to `30000ms` or until the `DOMContentLoaded` event fires.
5. Returns the rendered HTML content and a base-64 encoded screenshot of the page.

```bash
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/snapshot' \
Expand All @@ -54,6 +55,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-
-d '{
"html": "<html><body>Advanced Snapshot</body></html>",
"setJavaScriptEnabled": false,
"screenshotOptions": {
"fullPage": true
},
"viewport": {
"width": 1200,
"height": 800
Expand All @@ -69,30 +73,12 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-

```json title="json response"
{
"status": true,
"errors": [],
"success": true,
"result": {
"screenshot": "AdvancedBase64Screenshot",
"content": "<html><body>Advanced Snapshot</body></html>"
}
}
```
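
Because the screenshot comes back as a Base64 string inside the JSON response, it has to be decoded before it can be viewed. The sketch below shows one way to do that using only `sed` and `base64`. The response is a stand-in, not real API output: `aGVsbG8=` is Base64 for the text `hello`, and the output file name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: extract and decode the Base64 screenshot from a /snapshot response.
# The response is a stand-in; "aGVsbG8=" decodes to the text "hello", not an image.
response='{"success":true,"result":{"screenshot":"aGVsbG8=","content":"<html>...</html>"}}'

# Pull out the value of the "screenshot" field.
screenshot_b64=$(printf '%s' "$response" \
  | sed -n 's/.*"screenshot"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

# Decode the Base64 string into a file (hypothetical file name).
printf '%s' "$screenshot_b64" | base64 --decode > snapshot-screenshot.out
```

The `sed` expression assumes the response arrives on a single line. A JSON-aware tool such as `jq` would likely be a sturdier choice where it is available.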

## Parameters

- `url` _(string)_ - The URL of the page to snapshot.
- `html` _(string)_ - Allows passing custom HTML instead of a URL.
- `setJavaScriptEnabled` _(boolean)_ - Enables or disables JavaScript execution on the page.
- `viewport` _(object)_ - Sets the rendering viewport dimensions.
- `width` _(number)_ - Width in pixels.
- `height` _(number)_ - Height in pixels.
- `gotoOptions` _(object)_ - Determines when the page is fully loaded.
- `waitUntil` _(string)_ - Defines the loading strategy (`domcontentloaded`, `networkidle2`).
- `timeout` _(number)_ - Timeout duration in milliseconds.
- `allowResourceTypes` _(array of strings)_ - Restricts the types of resources allowed to load.
- Example: [`document`, `script`] - Only allows HTML documents and scripts to load, preventing images, stylesheets, and other resources.

### Response fields

- `screenshot` _(string)_ - Base64-encoded image of the rendered page.
- `content` _(string)_ - Fully rendered HTML of the page.
Many more options exist, such as setting HTTP credentials with `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour. Check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters.