diff --git a/src/content/docs/browser-rendering/how-to/queues.mdx b/src/content/docs/browser-rendering/how-to/queues.mdx new file mode 100644 index 000000000000000..610b1271d7df588 --- /dev/null +++ b/src/content/docs/browser-rendering/how-to/queues.mdx @@ -0,0 +1,7 @@ +--- +pcx_content_type: navigation +title: Build a web crawler with Queues and Browser Rendering +external_link: /queues/tutorials/web-crawler-with-browser-rendering/ +sidebar: + order: 2 +--- diff --git a/src/content/docs/browser-rendering/rest-api/api-reference.mdx b/src/content/docs/browser-rendering/rest-api/api-reference.mdx new file mode 100644 index 000000000000000..70d5c259e381f5b --- /dev/null +++ b/src/content/docs/browser-rendering/rest-api/api-reference.mdx @@ -0,0 +1,7 @@ +--- +pcx_content_type: navigation +title: Reference +external_link: /api/resources/browser_rendering/ +sidebar: + order: 8 +--- diff --git a/src/content/docs/browser-rendering/rest-api/content-endpoint.mdx b/src/content/docs/browser-rendering/rest-api/content-endpoint.mdx index d4bb17611e512b8..e3ec9c7514b3bf1 100644 --- a/src/content/docs/browser-rendering/rest-api/content-endpoint.mdx +++ b/src/content/docs/browser-rendering/rest-api/content-endpoint.mdx @@ -20,7 +20,7 @@ curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts//browse ## Advanced usage -Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`). +Navigate to `https://cloudflare.com/` but block images and stylesheets from loading. Undesired requests can be blocked by resource type (`rejectResourceTypes`) or by using a regex pattern (`rejectRequestPattern`). The opposite is also possible: allow only requests that match `allowRequestPattern` or `allowResourceTypes`. 
```bash curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser-rendering/content' \ @@ -28,14 +28,10 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- -H 'Content-Type: application/json' \ -d '{ "url": "https://cloudflare.com/", - "rejectResourceTypes": ["image"], - "rejectRequestPattern": ["/^.*\\.(css)"] -} + "rejectResourceTypes": ["image"], + "rejectRequestPattern": ["/^.*\\.(css)"] + }' ``` -### Parameters - -- `url` _(string)_ - The URL of the webpage to extract content from. -- `rejectResourceTypes` _(array)_ - Blocks specific resource types such as images, fonts from loading to improve performance. -- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching specified patterns such as CSS files. +Many more options exist, like setting HTTP headers using `setExtraHTTPHeaders`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/content/methods/create/) for all available parameters. 
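The allow-list variant described for the content endpoint could look roughly like the sketch below. It assumes `ACCOUNT_ID` and `API_TOKEN` are environment variables you supply (the diff elides both values), and it borrows the `document` and `script` resource types from the example in the removed snapshot parameter list purely as illustrative values. The `--send` flag and the function names are hypothetical helpers, not part of the API.

```bash
#!/bin/sh
set -eu

# Build the request body. With allowResourceTypes, only the listed
# resource types are loaded; everything else is blocked.
# The values "document" and "script" are illustrative.
build_payload() {
  cat <<'JSON'
{
  "url": "https://cloudflare.com/",
  "allowResourceTypes": ["document", "script"]
}
JSON
}

# POST the payload to the /content endpoint.
# ACCOUNT_ID and API_TOKEN are assumed to be set in the environment.
fetch_content() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/content" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$(build_payload)"
}

# Only contact the API when explicitly asked to (hypothetical flag),
# so the payload can be inspected without credentials.
if [ "${1:-}" = "--send" ]; then
  fetch_content
fi
```

Keeping the payload in its own function makes it easy to print and review the JSON before sending it, which helps avoid the unbalanced-quote mistakes this diff fixes elsewhere.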
diff --git a/src/content/docs/browser-rendering/rest-api/index.mdx b/src/content/docs/browser-rendering/rest-api/index.mdx index 656be53ba352c44..bda3adf4671d033 100644 --- a/src/content/docs/browser-rendering/rest-api/index.mdx +++ b/src/content/docs/browser-rendering/rest-api/index.mdx @@ -2,7 +2,7 @@ pcx_content_type: navigation title: REST API sidebar: - order: 2 + order: 3 group: badge: Beta --- diff --git a/src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx b/src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx index 7ca1c02a0a15438..a08c009ab1895fe 100644 --- a/src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx +++ b/src/content/docs/browser-rendering/rest-api/pdf-endpoint.mdx @@ -19,7 +19,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- "url": "https://example.com/", "addStyleTag": [ { "content": "body { font-family: Arial; }" }, - { "url": "https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap.min.css" } + { "url": "https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap.min.css" } ] }' \ --output "output.pdf" @@ -52,9 +52,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- --output "advanced-output.pdf" ``` -## PDF with no images or CSS +## Blocking images and styles when generating a PDF -Use PDF with no images or CSS if you want to accelerate the scanning process and you do not need the images. +The options `rejectResourceTypes` and `rejectRequestPattern` can be used to block requests. The opposite is also possible: allow _only_ certain requests using `allowResourceTypes` and `allowRequestPattern`. ```bash curl -X POST https://api.cloudflare.com/client/v4/accounts//browser-rendering/pdf \ @@ -68,19 +68,22 @@ curl -X POST https://api.cloudflare.com/client/v4/accounts//browser- --output "cloudflare.pdf" ``` -## Parameters +## Generate PDF from custom HTML -- `url` _(string)_ - The webpage URL to render as a PDF. 
-- `addStyleTag` _(array of objects)_ - Injects custom CSS before generating the PDF. - - `content` _(string)_ - Inline CSS styles. - - `url` _(string)_ - URL of an external stylesheet. -- `setExtraHTTPHeaders` _(object)_ - Adds custom HTTP headers when making the request. - - `X-Custom-Header` _(string)_ - Example of a custom header. -- `viewport` _(object)_ - Defines the browser viewport size. - - `width` _(number)_ - Viewport width in pixels. - - `height` _(number)_ - Viewport height in pixels. -- `gotoOptions` _(object)_ - Configures page navigation settings. - - `waitUntil` _(string)_ - Defines when the browser considers the page fully loaded. - - `timeout` _(number)_ - Maximum wait time before failing the request. -- `rejectResourceTypes` _(array)_ - Blocks specific resource types to improve rendering performance. -- `rejectRequestPattern` _(array of regex patterns)_ - Prevents loading of resources matching certain patterns. +If you have HTML you'd like to generate a PDF from, the `html` option can be used. The option `addStyleTag` can be used to add custom styles. + +```bash +curl -X POST https://api.cloudflare.com/client/v4/accounts//browser-rendering/pdf \ + -H 'Authorization: Bearer ' \ + -H 'Content-Type: application/json' \ + -d '{ + "html": "Advanced Snapshot", + "addStyleTag": [ + { "content": "body { font-family: Arial; }" }, + { "url": "https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap.min.css" } + ] +}' \ + --output "invoice.pdf" +``` + +Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/pdf/methods/create/) for all available parameters. 
diff --git a/src/content/docs/browser-rendering/rest-api/scrape-endpoint.mdx b/src/content/docs/browser-rendering/rest-api/scrape-endpoint.mdx index 95d79405c0a305d..a7aab3f50e14949 100644 --- a/src/content/docs/browser-rendering/rest-api/scrape-endpoint.mdx +++ b/src/content/docs/browser-rendering/rest-api/scrape-endpoint.mdx @@ -9,6 +9,8 @@ The `/scrape` endpoint extracts structured data from specific elements on a webp ## Basic usage +Go to `https://example.com` and extract metadata from all `h1` and `a` elements in the DOM. + ```bash curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser-rendering/scrape' \ -H 'Authorization: Bearer ' \ @@ -21,7 +23,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- { "selector": "a" }] -} +}' ``` ### JSON response @@ -64,11 +66,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- } ``` -## Parameters - -- `url` _(string)_ - The webpage to extract data from. -- `elements` _(object)_ - Defines the elements to extract from the page. - - `selectors` _(array of strings)_ - List of CSS selectors identifying elements to scrape (e.g., `"h1"`, `".article"`). +Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/scrape/methods/create/) for all available parameters. 
### Response fields diff --git a/src/content/docs/browser-rendering/rest-api/screenshot-endpoint.mdx b/src/content/docs/browser-rendering/rest-api/screenshot-endpoint.mdx index 63d665dc7f1adc9..8002c210fef34f8 100644 --- a/src/content/docs/browser-rendering/rest-api/screenshot-endpoint.mdx +++ b/src/content/docs/browser-rendering/rest-api/screenshot-endpoint.mdx @@ -21,22 +21,23 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- "omitBackground": true } }' \ - --output "screenshot.webp" + --output "screenshot.png" ``` +For more options to control the final screenshot, like `clip`, `captureBeyondViewport`, `fullPage` and others, check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/). + ## Advanced usage -Navigate to `https://cloudflare.com/`, changing the page size and waiting until there are no active network connections or up to a maximum of `4500ms`. Then take a `fullPage` screenshot. +Navigate to `https://cloudflare.com/`, changing the page size (`viewport`) and waiting until there are no active network connections (`waitUntil`) or up to a maximum of `45000ms` (`timeout`). Then take a `fullPage` screenshot. 
```bash curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser-rendering/screenshot' \ -H 'Authorization: Bearer ' \ -H 'Content-Type: application/json' \ -d '{ - "url": "https://cloudflare.com/", + "url": "https://cnn.com/", "screenshotOptions": { - "fullPage": true, - "omitBackground": true, + "fullPage": true }, "viewport": { "width": 1280, @@ -47,11 +48,13 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- "timeout": 45000 } }' \ - --output "advanced-screenshot.webp" + --output "advanced-screenshot.png" ``` ## Customize CSS and embed custom JavaScript +Instruct the browser to go to `https://example.com`, embed custom JavaScript (`addScriptTag`) and add extra styles (`addStyleTag`), both inline (`addStyleTag.content`) and by loading an external stylesheet (`addStyleTag.url`). + ```bash curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser-rendering/screenshot' \ -H 'Authorization: Bearer ' \ @@ -59,7 +62,6 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- -d '{ "url": "https://example.com/", "addScriptTag": [ - { "content": "document.querySelector(`h1`).innerText = `Hello World!!!`" } ], "addStyleTag": [ @@ -71,25 +73,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- } ] }' \ - --output "screenshot.webp" + --output "screenshot.png" ``` -## Parameters - -- `url` _(string)_ - The webpage URL to take a screenshot of. -- `html` _(string)_ - Instead of a URL, allows rendering custom HTML for the screenshot. -- `screenshotOptions` _(object)_ - Configures the screenshot format and quality. - - `omitBackground` _(boolean)_ - Removes the default white background when taking a screenshot. -- `viewport` _(object)_ - Sets the browser viewport dimensions for rendering. - - `width` _(number)_ - Viewport width in pixels. - - `height` _(number)_ - Viewport height in pixels. -- `gotoOptions` _(object)_ - Configures how and when the page is considered fully loaded. 
- - `waitUntil` _(string)_ - Defines when the browser considers navigation complete (`networkidle0`, `domcontentloaded`). - - `networkidle0` - Waits until there are no more than 0 network connections for at least 500 ms before taking a screenshot. - - `timeout` _(number)_ - Maximum wait time (in milliseconds) before navigation times out. -- `addScriptTag` _(array of objects)_ - Injects JavaScript code before taking a screenshot. - - `url` _(string)_ - Loads an external script file before rendering. - - `content` _(string)_ - Runs inline JavaScript before rendering. -- `addStyleTag` _(array of objects)_ - Injects CSS styles before rendering. - - `content` _(string)_ - Defines inline CSS rules. - - `url` _(string)_ - Loads external stylesheets before rendering. +Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/screenshot/methods/create/) for all available parameters. diff --git a/src/content/docs/browser-rendering/rest-api/snapshot.mdx b/src/content/docs/browser-rendering/rest-api/snapshot.mdx index 8bbb4263e3a2d58..25d7ffbaaeb940f 100644 --- a/src/content/docs/browser-rendering/rest-api/snapshot.mdx +++ b/src/content/docs/browser-rendering/rest-api/snapshot.mdx @@ -30,7 +30,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- ```json title="json response" { - "status": true, + "success": true, "result": { "screenshot": "Base64EncodedScreenshotString", "content": "..." @@ -43,9 +43,10 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- The `html` property in the JSON payload, it sets the html to `Advanced Snapshot` then does the following steps: 1. Disable JavaScript. -2. Changes the page size `(viewport)`. -3. Waits up to `30000ms` or until the `DOMContentLoaded` event starts. -4. 
Returns the rendered HTML content and a base-64 encoded screenshot of the page. +2. Sets the screenshot to `fullPage`. +3. Changes the page size `(viewport)`. +4. Waits up to `30000ms` or until the `DOMContentLoaded` event fires. +5. Returns the rendered HTML content and a base-64 encoded screenshot of the page. ```bash curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser-rendering/snapshot' \ @@ -54,6 +55,9 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- -d '{ "html": "Advanced Snapshot", "setJavaScriptEnabled": false, + "screenshotOptions": { + "fullPage": true + }, "viewport": { "width": 1200, "height": 800 @@ -69,8 +73,7 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- ```json title="json response" { - "status": true, - "errors": [], + "success": true, "result": { "screenshot": "AdvancedBase64Screenshot", "content": "Advanced Snapshot" @@ -78,21 +81,4 @@ curl -X POST 'https://api.cloudflare.com/client/v4/accounts//browser- } ``` -## Parameters - -- `url` _(string)_ - The URL of the page to snapshot. -- `html` _(string)_ - Allows passing custom HTML instead of a URL. -- `setJavaScriptEnabled` _(boolean)_ - Enables or disables JavaScript execution on the page. -- `viewport` \*(object)- Sets the rendering viewport dimensions. - - `width` _(number)_ - Width in pixels. - - `height` _(number)_ - Height in pixels. -- `gotoOptions` _(object)_ - Determines when the page is fully loaded. - - `waitUntil` _(string)_ - Defines the loading strategy (`domcontentloaded`, `networkidle2`). - - `timeout` _(number)_ - Timeout duration in milliseconds. -- `allowResourceTypes` _(array of strings)_ - Restricts the types of resources allowed to load. - - Example: [`document`, `script`] - Only allows HTML documents and scripts to load, preventing images, stylesheets, and other resources. - -### Response fields - -- `screenshot` _(string)_ - Base64-encoded image of the rendered page. 
-- `content` _(string)_ - Fully rendered HTML of the page. +Many more options exist, like setting HTTP credentials using `authenticate`, setting `cookies`, and using `gotoOptions` to control page load behaviour - check the endpoint [reference](/api/resources/browser_rendering/subresources/snapshot/methods/create/) for all available parameters.
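Because `/snapshot` returns the screenshot as a base64 string inside the JSON response rather than as a binary file, a caller has to decode it before viewing it. The following is a minimal sketch of that step, assuming the response has the `result.screenshot` shape shown in this diff. The function name `decode_screenshot` is hypothetical, and the `sed` pattern is a simple extraction for this known shape, not a general JSON parser (a tool such as `jq` would be sturdier).

```bash
#!/bin/sh
set -eu

# Read a /snapshot JSON response on stdin, pull out the value of the
# "screenshot" field, and write the decoded bytes to the file named by $1.
# Assumption: the field value is plain base64 with no embedded quotes.
decode_screenshot() {
  sed -n 's/.*"screenshot"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | base64 -d > "$1"
}
```

For example, piping a saved response through `decode_screenshot snapshot.png` (the file name is a placeholder) would write the image to disk. Some older macOS versions of `base64` expect `--decode` or `-D` instead of `-d`.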