135 changes: 135 additions & 0 deletions src/content/docs/browser-rendering/rest-api/json-endpoint.mdx
@@ -0,0 +1,135 @@
---
pcx_content_type: how-to
title: Capture webpage data in JSON format
sidebar:
  order: 9
---

The `/json` endpoint extracts structured data from a webpage. You can describe the expected output with a `prompt`, with a `response_format` parameter that accepts a JSON schema, or with both. The endpoint returns the extracted data in JSON format.

## Parameters

| Parameter         | Required | Description                                  |
| ----------------- | -------- | -------------------------------------------- |
| `url`             | Yes      | The URL of the webpage to extract data from. |
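| `prompt`          | No       | Natural-language instructions that describe the data to extract. You must supply at least one of `prompt` or `response_format`. |
| `response_format` | No       | An object that describes the expected output, with `type` set to `json_schema` and the JSON schema supplied in `json_schema`. You must supply at least one of `prompt` or `response_format`. |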
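Because at least one of `prompt` or `response_format` must be present, it can help to check the request body before sending it. The following is a minimal Bash sketch; `build_json_body` and `json_escape` are hypothetical helpers, not part of any Cloudflare tool. It assumes `response_format` is passed as a string that already contains a valid JSON object, escapes only backslashes, double quotes, newlines, carriage returns, and tabs in the `url` and `prompt` values, and does not validate the schema itself.

```bash
# Hypothetical helpers for building a /json request body (not Cloudflare tooling).

# Escape a string for use inside a JSON string literal.
# Only backslash, double quote, newline, carriage return, and tab are handled;
# other control characters are passed through unchanged.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}     # backslash first, so later escapes are not doubled
  s=${s//'"'/'\"'}     # double quote
  s=${s//$'\n'/'\n'}   # newline
  s=${s//$'\r'/'\r'}   # carriage return
  s=${s//$'\t'/'\t'}   # tab
  printf '%s' "$s"
}

# Usage: build_json_body URL [PROMPT] [RESPONSE_FORMAT_JSON]
# Pass an empty string as PROMPT to omit it. RESPONSE_FORMAT_JSON is inserted
# verbatim, so it must already be a valid JSON object.
build_json_body() {
  local url=$1 prompt=${2-} response_format=${3-}

  if [[ -z $url ]]; then
    echo "error: url is required" >&2
    return 1
  fi
  # Enforce the rule from the parameters table.
  if [[ -z $prompt && -z $response_format ]]; then
    echo "error: supply at least one of prompt or response_format" >&2
    return 1
  fi

  local body
  body="{\"url\":\"$(json_escape "$url")\""
  if [[ -n $prompt ]]; then
    body+=",\"prompt\":\"$(json_escape "$prompt")\""
  fi
  if [[ -n $response_format ]]; then
    body+=",\"response_format\":$response_format"
  fi
  body+="}"
  printf '%s\n' "$body"
}
```

For example, `build_json_body 'http://demoto.xyz/headings' 'Get the heading from the page.'` prints `{"url":"http://demoto.xyz/headings","prompt":"Get the heading from the page."}`. You can pipe that output into the `curl` commands on this page by replacing their `--data '{...}'` argument with `--data @-`, which makes curl read the request body from standard input.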

## Basic usage

### With a prompt and JSON schema

This example captures webpage data by providing both a prompt and a JSON schema. If the page has multiple headings of the same level, the first occurrence of each level (for example, `h1` and `h2`) is returned.

```bash
curl --request POST 'https://api.cloudflare.com/client/v4/accounts/CF_ACCOUNT_ID/browser-rendering/json' \
  --header 'authorization: Bearer CF_API_TOKEN' \
  --header 'content-type: application/json' \
  --data '{
    "url": "http://demoto.xyz/headings",
    "prompt": "Get the heading from the page. If there are many then grab the first one.",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "type": "object",
        "properties": {
          "h1": {
            "type": "string"
          },
          "h2": {
            "type": "string"
          }
        },
        "required": [
          "h1"
        ]
      }
    }
  }'
```

#### JSON response

```json title="json response"
{
  "success": true,
  "result": {
    "h1": "Heading 1",
    "h2": "Heading 2"
  }
}
```

### With only a prompt

In this example, only a prompt is provided. The endpoint uses the prompt to extract the heading information from the page.

```bash
curl --request POST 'https://api.cloudflare.com/client/v4/accounts/CF_ACCOUNT_ID/browser-rendering/json' \
  --header 'authorization: Bearer CF_API_TOKEN' \
  --header 'content-type: application/json' \
  --data '{
    "url": "http://demoto.xyz/headings",
    "prompt": "Get the heading from the page in the form of an object like h1, h2. If there are many headings of the same kind then grab the first one."
  }'
```

#### JSON response

```json title="json response"
{
  "success": true,
  "result": {
    "h1": "Heading 1",
    "h2": "Heading 2"
  }
}
```

### With only a JSON schema (no prompt)

In this example, you supply only a JSON schema through the `response_format` parameter. The schema defines the structure of the extracted data.

```bash
curl --request POST 'https://api.cloudflare.com/client/v4/accounts/CF_ACCOUNT_ID/browser-rendering/json' \
  --header 'authorization: Bearer CF_API_TOKEN' \
  --header 'content-type: application/json' \
  --data '{
    "url": "http://demoto.xyz/headings",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "type": "object",
        "properties": {
          "h1": {
            "type": "string"
          },
          "h2": {
            "type": "string"
          }
        },
        "required": [
          "h1"
        ]
      }
    }
  }'
```

#### JSON response

```json title="json response"
{
  "success": true,
  "result": {
    "h1": "Heading 1",
    "h2": "Heading 2"
  }
}
```

## Potential use cases

1. **Extract Movie Data:** Retrieve details such as the name, genre, and release date of the top 10 action movies from the IMDb Top 250 list by supplying the appropriate IMDb link and a JSON schema.
2. **Weather Information:** Fetch the current weather conditions for a location (for example, Edinburgh) by supplying a link to a weather website such as BBC Weather.
3. **Trending News:** Extract the top trending posts on Hacker News by providing the Hacker News link along with a JSON schema that includes the post title and body.
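The Hacker News use case can be sketched as a request that supplies only a JSON schema, in the same shape as the schema-only example earlier on this page. The schema here, an object with a `posts` array whose items have `title` and `body` strings, is an illustrative assumption; this guide does not show the endpoint's output for it. The sketch also assumes your account ID and API token are exported as the environment variables `CF_ACCOUNT_ID` and `CF_API_TOKEN`, rather than pasted into the URL and header. `hn_payload` and `send_hn_request` are hypothetical function names.

```bash
# Hypothetical sketch: request body for the Hacker News use case.
# The "posts" array schema below is an illustrative assumption.
hn_payload() {
  cat <<'EOF'
{
  "url": "https://news.ycombinator.com",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "posts": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "title": { "type": "string" },
              "body": { "type": "string" }
            },
            "required": ["title"]
          }
        }
      },
      "required": ["posts"]
    }
  }
}
EOF
}

# Send the request. Assumes CF_ACCOUNT_ID and CF_API_TOKEN are exported;
# the ${VAR:?} expansions stop with an error if either is unset or empty.
send_hn_request() {
  : "${CF_ACCOUNT_ID:?set CF_ACCOUNT_ID}" "${CF_API_TOKEN:?set CF_API_TOKEN}"
  # --data @- makes curl read the request body from standard input.
  hn_payload | curl --request POST \
    "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/json" \
    --header "authorization: Bearer ${CF_API_TOKEN}" \
    --header 'content-type: application/json' \
    --data @-
}
```

After sourcing the snippet, run `hn_payload` on its own to inspect the request body, then `send_hn_request` to post it.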
4 changes: 4 additions & 0 deletions src/content/release-notes/ai-gateway.yaml
@@ -5,6 +5,10 @@ productLink: "/ai-gateway/"
productArea: Developer platform
productAreaLink: /workers/platform/changelog/platform/
entries:
  - publish_date: "2025-03-18"
    title: WebSockets
    description: |-
      Added [WebSockets API](/ai-gateway/configuration/websockets-api/) to provide a persistent connection for AI interactions, eliminating repeated handshakes and reducing latency.
  - publish_date: "2025-02-26"
    title: Guardrails
    description: |-