
Commit 3c2e112

initial rest API guide
1 parent 470bb33 commit 3c2e112

2 files changed

+184
-0
lines changed

src/content/docs/browser-rendering/get-started.mdx

Lines changed: 6 additions & 0 deletions
@@ -9,3 +9,9 @@ Browser rendering can be used in two ways:
99

1010
- [Workers Binding API](/browser-rendering/workers-binding-api) for complex scripts.
1111
- [REST API](/browser-rendering/rest-api/) for simple actions.
12+
13+
## Examples
14+
15+
- [Workers Binding API](/browser-rendering/how-to/ai/): Fetch [https://labs.apnic.net/](https://labs.apnic.net/) and apply a machine-learning model via Workers AI to extract the first post as JSON according to your schema.
16+
17+
- [REST API](/browser-rendering/how-to/markdown-extraction/): Render and extract the complete JSON output from the [`/markdown` endpoint](/browser-rendering/rest-api/markdown-endpoint) by processing the blog post [Introducing AutoRAG on Cloudflare](https://blog.cloudflare.com/introducing-autorag-on-cloudflare/).
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
1+
---
2+
title: Extracting blog post content as markdown using the markdown endpoint
3+
sidebar:
4+
order: 4
5+
---
6+
7+
This guide shows you how to capture the complete JSON output from Cloudflare's [`/markdown` API endpoint](/browser-rendering/rest-api/markdown-endpoint/).
8+
9+
The example extracts the content of a blog post from the Cloudflare Blog: [Introducing AutoRAG on Cloudflare](https://blog.cloudflare.com/introducing-autorag-on-cloudflare/).
10+
11+
## Prerequisites
12+
13+
1. Cloudflare Account and API Token.
14+
15+
- [Create a token](/fundamentals/api/get-started/create-token/) with **Browser Rendering: Edit** permissions.
16+
- You can do this under **My Profile → API Tokens → Create Token** on your [Cloudflare dashboard](https://dash.cloudflare.com/).
17+
- Note your **Account ID** (from the dashboard homepage) and **API Token**.
18+
19+
2. Command-line tools installed.
20+
21+
- cURL: a command-line tool for sending HTTP requests.
22+
- macOS/Linux: usually preinstalled.
23+
- Windows: available via WSL, Git Bash, or native Windows builds.
24+
25+
## 1: Configure your environment variables
26+
27+
Store your credentials in environment variables so that they are not hardcoded in your commands.
28+
29+
```bash
export CF_ACCOUNT_ID="your-cloudflare-account-id"
export CF_API_TOKEN="your-api-token-with-edit-permissions"
```
33+
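If you later script these steps, it can help to fail early when a variable is unset. The following is a minimal sketch, not part of any Cloudflare tooling: `load_credentials` is a hypothetical helper name, and it only assumes the two variable names exported above.

```python
import os


def load_credentials(env=None):
    """Return (account_id, api_token) read from an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    # Collect every variable that is missing or empty so the error names them all.
    missing = [name for name in ("CF_ACCOUNT_ID", "CF_API_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CF_ACCOUNT_ID"], env["CF_API_TOKEN"]
```

Calling `load_credentials()` with no arguments reads the current process environment, so it sees the values set by the `export` commands in the same shell session.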
34+
## 2: Make the API request and save the raw JSON
35+
36+
Run this command to fetch the Markdown representation of the AutoRAG blog post and save it to a local JSON file:
37+
38+
```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"
  }' \
  > autorag-full-response.json
```
48+
49+
The `>` operator is shell output redirection, not a cURL option: it writes the response body to a file (`autorag-full-response.json`) instead of printing it to the terminal.
50+
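The same request can be sketched in Python using only the standard library. This is an illustrative equivalent of the `curl` command, not an official client: the helper names `build_markdown_request` and `fetch_markdown_response` and the 60-second timeout are assumptions made for this example, while the endpoint path, headers, and body mirror the command above.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_markdown_request(account_id, api_token, page_url):
    """Build (but do not send) the POST request that the curl command issues."""
    endpoint = f"{API_BASE}/accounts/{account_id}/browser-rendering/markdown"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def fetch_markdown_response(request, timeout=60):
    """Send the request and return the parsed JSON body (timeout is an assumed value)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without making a network call.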
51+
## 3: Inspect the saved JSON
52+
53+
You can check the start of the saved JSON file to ensure it looks right:
54+
55+
```bash
head -n 20 autorag-full-response.json
```
58+
59+
```json output
{
  "success": true,
  "errors": [],
  "messages": [],
  "result": "# [Get Started Free](https://dash.cloudflare.com/sign-up)|[Contact Sales](https://www.cloudflare.com/plans/enterprise/contact/)\n\n[![The Cloudflare Blog](https://cf-assets.www.cloudflare ..."
}
```
67+
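`head -n 20` prints the first 20 lines, so if the saved response happens to be a single long line it prints the entire file. As an alternative, the sketch below summarizes a parsed response instead of printing it. `summarize_response` is a hypothetical helper; it relies only on the `success`, `errors`, and `result` fields shown in the sample output above.

```python
def summarize_response(data, preview_chars=80):
    """Return a compact summary of a parsed /markdown response dictionary."""
    result = data.get("result")
    is_text = isinstance(result, str)
    return {
        "success": data.get("success", False),
        "error_count": len(data.get("errors") or []),
        # Length and preview are only meaningful when 'result' is a string.
        "result_chars": len(result) if is_text else 0,
        "preview": result[:preview_chars] if is_text else "",
    }
```

To use it on the saved file, load `autorag-full-response.json` with `json.load` and pass the resulting dictionary to `summarize_response`.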
68+
## 4: (Optional) Skip unwanted resources
69+
70+
To skip assets that are not needed for the Markdown output, such as CSS, JavaScript, or images, add the `rejectRequestPattern` parameter to the request body:
71+
72+
```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/browser-rendering/markdown" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -d '{
    "url": "https://blog.cloudflare.com/introducing-autorag-on-cloudflare/",
    "rejectRequestPattern": [
      "/^.*\\.(css|js|png|svg)$/"
    ]
  }' \
  > autorag-no-assets.json
```
85+
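To get a feel for which URLs the pattern above covers, the following sketch evaluates the same regular expression with Python's `re` module. It only illustrates the expression itself; how the service applies the pattern to each request is not shown in this guide and is an assumption here, and `is_rejected` is a hypothetical helper name.

```python
import re

# Python equivalent of the JavaScript-style pattern /^.*\.(css|js|png|svg)$/
ASSET_PATTERN = re.compile(r"^.*\.(css|js|png|svg)$")


def is_rejected(url):
    """Return True if the URL ends with one of the listed file extensions."""
    return ASSET_PATTERN.match(url) is not None


if __name__ == "__main__":
    for candidate in (
        "https://example.com/static/site.css",
        "https://example.com/logo.png?v=2",
        "https://example.com/photo.jpg",
    ):
        print(candidate, is_rejected(candidate))
```

Because the expression is anchored with `$`, a URL with a query string after the extension (such as `logo.png?v=2`) does not match, and extensions that are not listed (such as `.jpg`) are not matched either. Extend the alternation if you need broader coverage.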
86+
## 5: Extract and save the Markdown from the JSON file
87+
88+
After saving the full response, use the following script to extract just the Markdown.
89+
90+
The script does the following:
91+
92+
1. Reads the full JSON response from `autorag-full-response.json`
93+
2. Extracts the Markdown string from the `"result"` field
94+
3. Writes that Markdown to `autorag-blog.md`
95+
96+
```py
#!/usr/bin/env python3
"""
extract_markdown.py

Reads the full JSON response from Cloudflare's Markdown endpoint
and writes the 'result' field (the converted Markdown) to a .md file.
"""

import json
import sys
from pathlib import Path

# Input and output file paths
INPUT_JSON = Path("autorag-full-response.json")
OUTPUT_MD = Path("autorag-blog.md")

def main():
    # Check that the input file exists
    if not INPUT_JSON.is_file():
        print(f"Error: Input file '{INPUT_JSON}' not found.", file=sys.stderr)
        sys.exit(1)

    # Load the JSON response
    try:
        with INPUT_JSON.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        print(f"Error: Failed to parse JSON in '{INPUT_JSON}': {e}", file=sys.stderr)
        sys.exit(1)

    # Validate structure
    if not data.get("success", False):
        print("Error: API reported failure.", file=sys.stderr)
        errors = data.get("errors") or data.get("messages")
        if errors:
            print("Details:", errors, file=sys.stderr)
        sys.exit(1)

    if "result" not in data:
        print("Error: 'result' field not found in JSON.", file=sys.stderr)
        sys.exit(1)

    # Extract and write the Markdown
    markdown_content = data["result"]
    try:
        with OUTPUT_MD.open("w", encoding="utf-8") as md_file:
            md_file.write(markdown_content)
    except IOError as e:
        print(f"Error: Could not write to '{OUTPUT_MD}': {e}", file=sys.stderr)
        sys.exit(1)

    print(f"Success: Markdown content written to '{OUTPUT_MD}'.")

if __name__ == "__main__":
    main()
```
153+
154+
### Usage
155+
156+
1. Ensure you have run the `curl` command to produce `autorag-full-response.json`.
157+
158+
2. Place `extract_markdown.py` in the same directory.
159+
160+
3. Run:
161+
162+
```bash
python3 extract_markdown.py
```
165+
166+
After execution, `autorag-blog.md` will contain the extracted Markdown.
167+
168+
## Final folder structure
169+
170+
After following these steps, your working folder will look like:
171+
172+
```
.
├── autorag-full-response.json   # Full API response
├── autorag-no-assets.json       # Full API response without extra assets (optional)
├── autorag-blog.md              # Extracted Markdown content
└── extract_markdown.py          # Python extraction script (optional)
```
