| 1 | +--- |
| 2 | +pcx_content_type: how-to |
| 3 | +title: Extract Markdown from a webpage |
| 4 | +sidebar: |
| 5 | + order: 10 |
| 6 | +--- |
| 7 | + |
| 8 | +The `/markdown` endpoint retrieves a webpage's content and converts it to Markdown. You can pass either a URL or raw HTML, plus optional parameters to refine the extraction. |
| 9 | + |
| 10 | +## Basic usage |
| 11 | + |
| 12 | +### Using a URL |
| 13 | + |
| 14 | +This example fetches the Markdown representation of a webpage. |
| 15 | + |
| 16 | +```bash |
| 17 | +curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \ |
| 18 | + -H 'Content-Type: application/json' \ |
| 19 | + -H 'Authorization: Bearer <apiToken>' \ |
| 20 | + -d '{ |
| 21 | + "url": "https://example.com" |
| 22 | + }' |
| 23 | +``` |
| 24 | + |
| 25 | +### JSON response |
| 26 | + |
| 27 | +```json title="json response" |
| 28 | +{ |
| 29 | + "success": true, |
| 30 | + "result": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)" |
| 31 | +} |
| 32 | +``` |
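For callers that prefer a script over curl, the request and response above can be sketched in Python using only the standard library. The helper names (`build_markdown_request`, `parse_markdown_response`, `fetch_markdown`) are hypothetical. The sketch assumes the endpoint, headers, and response fields (`success`, `result`) shown in this page, and it assumes that a response without `success: true` should be treated as an error.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def build_markdown_request(account_id: str, api_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /markdown endpoint."""
    return urllib.request.Request(
        f"{API_BASE}/{account_id}/browser-rendering/markdown",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


def parse_markdown_response(body: str) -> str:
    """Return the Markdown held in `result`, or raise if `success` is not true."""
    data = json.loads(body)
    if not data.get("success"):
        # Assumption: failures are reported with "success": false.
        raise RuntimeError(f"markdown request failed: {data}")
    return data["result"]


def fetch_markdown(account_id: str, api_token: str, url: str) -> str:
    """Send the request over the network and return the extracted Markdown."""
    request = build_markdown_request(account_id, api_token, {"url": url})
    with urllib.request.urlopen(request) as response:
        return parse_markdown_response(response.read().decode("utf-8"))
```

Calling `fetch_markdown(account_id, api_token, "https://example.com")` with a real account ID and API token would perform the same request as the curl command.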
| 33 | + |
| 34 | +### Using raw HTML |
| 35 | + |
| 36 | +Instead of specifying a URL to fetch, you can provide raw HTML directly in the `html` field. |
| 37 | + |
| 38 | +```bash |
| 39 | +curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \ |
| 40 | + -H 'Content-Type: application/json' \ |
| 41 | + -H 'Authorization: Bearer <apiToken>' \ |
| 42 | + -d '{ |
| 43 | + "html": "<div>Hello World</div>" |
| 44 | + }' |
| 45 | +``` |
| 46 | + |
| 47 | +### JSON response |
| 48 | + |
| 49 | +```json title="json response" |
| 50 | +{ |
| 51 | + "success": true, |
| 52 | + "result": "Hello World" |
| 53 | +} |
| 54 | +``` |
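Real HTML usually contains double quotes and newlines, which are easy to get wrong when written by hand inside a shell-quoted JSON string. A minimal sketch of building the body programmatically is shown below; the helper name `html_payload` and the sample markup are illustrative, and the only field assumed is the `html` key used in the request above.

```python
import json


def html_payload(html: str) -> str:
    """Serialize raw HTML into a JSON body with a single `html` field."""
    # json.dumps escapes quotes, backslashes, and newlines inside the HTML.
    return json.dumps({"html": html})


body = html_payload('<div class="greeting">Hello "World"</div>')
print(body)
```

The resulting string can be sent as the request body in place of the inline JSON passed to `-d`.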
| 55 | + |
| 56 | +## Advanced usage |
| 57 | + |
| 58 | +You can refine the Markdown extraction with the `rejectRequestPattern` parameter, which takes a list of regex patterns. In this example, requests that match the given pattern (here, CSS files) are excluded. |
| 59 | + |
| 60 | +```bash |
| 61 | +curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \ |
| 62 | + -H 'Content-Type: application/json' \ |
| 63 | + -H 'Authorization: Bearer <apiToken>' \ |
| 64 | + -d '{ |
| 65 | + "url": "https://example.com", |
| 66 | + "rejectRequestPattern": ["/^.*\\.(css)/"] |
| 67 | + }' |
| 68 | +``` |
| 69 | + |
| 70 | +### JSON response |
| 71 | + |
| 72 | +```json title="json response" |
| 73 | +{ |
| 74 | + "success": true, |
| 75 | + "result": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)" |
| 76 | +} |
| 77 | +``` |
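Before sending a pattern, it can help to check locally which request URLs it would catch. The sketch below uses Python's `re` module and assumes that each entry is a JavaScript-style regex literal (`/…/`) searched against the request URL. That interpretation is an assumption and is not stated in this page. The helpers `strip_slashes` and `is_rejected` and the sample URLs are illustrative.

```python
import re


def strip_slashes(pattern: str) -> str:
    """Drop the leading and trailing '/' of a JavaScript-style regex literal."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern


def is_rejected(url: str, patterns: list) -> bool:
    """Return True if any pattern matches somewhere in the URL."""
    return any(re.search(strip_slashes(p), url) for p in patterns)


# Same pattern as in the request body, after JSON unescaping ("\\." becomes "\.").
patterns = [r"/^.*\.(css)/"]

for url in [
    "https://example.com/styles/main.css",
    "https://example.com/app.js",
    "https://example.com/main.css.map",
]:
    print(url, is_rejected(url, patterns))
```

Under these assumptions the pattern is not anchored at the end, so it also matches `main.css.map`. A stricter pattern would be needed to reject only URLs that end in `.css`.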
| 78 | + |
| 79 | +## Potential use cases |
| 80 | + |
| 81 | +1. **Content extraction:** Convert a blog post or article into Markdown format for storage or further processing. |
| 82 | +2. **Static site generation:** Retrieve structured Markdown content for use in static site generators like Jekyll or Hugo. |
| 83 | +3. **Automated summarization:** Extract key content from web pages while ignoring CSS, scripts, or unnecessary elements. |
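As an illustration of the static site generation use case, the extracted Markdown could be wrapped in YAML front matter before it is written to a content directory. This is a minimal sketch: the function name `to_front_matter_page` and the single `title` field are assumptions, and a real Jekyll or Hugo site may require additional front matter keys.

```python
import json


def to_front_matter_page(title: str, markdown: str) -> str:
    """Prefix extracted Markdown with minimal YAML front matter."""
    # json.dumps yields a double-quoted string, which is valid YAML for typical titles.
    return f"---\ntitle: {json.dumps(title)}\n---\n\n{markdown}\n"


page = to_front_matter_page(
    "Example Domain",
    "# Example Domain\n\nThis domain is for use in illustrative examples in documents.",
)
print(page)
```

The `markdown` argument here stands in for the `result` string returned by the endpoint.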