A high-performance Cloudflare Worker that uses browser rendering to extract HTML content from Single Page Applications (SPAs) and dynamic websites. Optimized with session reuse and intelligent caching for maximum speed.
Many modern web applications (React, Vue, Angular) generate their content dynamically using JavaScript. This Worker solves the problem by:
- Using a real browser to render JavaScript-heavy pages completely
- Extracting the fully rendered HTML content
- Providing fast access through intelligent session reuse and caching
- Supporting both social media crawlers and API consumers
- β‘ High Performance - Optimized session reuse and connection pooling for 3-5x faster startup
- π§ Smart Caching - Intelligent session management with automatic cleanup
- π Full Browser Rendering - Uses Puppeteer to execute JavaScript and render SPAs completely
- π¦ KV Caching - Optional HTML content caching with configurable TTL
- π‘οΈ Resource Optimization - Blocks unnecessary resources (CSS, fonts, images) for faster loading
- π§ Error Handling - Robust error handling with optimized timeouts
- π CORS Support - Can be called from frontend applications
- Cloudflare Workers account
- Browser Rendering enabled (Puppeteer binding)
- Node.js compatibility flag enabled
- Clone this repository:
git clone https://github.com/7a6163/browser-worker.git
cd browser-worker
- Install dependencies:
npm install
- Configure your
wrangler.jsonc
:
{
"name": "browser-worker",
"main": "src/index.ts",
"compatibility_date": "2025-07-05",
"compatibility_flags": ["nodejs_compat"],
"browser": {
"binding": "MYBROWSER"
}
}
- Deploy to Cloudflare Workers:
npm run deploy
The Worker uses a simple /content/{url}
endpoint to extract HTML content:
# Extract HTML content from any URL
curl "https://your-worker.your-subdomain.workers.dev/content/https://example.com"
# For URLs with special characters, URL-encode them
curl "https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue"
The Worker returns the fully rendered HTML content with proper headers:
Content-Type: text/html;charset=UTF-8
Access-Control-Allow-Origin: *
Invalid requests return JSON error responses:
{
"success": false,
"error": "Invalid URL format",
"url": "invalid-url"
}
# Start development server with remote browser support
npm run dev
# Or use wrangler directly
wrangler dev --remote
This Worker includes several performance optimizations:
- Session Reuse: Browser sessions are kept alive and reused across requests
- Connection Pooling: Maintains up to 5 concurrent sessions for optimal performance
- Resource Blocking: Automatically blocks CSS, fonts, and images for faster loading
- Optimized Timeouts: Reduced wait times while maintaining reliability
- Intelligent Caching: Sessions are cached for 30 minutes with automatic cleanup
For social media platforms that need to crawl your SPA:
# Facebook, LINE, Twitter, etc. can access:
https://your-worker.your-subdomain.workers.dev/content/https://your-spa.com/article/123
Integrate with your applications:
// Fetch rendered HTML content
const response = await fetch('https://your-worker.workers.dev/content/https://example.com');
const htmlContent = await response.text();
// Use the HTML content in your application
document.getElementById('content').innerHTML = htmlContent;
Use in automation workflows:
# Get rendered content for processing
curl "https://your-worker.workers.dev/content/https://news-site.com/article/123" \
| grep -o '<meta property="og:title" content="[^"]*"' \
| sed 's/.*content="\([^"]*\)".*/\1/'
# Test HTML content extraction
curl "http://localhost:8787/content/https://github.com"
# Test with complex URLs
curl "http://localhost:8787/content/https://example.com/path?query=value"
# Test error handling
curl "http://localhost:8787/content/invalid-url"
# Test CORS preflight
curl -X OPTIONS "http://localhost:8787/content/https://github.com"
# Test your deployed Worker
curl "https://your-worker.your-subdomain.workers.dev/content/https://github.com"
# Test with URL encoding
curl "https://your-worker.your-subdomain.workers.dev/content/https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue"
# Test session reuse (run multiple times to see performance improvement)
time curl "https://your-worker.workers.dev/content/https://example.com"
time curl "https://your-worker.workers.dev/content/https://example.com"
time curl "https://your-worker.workers.dev/content/https://example.com"
Returns the fully rendered HTML content:
HTTP/1.1 200 OK
Content-Type: text/html;charset=UTF-8
Access-Control-Allow-Origin: *
<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page Description">
<!-- All dynamically generated content -->
</head>
<body>
<!-- Fully rendered page content -->
</body>
</html>
HTTP/1.1 400 Bad Request
Content-Type: application/json
Access-Control-Allow-Origin: *
{
"success": false,
"error": "Invalid URL format",
"url": "invalid-url"
}
Click here if you are not redirected automatically
```{
"success": true,
"url": "https://example.com",
"sessionInfo": "Connected to session-id",
"data": {
"title": "Page Title",
"description": "Page Description",
"image": "https://example.com/image.jpg",
"url": "https://example.com",
"type": "website",
"siteName": "Site Name",
"locale": "en_US",
"twitterCard": "summary_large_image",
"twitterImage": "https://example.com/twitter-image.jpg",
"twitterTitle": "Twitter Title",
"twitterDescription": "Twitter Description"
}
}
No environment variables are required. The Worker uses Cloudflare's Browser Rendering binding.
- Page load timeout: 10 seconds
- Browser session reuse for better performance
- Automatic session cleanup
- HTML responses are cached for 5 minutes
- Browser sessions are reused across requests
- Efficient resource management
-
"Browser Rendering is not supported locally"
- Use
wrangler dev --remote
instead ofwrangler dev
- Use
-
"Failed to load page: 4xx/5xx"
- Check if the target URL is accessible
- Verify the URL format is correct
-
"Evaluation failed: ReferenceError"
- This usually indicates a JavaScript execution error
- Check the browser console for more details
Use ?format=json
to get detailed error information:
curl "https://your-worker.your-subdomain.workers.dev/?url=https://problematic-site.com&format=json"
- Cold Start: ~2-3 seconds for new browser sessions
- Warm Requests: ~500ms-1s when reusing sessions
- Memory Usage: Optimized with automatic session cleanup
- Concurrent Requests: Handles multiple requests efficiently
- Input URL validation and normalization
- Timeout protection against slow-loading pages
- Automatic browser session cleanup
- No sensitive data storage
This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review Cloudflare Workers documentation
- Open an issue in this repository
Version: 1.0.0 Last Updated: 2025-07-05 Cloudflare Workers: Compatible Browser Rendering: Required