A simple FastAPI service that fetches web pages and converts them to clean markdown. The service can be run either locally or deployed to Modal's serverless platform.
- Converts web pages to clean, readable markdown
- Handles JavaScript-rendered content
- Configurable viewport settings
- Retries on failure
- Available as both a local FastAPI service and a Modal serverless deployment
- Secure API key authentication for Modal deployment
- Make sure you have Python 3.8+ installed
- Install dependencies:
pip install -r requirements.txt
- Install browser dependencies for Playwright:
playwright install chromium
playwright install-deps chromium
- Install Modal:
pip install modal
- Set up a Modal account and authenticate:
modal setup
- Configure your API key:
- Create a secret in the Modal dashboard named "MYAPI" with key "APIKEY"
- In your modalscraper.py, add the secret to your function:
@app.function(
    image=image,
    secrets=[modal.Secret.from_name("MYAPI")]
)
@modal.asgi_app()
def fastapi_app():
    return web_app
- Access the API key in your code using:
import os
api_key = os.environ["APIKEY"]
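Once the key is available from the environment, each request's Authorization header has to be compared against it. The following is a minimal sketch of that check using only the standard library; it is not taken from modalscraper.py, and the function name `check_bearer_token` is hypothetical. It returns the status code the request should receive, assuming the 401/500 behavior this README describes for the Modal deployment.

```python
import hmac
from typing import Optional


def check_bearer_token(authorization: Optional[str], expected_key: Optional[str]) -> int:
    """Return the HTTP status a request should get: 200, 401, or 500.

    Hypothetical helper for illustration; the real service may differ.
    """
    # No key configured on the server: treat as a server-side error.
    if not expected_key:
        return 500
    # Missing Authorization header.
    if not authorization:
        return 401
    # Expect the form "Bearer <key>".
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(token.encode(), expected_key.encode()):
        return 401
    return 200
```

In a FastAPI handler this could be called with the incoming request's Authorization header and `os.environ.get("APIKEY")`, raising an HTTP error whenever the result is not 200.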
Start the service locally:
python scraper.py
The service will be available at http://localhost:8000. You can access the API documentation at http://localhost:8000/docs.
Deploy to Modal's serverless platform:
modal deploy modalscraper.py
After successful deployment, Modal will provide you with a URL where your service is accessible.
The service exposes a single endpoint:
GET /scrape?url=<webpage_url>
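Because the target page is passed as a query parameter, a page URL that itself contains `?`, `&`, or spaces should be percent-encoded so it is not misread as extra parameters. A small sketch of building the request URL with the standard library follows; `build_scrape_url` is a hypothetical helper, not part of this project.

```python
from urllib.parse import urlencode


def build_scrape_url(base_url: str, page_url: str) -> str:
    """Build a /scrape request URL with the target page safely encoded."""
    # urlencode percent-encodes reserved characters in the page URL.
    query = urlencode({"url": page_url})
    return f"{base_url.rstrip('/')}/scrape?{query}"


# The target URL here has its own query string, so it must be encoded.
print(build_scrape_url("http://localhost:8000", "https://example.com/search?q=a b&page=2"))
```

Clients such as `requests` perform the same encoding when the page URL is passed through `params=`.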
When using the Modal deployment, include your API key in the request header:
curl -H "Authorization: Bearer YOUR_API_KEY" "https://your-modal-url/scrape?url=https://example.com"
For local development, authentication is disabled by default.
First, install the required packages:
pip install requests python-dotenv
Create a .env file:
API_KEY=your-modal-api-key
Then use this Python script:
import requests
import os
from dotenv import load_dotenv
# Load API key from environment
load_dotenv()
API_KEY = os.getenv("API_KEY")
# Configuration
BASE_URL = "https://example-app.modal.run"
TEST_URL = "https://example.com"
# Set up headers with authentication
headers = {
    "Authorization": f"Bearer {API_KEY}"
}
# Make the request
response = requests.get(
    f"{BASE_URL}/scrape",
    headers=headers,
    params={"url": TEST_URL}
)
# Stop early on 4xx/5xx responses instead of parsing an error body
response.raise_for_status()
# Print the markdown content
result = response.json()
print(result["markdown"])
Local development:
# Using curl (bash/cmd)
curl "http://localhost:8000/scrape?url=https://example.com"
# Using PowerShell
Invoke-RestMethod -Uri "http://localhost:8000/scrape?url=https://example.com"
Modal deployment:
# Using curl (bash/cmd)
curl -H "Authorization: Bearer YOUR_API_KEY" "https://your-modal-url/scrape?url=https://example.com"
# Using PowerShell
$headers = @{
    "Authorization" = "Bearer YOUR-ACTUAL-API-KEY"
}
Invoke-RestMethod -Uri "YOUR-MODAL-URL/scrape?url=https://example.com" -Headers $headers
The response will be JSON containing the markdown conversion of the webpage:
{
    "markdown": "# Example Domain\n\nThis domain is..."
}
- scraper.py - Local FastAPI service implementation
- modalscraper.py - Modal serverless implementation
- requirements.txt - Python dependencies
The service will return:
- 200 OK - Successfully scraped and converted page
- 400 Bad Request - Failed to extract content after multiple attempts
- 401 Unauthorized - Invalid or missing API key. This can happen if:
  - The Authorization header is missing
  - The API key format is incorrect (should be "Bearer YOUR-API-KEY")
  - The provided API key doesn't match the one configured in Modal
- 500 Internal Server Error - Server-side errors, including:
  - API key not configured on server
  - Unexpected errors during processing
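On the client side, these status codes can be turned into either the markdown text or a short hint about what to fix. The sketch below is one way to do that, assuming the response shape and status meanings listed in this README; `interpret_response` and its hint messages are hypothetical and not part of the service.

```python
import json
from typing import Tuple


def interpret_response(status_code: int, body: str) -> Tuple[bool, str]:
    """Return (ok, text): the markdown on success, otherwise a short hint."""
    if status_code == 200:
        # Successful responses carry the converted page under "markdown".
        payload = json.loads(body)
        return True, payload["markdown"]
    # Hints mirror the status descriptions in this README.
    hints = {
        400: "Content could not be extracted; the page may not be scrapable.",
        401: "Check the Authorization header and API key.",
        500: "Server-side error; check that the API key is configured in Modal.",
    }
    return False, hints.get(status_code, f"Unexpected status {status_code}.")


ok, text = interpret_response(200, '{"markdown": "# Example Domain"}')
print(ok, text)
```

With `requests`, this could be called as `interpret_response(response.status_code, response.text)`.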
Feel free to open issues or submit pull requests for improvements.