Default Transport: The proxy uses standard HTTP by default, which is simpler and has no external dependencies. Chromedp is an optional feature for advanced use cases.
chromedp is a headless browser automation library that lets the proxy render JavaScript-heavy websites by controlling a real Chrome/Chromium browser instance instead of making raw HTTP requests.
Use chromedp when:
- ✅ Websites require JavaScript to render content
- ✅ You need to handle dynamic/SPA (Single Page Application) content
- ✅ Target sites load data via client-side JavaScript
- ✅ You need accurate visual rendering
Use standard HTTP when (default):
- ✅ Simple HTML sites (no JavaScript) - RECOMMENDED
- ✅ Performance is critical (chromedp is slower)
- ✅ Resource constraints (chromedp uses more memory)
- ✅ Quick proxying without rendering overhead
┌──────────────────────────────────────────────┐
│ Proxy Request                                │
│ GET http://example.com → HTTP Client         │
└──────────────────────┬───────────────────────┘
                       │
             ┌─────────▼──────────┐
             │  Transport Layer   │
             │ (http or chromedp) │
             └─────────┬──────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
  ┌──────▼───────┐          ┌────────▼────────┐
  │   Standard   │          │ Chrome DevTools │
  │  Transport   │          │ Protocol (CDP)  │
  │   (TCP/IP)   │          └────────┬────────┘
  └──────┬───────┘                   │
         │                 ┌─────────▼─────────┐
         │                 │  Chrome Instance  │
         │                 │  (Headless Shell) │
         │                 │                   │
         │                 │ [Tab1] [Tab2] ... │
         │                 │                   │
         │                 │ Renders page,     │
         │                 │ executes JS,      │
         │                 │ extracts HTML     │
         │                 └─────────┬─────────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                ┌──────▼───────┐
                │   Response   │
                │ (HTML/JSON)  │
                └──────────────┘
- Allocator: establishes the WebSocket connection to the Chrome DevTools Protocol. Chrome listens on http://localhost:9222, and chromedp connects to its remote debugging endpoint.
- Browser Context: each request gets a temporary browser tab.
  ctx, cancel := chromedp.NewContext(allocCtx) // allocCtx: the allocator context, e.g. from chromedp.NewRemoteAllocator
  defer cancel()                               // closes the tab when done
- Navigation & Rendering:
  var html string
  err := chromedp.Run(ctx,
      chromedp.Navigate(url),            // go to the URL
      chromedp.WaitReady("body"),        // wait for the body element
      chromedp.OuterHTML("html", &html), // extract the rendered HTML
  )
- Semaphore Pool: limits concurrent tabs to avoid browser crashes. pool_size=5 means at most 5 tabs are open at once; further requests queue until a slot frees up.
┌─────────────────────────────────────────────────────────────┐
│ Client Request: GET http://example.com/page                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ Proxy receives request, checks semaphore                    │
│ If all 5 tabs busy: wait for one to free up                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ Acquire slot (semaphore -1)                                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ chromedp.NewContext() → Open new tab in Chrome              │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ chromedp.Navigate(url) → Visit the page                     │
│   • Chrome loads HTML                                       │
│   • Browser parses HTML                                     │
│   • Executes <script> tags                                  │
│   • Loads resources (CSS, JS, images)                       │
│   • Fires events (DOMContentLoaded, load)                   │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ chromedp.WaitReady("body") → Wait until body element ready  │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ chromedp.OuterHTML("html", &html) → Extract rendered HTML   │
│   • Gets current DOM tree                                   │
│   • Includes all JS modifications                           │
│   • Markup only; computed CSS is not included               │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ Return HTML as response                                     │
│ (HTML-to-Markdown conversion happens next)                  │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────┐
│ Close tab & release semaphore slot (+1)                     │
└─────────────────────────────────────────────────────────────┘
The proxy works out of the box with the standard HTTP transport (no Chrome needed):
# Docker Compose (HTTP only)
docker compose up -d
# Local binary
./markdowninthemiddle
Both start with the HTTP transport and work immediately. To enable chromedp, follow one of the options below.
To enable JavaScript rendering in Docker:
# In docker-compose.yml, uncomment the Chrome service dependency:
# depends_on:
# chrome:
# condition: service_healthy
# Set transport to chromedp:
export MITM_TRANSPORT_TYPE=chromedp
export MITM_TRANSPORT_CHROMEDP_URL=http://chrome:9222
# Start services
docker compose up -d
What happens:
- The chrome service starts with the DevTools Protocol enabled
- The proxy service waits for the Chrome health check to pass
- The proxy connects to http://chrome:9222 and uses chromedp for rendering
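The Docker setup above is driven by MITM_TRANSPORT_* environment variables. The sketch below shows one way such variables could be read and validated; transportConfig and loadFromEnv are hypothetical, and the fallback values (http, http://localhost:9222, pool size 5) are assumed from the defaults and sample config in this document rather than taken from the proxy's source:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// transportConfig is a hypothetical mirror of the transport section of config.yml.
type transportConfig struct {
	Type     string // "http" or "chromedp"
	URL      string // Chrome DevTools Protocol URL
	PoolSize int    // maximum concurrent tabs
}

// loadFromEnv applies MITM_TRANSPORT_* overrides on top of assumed fallbacks.
// getenv is injected (os.Getenv in practice) so the function is easy to test.
func loadFromEnv(getenv func(string) string) (transportConfig, error) {
	cfg := transportConfig{Type: "http", URL: "http://localhost:9222", PoolSize: 5}
	if v := getenv("MITM_TRANSPORT_TYPE"); v != "" {
		cfg.Type = v
	}
	if v := getenv("MITM_TRANSPORT_CHROMEDP_URL"); v != "" {
		cfg.URL = v
	}
	if v := getenv("MITM_TRANSPORT_CHROMEDP_POOL_SIZE"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 1 {
			return cfg, fmt.Errorf("invalid MITM_TRANSPORT_CHROMEDP_POOL_SIZE %q", v)
		}
		cfg.PoolSize = n
	}
	if cfg.Type != "http" && cfg.Type != "chromedp" {
		return cfg, fmt.Errorf("unknown transport type %q", cfg.Type)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadFromEnv(os.Getenv)
	fmt.Printf("%+v %v\n", cfg, err)
}
```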
Run Chrome locally on your machine:
macOS:
# Install Chrome if needed
brew install google-chrome
# Start Chrome with remote debugging (the app's binary is not on PATH by default)
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --headless --disable-gpu --remote-debugging-port=9222
# In another terminal, start the proxy
./markdowninthemiddle --transport chromedp
# Proxy connects to http://localhost:9222
Linux:
# Install Chrome if needed
sudo apt-get install chromium-browser
# Start Chrome
chromium-browser --headless --disable-gpu --remote-debugging-port=9222
# Start proxy
./markdowninthemiddle --transport chromedp
Windows (PowerShell):
# Start Chrome with DevTools enabled
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
  --headless --disable-gpu --remote-debugging-port=9222
# Start the proxy
.\markdowninthemiddle.exe --transport chromedp
Point to Chrome running elsewhere:
# Chrome running on another server at 192.168.1.100:9222
export MITM_TRANSPORT_CHROMEDP_URL=http://192.168.1.100:9222
./markdowninthemiddle --transport chromedp
Or in config.yml:
transport:
  type: chromedp
  chromedp:
    url: http://192.168.1.100:9222
    pool_size: 10
Using Browserless.io or a similar service:
export MITM_TRANSPORT_CHROMEDP_URL=https://chrome.browserless.io
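./markdowninthemiddle --transport chromedp
Note: chromedp expects a WebSocket CDP endpoint. When given an http:// address, chromedp's remote allocator looks up the WebSocket URL via /json/version, so verify that your service exposes /json/version.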
transport:
  # Type of transport: "http" (default) or "chromedp"
  type: "chromedp"
  chromedp:
    # Chrome DevTools Protocol URL
    # Docker Compose: http://chrome:9222
    # Local: http://localhost:9222
    # Remote: http://other-host:9222
    url: "http://localhost:9222"
    # Maximum concurrent browser tabs
    # Higher = more parallelism but more memory usage
    # Typical values: 3-10 depending on available resources
    pool_size: 5
# Enable chromedp transport
export MITM_TRANSPORT_TYPE=chromedp
# Point to your Chrome instance
export MITM_TRANSPORT_CHROMEDP_URL=http://localhost:9222
# Limit concurrent tabs
export MITM_TRANSPORT_CHROMEDP_POOL_SIZE=5
# Only transport type can be set via CLI
./markdowninthemiddle --transport chromedp
# Chrome URL and pool size must be in config.yml or env vars
Problem: Chrome is not running or not accessible
Solutions:
- Docker Compose:
  docker compose ps              # Check if the chrome service is running
  docker compose logs chrome     # View Chrome logs
  docker compose restart chrome  # Restart Chrome
- Local Chrome:
  # Check if Chrome is listening
  nc -zv localhost 9222
  # If not, start Chrome:
  google-chrome --headless --disable-gpu --remote-debugging-port=9222
- Check URL:
  # Verify the Chrome URL is correct
  curl http://localhost:9222/json/version
  # Should return JSON like: {"Browser":"HeadlessChrome/...","Protocol-Version":"1.3",...,"webSocketDebuggerUrl":"ws://..."}
Problem: Pool size too high, hitting the OS open-file limit
Solution:
# Raise the open-file limit for the current shell (Linux/macOS)
ulimit -n 2048
# Or reduce pool size
export MITM_TRANSPORT_CHROMEDP_POOL_SIZE=3
Problem: Chrome running out of shared memory (Docker)
Solution:
Docker Compose already sets shm_size: "2gb". If Chrome still crashes:
docker compose down
docker system prune  # Removes stopped containers, unused networks, dangling images and build cache
docker compose up -d
Problem: Browser rendering is slow
Causes & Solutions:
- Complex JavaScript: normal, since chromedp waits for JS to execute
  - Consider using the HTTP transport for simple HTML
  - Increase the timeout: chromedp.timeout: 60s (in code)
- Pool too small: requests are queuing
  - Increase pool_size (uses more memory)
- Chrome overloaded: many tabs compete for resources
  - Reduce concurrent load
  - Add more Chrome instances (run multiple Docker containers)
Important: chromedp is optional. If Chrome isn't available:
- Use the HTTP transport (default): no Chrome needed, the proxy works immediately
- Don't enable chromedp: only set --transport chromedp if Chrome is running
- Graceful degradation: pick the transport per environment, for example:
# Only enable chromedp if Chrome is available
if curl -sf http://localhost:9222/json/version > /dev/null; then
export MITM_TRANSPORT_TYPE=chromedp
else
export MITM_TRANSPORT_TYPE=http
fi
./markdowninthemiddle
Or just use the HTTP transport for most use cases.
- Use Docker Compose for production
  - Orchestrates Chrome + proxy
  - Health checks ensure Chrome is ready before the proxy starts
  - Easy scaling
- Set an appropriate pool_size
  Available RAM / 50MB per tab gives a rough upper bound; the recommended values are more conservative than that:
  - 512MB available → pool_size: 3-5
  - 2GB available → pool_size: 10-15
  - 8GB available → pool_size: 50+
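The rule of thumb as a tiny helper. It computes only the RAM / 50MB ceiling (50MB per tab is this document's estimate; real pages vary widely), so its result is an upper bound rather than a recommendation; poolSizeUpperBound is a hypothetical name:

```go
package main

import "fmt"

const mbPerTab = 50 // per-tab estimate from the rule of thumb; heavy pages can need more

// poolSizeUpperBound applies "available RAM / 50MB per tab", clamped to at least 1.
func poolSizeUpperBound(availableMB int) int {
	n := availableMB / mbPerTab // integer division rounds down
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	for _, mb := range []int{512, 2048, 8192} {
		fmt.Printf("%dMB -> at most %d tabs\n", mb, poolSizeUpperBound(mb))
	}
}
```

For 512MB, 2GB and 8GB it returns 10, 40 and 163, which is why the recommended values above sit well below it.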
- Monitor Chrome health
  docker stats markdowninthemiddle-chrome
- Use HTTP transport as default
  - Only enable chromedp for domains requiring JS
  - Use filter to target specific URLs:
    filter:
      allowed:
        - "^https://spa-app\\.com"       # chromedp
        - "^https://api\\.example\\.com" # HTTP
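The allowed entries are regular expressions matched against request URLs. A minimal sketch of that check with Go's regexp package, assuming a URL is allowed when any pattern matches (compileAllowed and allowed are hypothetical helpers; the proxy's actual matching rules may differ). The double-quoted YAML string "^https://spa-app\\.com" decodes to the regular expression ^https://spa-app\.com, which is what the raw strings below contain:

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAllowed compiles the filter.allowed patterns once, at startup.
func compileAllowed(patterns []string) ([]*regexp.Regexp, error) {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("bad filter pattern %q: %w", p, err)
		}
		res = append(res, re)
	}
	return res, nil
}

// allowed reports whether any pattern matches the URL (assumed semantics).
func allowed(res []*regexp.Regexp, url string) bool {
	for _, re := range res {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	res, err := compileAllowed([]string{`^https://spa-app\.com`, `^https://api\.example\.com`})
	if err != nil {
		panic(err)
	}
	fmt.Println(allowed(res, "https://spa-app.com/dashboard")) // true
	fmt.Println(allowed(res, "https://spa-appXcom/"))          // false: \. matches only a literal dot
	fmt.Println(allowed(res, "https://other.example.org/"))    // false
}
```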
- Set reasonable timeouts
    proxy:
      read_timeout: 60s   # Increase for chromedp (JS rendering)
      write_timeout: 60s
To customize Chrome behavior, modify docker-compose.yml:
chrome:
  environment:
    HEADLESS_SHELL_ARGS: "--disable-gpu --disable-dev-shm-usage --no-sandbox"
For high load, run multiple Chrome instances:
chrome-1:
  image: chromedp/headless-shell:latest
  ports:
    - "9222:9222"
chrome-2:
  image: chromedp/headless-shell:latest
  ports:
    - "9223:9222"
proxy:
  # Configure to use the first Chrome, or implement load balancing
  # Inside the Compose network both instances listen on 9222 (e.g. http://chrome-2:9222);
  # 9223 is only the port published on the host
  environment:
    MITM_TRANSPORT_CHROMEDP_URL: http://chrome-1:9222
- chromedp GitHub: https://github.com/chromedp/chromedp
- Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/
- Headless Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome