This repo mirrors a public website into site/ and deploys that static output on Cloudflare Pages (no app runtime).
- Overview
- Quick Start
- Available Tools
- Mirror a Site
- Testing Commands
- Serve Locally
- Deployment
- Troubleshooting
A static snapshot of upstream HTML/CSS/JS/assets. It is intended for simple, cacheable content without server-side logic.
- Multiple Mirror Methods: wget (fast), Playwright (JS-aware), Crawlee (advanced)
- Automated Validation: Pre-deploy checks for structure, security, and assets
- E2E Testing: Playwright-based testing for deployed sites
- Visual Regression: Screenshot comparison testing
- One-Command Pipeline: Mirror → Fix → Validate → Deploy → Test
- SPAs/SSR apps may not mirror correctly (client-side routes, API calls, auth)
- Personalized/dynamic content will not be captured reliably
- Absolute links and canonical URLs may still point to the origin
- You must have rights to mirror and rehost the content
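One of the limitations above, leftover absolute links to the origin, can be checked after a mirror run. The sketch below is a hypothetical helper (`find_origin_refs` is not one of this repo's scripts) that lists mirrored HTML files still containing `//<origin-host>`. It assumes the mirror output is plain HTML files on disk.

```shell
# Hypothetical helper, not part of scripts/: list HTML files that still
# reference the origin host after mirroring.
# Usage: find_origin_refs <site-dir> <origin-host>
find_origin_refs() {
  dir=$1
  origin=$2
  # grep -l prints only file names; -F treats the host as a fixed string
  find "$dir" -type f -name '*.html' -exec grep -lF "//$origin" {} + | sort
}
```

For example, `find_origin_refs site www.avir.com` would print each page that still links back to the origin, one path per line.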
# Check required tools
which wget node npx git
# Install Node.js dependencies
npm install
# Install Playwright browsers (for testing and Playwright mirror)
npx playwright install chromium
# Set up Cloudflare credentials (for deployment)
./scripts/setup-credentials.sh
# OR manually:
export CLOUDFLARE_API_TOKEN="your-token"
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
# Run everything: mirror, fix, validate, deploy, test
./scripts/mirror-deploy-test.sh
# 1. Mirror the site
./scripts/mirror-avir.sh
# 2. Fix assets
python3 scripts/fix-all-images.py
# 3. Validate
./scripts/validate-site.sh
# 4. Test locally
./scripts/serve.sh
# 5. Deploy
./scripts/commit-and-push.sh

| Tool | Script | Best For | Speed |
|---|---|---|---|
| wget | mirror-avir.sh | Static sites | Fast |
| Playwright | mirror-playwright.js | JS-heavy sites | Medium |
| Crawlee | crawler-enhanced.js | Advanced crawling | Medium |
| Manager | mirror-manager.js | Orchestration | - |

| Tool | Script | Purpose |
|---|---|---|
| Site Validator | validate-site.sh | Structure, links, HTML |
| Security Scanner | validate-security.sh | Secrets, credentials |
| Asset Verifier | verify-assets.sh | Images, files |
| Link Checker | check-links-enhanced.js | Broken links |
| CSS Validator | validate-css.js | Style comparison |
| Visual Validator | validate-visual.js | Screenshot comparison |

| Tool | Script | Purpose |
|---|---|---|
| Image Fixer | fix-all-images.py | Repair image paths |
| CDN Fixer | fix-cdn-assets.py | Fix CDN references |
| Asset Downloader | download-webflow-assets.js | Download missing assets |
| HTML Repair | repair-html-heads.py | Fix HTML structure |
| Canonical Tags | add-canonical-tags.js | Add SEO tags |

| Tool | Script | Purpose |
|---|---|---|
| E2E Tests | e2e/tests/*.spec.js | End-to-end testing |
| Smoke Test | smoke.sh | Quick health check |
| Test Pipeline | test-pipeline.sh | Full test suite |
| Visual Tests | visual-tests.js | Visual regression |
| Functional Tests | functional-tests.js | Feature testing |

| Tool | Script | Purpose |
|---|---|---|
| Deploy | deploy-to-cloudflare.sh | Deploy to Pages |
| Commit & Push | commit-and-push.sh | Git + deploy |
| Rollback | rollback.sh | Rollback deployments |
| History | deployment-history.js | Track deployments |
| Verify | verify-deployment.js | Post-deploy checks |

| Tool | Script | Purpose |
|---|---|---|
| Serve | serve.sh | Local server |
| Dashboard | generate-dashboard.js | Status dashboard |
| Benchmark | benchmark-performance.js | Performance testing |

./scripts/mirror-avir.sh
This mirrors https://www.avir.com with robust retry logic and progress reporting.
./scripts/mirror.sh https://example.com
Optional flags:
./scripts/mirror.sh --clean https://example.com
./scripts/mirror.sh --extra-domains cdn.example.com,images.examplecdn.com https://example.com
# Full mirror with browser rendering
node scripts/mirror-playwright.js
# Dry run to preview crawl plan
node scripts/mirror-playwright.js --dry-run
# Use specific browser
node scripts/mirror-playwright.js --browser firefox
# Limit pages for testing
node scripts/mirror-playwright.js --limit 5
Output goes to site/.
See docs/MIRRORING.md for detailed mirroring documentation.
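To make the wget-based method more concrete, here is a rough sketch of the kind of command such a mirror might run. It is an assumption, not the contents of `mirror.sh` or `mirror-avir.sh`: the function name `build_wget_cmd` is hypothetical, and the real scripts may use different flags. The function only prints the command so it can be reviewed before anything is downloaded.

```shell
# Sketch only: prints a plausible wget invocation, it does not run it.
# Usage: build_wget_cmd <url> [extra-domains-csv]
build_wget_cmd() {
  url=$1
  extra=${2:-}
  # Strip scheme and path to get the primary host,
  # e.g. https://example.com/docs -> example.com
  host=${url#*://}
  host=${host%%/*}
  cmd="wget --mirror --page-requisites --convert-links --adjust-extension --no-parent --no-host-directories --directory-prefix=site"
  if [ -n "$extra" ]; then
    # Let wget follow asset links onto the listed extra hosts only
    cmd="$cmd --span-hosts --domains=$host,$extra"
  fi
  printf '%s %s\n' "$cmd" "$url"
}
```

Calling `build_wget_cmd https://example.com cdn.example.com` prints a command that restricts cross-host downloads to `example.com` and `cdn.example.com`, which is roughly what an `--extra-domains` option would need to express.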
# Run all E2E tests
cd e2e && npx playwright test
# Run specific test file
npx playwright test tests/basic.spec.js
npx playwright test tests/links.spec.js
npx playwright test tests/visual.spec.js
# Run with debug mode
npx playwright test --debug
# Run with visible browser
npx playwright test --headed
# Update visual baselines
npx playwright test --update-snapshots
# Run with specific URL
DEPLOY_URL=https://your-site.pages.dev npx playwright test
# Site structure validation
./scripts/validate-site.sh
# Security validation
./scripts/validate-security.sh
# Asset verification
./scripts/verify-assets.sh
# Comprehensive validation
node scripts/comprehensive-validation.js
# Enhanced link checking
node scripts/check-links-enhanced.js
# Generate visual comparison
node scripts/validate-visual.js
# Run visual tests
node scripts/visual-tests.js
# Capture baseline screenshots
node scripts/capture-baseline.js
# Benchmark site performance
node scripts/benchmark-performance.js
# Compare with production
node scripts/compare-production.js
# Quick health check
./scripts/smoke.sh
# Check specific endpoint
curl -I http://localhost:8788
./scripts/serve.sh
This starts a local server on http://localhost:8788
Smoke test:
./scripts/smoke.sh
Playwright's install-deps helper only supports apt-get, dnf, and yum, so it fails on distributions such as Arch Linux. Instead run:
sudo ./scripts/install-playwright-deps.sh
The script detects the available package manager (apt-get, dnf, yum, or pacman) and installs the libraries Chromium needs.
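The detection step described above can be sketched as a simple probe of `PATH`. This is an illustration under assumptions, not the actual body of `install-playwright-deps.sh`: the function name `detect_pkg_manager` and the probe order are hypothetical.

```shell
# Hypothetical sketch of package-manager detection.
# Prints the first manager found on PATH; returns 1 if none is available.
detect_pkg_manager() {
  for pm in apt-get dnf yum pacman; do
    if command -v "$pm" >/dev/null 2>&1; then
      printf '%s\n' "$pm"
      return 0
    fi
  done
  return 1
}
```

A caller could branch on the printed name to choose the matching install command, and exit with a clear message when the function returns non-zero.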
- Production branch: main
- Build command: exit 0
- Publish directory: site
When you run ./scripts/commit-and-push.sh, the following validation stages execute:

| Stage | Script | Purpose | Failure Behavior |
|---|---|---|---|
| 1 | Built-in | Directory validation | Blocking |
| 2 | validate-site.sh | Site structure | Blocking |
| 3 | verify-assets.sh | Asset verification | Warning |
| 4 | validate-security.sh | Security scan | Blocking |
| 5 | Built-in | Sensitive files | Blocking |
| 6 | Built-in | Large files (>10MB) | Warning |
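The difference between "Blocking" and "Warning" stages in the table above can be illustrated with a small runner. This is a minimal sketch, assuming each stage is a command whose exit status signals success. `run_stage` is a hypothetical name and is not taken from `commit-and-push.sh`.

```shell
# Hypothetical stage runner: a blocking failure returns non-zero,
# a warning failure is reported but does not stop the pipeline.
# Usage: run_stage <name> <blocking|warning> <command> [args...]
run_stage() {
  name=$1
  mode=$2
  shift 2
  if "$@"; then
    echo "PASS: $name"
    return 0
  fi
  if [ "$mode" = "blocking" ]; then
    echo "FAIL: $name (blocking)"
    return 1
  fi
  echo "WARN: $name (continuing)"
  return 0
}

# Example wiring (commented out so the sketch has no side effects):
# run_stage "Site structure" blocking ./scripts/validate-site.sh || exit 1
# run_stage "Asset verification" warning ./scripts/verify-assets.sh
```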
# Full validation + commit + push
./scripts/commit-and-push.sh
# Or manual deployment
wrangler pages deploy site --project-name=avirwebtest --branch=main
Run the complete pipeline with a single command:
./scripts/mirror-deploy-test.sh
This orchestrates all stages:
- Mirror - Downloads the AVIR website
- Fix - Repairs image paths and references
- Validate - Runs site structure and security checks
- Deploy - Pushes to Cloudflare Pages
- Test - Runs E2E Playwright tests
Pipeline Results:
- Console output with color-coded status
- Log file: logs/mirror-deploy-YYYYMMDD-HHMMSS.log
- JSON report: test-results/unified-report.json
- HTML report: test-results/unified-report.html
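Because the log names embed a `YYYYMMDD-HHMMSS` timestamp, the most recent pipeline log is simply the last one in sorted order. The helper below is a hypothetical convenience (`latest_log` is not one of the repo's scripts) that assumes only the naming pattern listed above.

```shell
# Hypothetical helper: print the newest pipeline log in a directory.
# Usage: latest_log [log-dir]   (default: logs)
latest_log() {
  dir=${1:-logs}
  latest=
  # Glob results are sorted, and these timestamps sort chronologically
  for f in "$dir"/mirror-deploy-*.log; do
    if [ -e "$f" ]; then latest=$f; fi
  done
  if [ -n "$latest" ]; then printf '%s\n' "$latest"; fi
}
```

For instance, `tail -100 "$(latest_log)"` would show the end of the latest run only, rather than every log matched by a wildcard.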
See docs/DEPLOYMENT.md for detailed deployment documentation.
| Issue | Solution |
|---|---|
| Mirror fails with SSL errors | Script uses --no-check-certificate by default |
| Images not loading | Run python3 scripts/fix-all-images.py |
| Validation warnings | Check logs/validation-report-*.txt for details |
| Deployment fails | Ensure wrangler is authenticated: wrangler login |
| E2E tests timeout | Check deploy URL is accessible |
| Playwright not found | Run npx playwright install chromium |
| Permission denied | Run chmod +x scripts/*.sh |
# Check site structure
./scripts/validate-site.sh
# Verify all assets
./scripts/verify-assets.sh
# Run security scan
./scripts/validate-security.sh
# Check deployment history
node scripts/deployment-history.js
# View logs
tail -100 logs/mirror-deploy-*.log
# Check system state
node --version
npm --version
wrangler --version
- Check the log files in the logs/ directory
- Review docs/TROUBLESHOOTING.md
- Check docs/DEPLOYMENT.md for deployment issues
- Review docs/MIRRORING.md for mirror issues
# Run everything - mirror, fix, validate, deploy, test
./scripts/mirror-deploy-test.sh
# Step 1: Mirror the site
./scripts/mirror-avir.sh
# Step 2: Fix image paths
python3 scripts/fix-all-images.py
# Step 3: Validate
./scripts/validate-site.sh
./scripts/validate-security.sh
# Step 4: Deploy
./scripts/commit-and-push.sh
# Mirror and serve locally (no deploy)
./scripts/mirror-avir.sh
./scripts/serve.sh
# In another terminal
./scripts/smoke.sh
# Clean and re-mirror
rm -rf site/
./scripts/mirror-avir.sh
python3 scripts/fix-all-images.py
./scripts/validate-site.sh
./scripts/commit-and-push.sh
# 1. Deploy first
./scripts/mirror-deploy-test.sh
# 2. Add domain in Cloudflare Pages dashboard
# 3. Update DNS records as instructed
# 4. Create _redirects file if needed
echo "/old-path /new-path 301" > site/_redirects
# Mirror a JavaScript-heavy site
node scripts/mirror-playwright.js
# Fix assets
python3 scripts/fix-all-images.py
# Validate
./scripts/validate-site.sh
# Deploy
./scripts/commit-and-push.sh
# Mirror and serve locally
./scripts/mirror-avir.sh
./scripts/serve.sh &
# Run E2E tests against local server
DEPLOY_URL=http://localhost:8788 npx playwright test
# If tests pass, deploy
./scripts/commit-and-push.sh
- docs/MIRRORING.md - Detailed mirroring documentation
- docs/DEPLOYMENT.md - Deployment guide
- docs/TROUBLESHOOTING.md - Troubleshooting guide
- docs/RUNBOOK.md - Operational runbook
- SECURITY.md - Security information
Attach the domain in Pages first, then update DNS per Cloudflare instructions (CNAME to *.pages.dev for subdomains; apex usually requires Cloudflare nameservers).
Place in the publish root:
- site/_redirects
- site/_headers
Note: _redirects/_headers apply to static assets only, not Pages Functions.
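As an illustration of what these two files might contain, the sketch below writes a minimal example of each into a publish directory. The redirect rule is the one used elsewhere in this README. The header rule and the function name `write_pages_rules` are assumptions chosen for the example, not settings this repo is known to use.

```shell
# Hypothetical helper: write example _redirects and _headers files.
# Note: it overwrites any existing files of the same name.
# Usage: write_pages_rules <publish-dir>
write_pages_rules() {
  out=$1
  mkdir -p "$out"
  # _redirects: one rule per line, "<source> <destination> [status]"
  printf '%s\n' '/old-path /new-path 301' > "$out/_redirects"
  # _headers: a URL pattern line followed by indented header lines
  printf '%s\n' '/*' '  X-Content-Type-Options: nosniff' > "$out/_headers"
}
```

Since a clean re-mirror removes `site/` entirely, files like these would need to be recreated (or kept outside `site/` and copied in) after each mirror run.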
Pros:
- Deterministic deploys (Pages publishes what you reviewed)
- No dependency on upstream during deploy
- Easy rollback with git revert
- Integrated validation pipeline prevents bad deploys
Cons:
- Repo history grows quickly
- Large binary assets can bloat the repo
Recommendation: Commit site/ for predictable deployments. If repo growth becomes a problem, move mirrors to a separate repo or generate in CI.
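To judge whether large binaries are what is bloating the repo, it can help to list the biggest files before committing. The helper below is a hypothetical sketch (`find_large_files` is not one of the repo's scripts). Its default threshold of 10 MiB mirrors the large-file warning mentioned in the validation stages, though the real check may be implemented differently.

```shell
# Hypothetical helper: list files larger than a byte threshold.
# Usage: find_large_files <dir> [min-bytes]   (default: 10485760, i.e. 10 MiB)
find_large_files() {
  dir=$1
  min_bytes=${2:-10485760}
  # -size +Nc matches files strictly larger than N bytes
  find "$dir" -type f -size +"${min_bytes}c" | sort
}
```

Running `find_large_files site` before a commit gives a quick list of candidates to compress, exclude, or move out of the repo.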
