webseed

Automated CLI pipeline that finds Italian local businesses without websites on Google Maps, generates professional HTML sites with Claude AI, tests them locally, deploys to Vercel, creates personalized email drafts in Gmail for outreach, and tracks everything in a local TinyDB database.

Claude's role: General-purpose helper for all things webseed — implementing features, fixing bugs, improving prompts, designing architecture, testing, product strategy, and anything else that revolves around webseed as a product and codebase.

Tech Stack

  • Python ≥3.11, src/ layout package, venv at .venv/, managed via pyproject.toml (.python-version pins 3.14 for dev)
  • Google Maps Places API (new v1) — business discovery + enrichment via google-maps-places SDK
  • Claude Code CLI — site generation, visual testing, HTML fixes, and email generation (via claude --print subprocess)
  • Vercel CLI — deployment (npm i -g vercel)
  • Playwright MCP — visual testing via Claude Code CLI (browser navigation, screenshots, DOM inspection)
  • Playwright (Python) — above-the-fold email screenshots only
  • TinyDB — local JSON-based state management (webseed.json)
  • Gmail API — OAuth-based draft creation with label management

Project Structure

webseed/                          (project root)
├── pyproject.toml                (project metadata, dependencies, CLI entry point)
├── .python-version               (3.14 — dev version)
├── .env / .env.example
├── CLAUDE.md
└── src/
    └── webseed/                  (Python package)
        ├── __init__.py
        ├── __main__.py           (python -m webseed entry)
        ├── claude_cli.py          (Claude Code CLI subprocess helper)
        ├── pipeline.py           (CLI entry point, orchestrates all steps)
        ├── store.py              (TinyDB data store)
        ├── maps.py               (Google Places search, enrichment, photo download)
        ├── generator.py          (Claude Code CLI HTML generation)
        ├── utils.py              (shared helpers: atomic_write)
        ├── deployer.py           (Vercel deploy under shared 'webseed' project)
        ├── tester.py             (Visual testing via Claude CLI + email screenshots)
        ├── emailer.py            (Gmail draft creation + Claude email gen)
        └── prompts/
            ├── site_gen.txt      (Italian site generation user prompt)
            ├── site_gen_system.txt (site generation system prompt)
            ├── code_review.txt   (HTML code review QA checklist)
            ├── visual_test.txt   (QA checklist for Playwright visual testing)
            ├── fix_html.txt      (HTML fix prompt)
            └── email_gen.txt     (Italian email generation prompt)

Pipeline Flow

Search (Maps) → Enrich (Place Details + Photos) → Generate (Claude) → Test (Code Review + optional Playwright) → Deploy (Vercel) → Email (Claude+Gmail Draft)

Each step is independent and resumable. State is tracked per-business in TinyDB with status progression: searched → enriched → generated → tested → deployed → email_queued

Error statuses: error_enrich, error_generate, error_test, error_deploy, error_email. Special: opted_out (blacklisted). emailed is a valid reset target but not currently set by the pipeline.

Module Map

File                                     Role
src/webseed/pipeline.py                  CLI entry point with subcommands (search, enrich, generate, test, deploy, email, run + management). Orchestrates all pipeline steps
src/webseed/claude_cli.py                run_claude_cli() subprocess helper + extract_json_result() JSON parser + get_timeout() env-based timeout reader. Shared by generator, tester, and emailer
src/webseed/utils.py                     Shared utilities — atomic_write() for crash-safe file writes (used by generator and tester)
src/webseed/store.py                     TinyDB data store — open/upsert/query businesses, status updates, blacklist management
src/webseed/maps.py                      Stage 1 search (cheap discovery), Stage 2 enrichment (enrich_business()), photo download. Returns BusinessData dataclasses
src/webseed/generator.py                 Builds prompt from template + business data, calls Claude Code CLI, writes single-file index.html with inline CSS/JS
src/webseed/deployer.py                  Deploy to Vercel under a single webseed project. Each business gets a unique public deployment URL
src/webseed/tester.py                    Visual testing via Claude Code CLI + Playwright MCP, HTML fixes, and email screenshot capture (Python Playwright)
src/webseed/emailer.py                   Gmail API auth, Claude Code CLI email generation, MIME draft creation with inline screenshot
src/webseed/prompts/site_gen.txt         Italian-language user prompt template for site generation
src/webseed/prompts/site_gen_system.txt  System prompt for site generation
src/webseed/prompts/code_review.txt      HTML code review QA checklist (text-only, no browser)
src/webseed/prompts/visual_test.txt      QA checklist prompt for Playwright visual testing
src/webseed/prompts/fix_html.txt         HTML fix prompt template
src/webseed/prompts/email_gen.txt        Italian-language prompt template for email generation (outputs JSON)
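The crash-safe writes attributed to atomic_write() in utils.py are commonly implemented with a temporary file followed by an atomic rename. The sketch below shows that general technique. The signature and internals are assumptions and may differ from what webseed actually does.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, content: str, encoding: str = "utf-8") -> None:
    """Write `content` to `path` so readers never see a half-written file.

    Assumed signature; the real helper in utils.py may differ.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so that os.replace()
    # stays on a single filesystem, which is what makes it atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomically swap in the new file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

If the process dies mid-write, the previous index.html is left untouched and only a stray .tmp file may remain.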

CLI Usage

Installation

pip install -e .     # editable install from project root

Pipeline Subcommands

# 1. Search — find businesses on Maps, save to DB (Stage 1 only, cheap)
webseed search --location "Milano, Italy" --query "ristorante" --limit 5

# 2. Enrich — Place Details + photo download (identifier required)
webseed enrich PLACE_ID "nome"         # by place_id or name
webseed enrich PLACE_ID --only-media   # skip Place Details, only download photos

# 3. Generate — create HTML sites via Claude Code CLI (identifier required)
webseed generate PLACE_ID "nome"       # by place_id or name
webseed generate PLACE_ID --model opus # use a specific model

# 4. Test — code review + fix loop (local, no deploy needed; identifier required)
webseed test PLACE_ID "nome"           # by place_id or name
webseed test PLACE_ID --playwright     # also run Playwright visual test
webseed test PLACE_ID --max-fix-iterations 1    # limit fix-retest cycles (default: 3)
webseed test PLACE_ID --test-model sonnet       # model for testing (default: sonnet)

# 5. Deploy — deploy to Vercel + email screenshot (identifier required)
webseed deploy PLACE_ID "nome"         # by place_id or name

# 6. Email — generate personalized emails, create Gmail drafts (identifier required)
webseed email PLACE_ID "nome"          # by place_id or name
webseed email PLACE_ID --model opus    # use a specific model

# 7. Run — full pipeline (enrich → generate → test → deploy → email) for specific businesses
webseed run PLACE_ID [PLACE_ID...]     # required: one or more identifiers
webseed run "nome" --no-email          # skip email step
webseed run PLACE_ID --model opus --test-model sonnet --max-fix-iterations 1

Alternative invocation: python -m webseed <subcommand>

Management Subcommands

webseed status                              # Table of all businesses + statuses
webseed status --filter deployed             # Filter by status prefix
webseed show PLACE_ID                        # Full detail for one business
webseed stats                                # Summary counts per status
webseed blacklist-add PLACE_ID [PLACE_ID...] # Add to blacklist
webseed blacklist-remove PLACE_ID            # Remove from blacklist
webseed blacklist-list                       # Show all blacklisted
webseed reset PLACE_ID --to searched         # Reset status to re-process
webseed db-delete PLACE_ID [PLACE_ID...]     # Remove from DB only (keeps files + Vercel)
webseed db-delete --all --skip PLACE_ID      # Remove all except specified
webseed hard-delete PLACE_ID [PLACE_ID...]   # Delete DB + files + Vercel deployment
webseed hard-delete --blacklist PLACE_ID     # Same but keep entry as blacklisted
webseed hard-delete -y PLACE_ID              # Skip confirmation
webseed close PLACE_ID [PLACE_ID...]          # Blacklist + remove Vercel deploy (keep local files)
webseed close -y PLACE_ID                     # Skip confirmation
webseed export-csv --output results.csv      # Export DB to CSV

Global Flags

  • --db — TinyDB file path (default: webseed.json)
  • --results-dir — output directory (default: results/)
  • -v / --verbose — enable DEBUG logging

Environment Variables

Defined in .env (copy from .env.example):

  • GOOGLE_MAPS_API_KEY — Google Cloud, Places API (New) enabled
  • CLAUDE_CLI_PATH — (optional) path to Claude Code CLI binary; auto-detected if on PATH
  • VERCEL_CLI_PATH — (optional) path to Vercel CLI binary; auto-detected if on PATH
  • GMAIL_CREDENTIALS_FILE — path to Gmail OAuth credentials JSON (default: credentials.json)
  • GMAIL_TOKEN_FILE — (optional) path to OAuth token file (default: token.json)
  • GMAIL_LABEL_NAME — (optional) Gmail label for drafts (default: webseed-queue)
  • CONTACT_EMAIL — (required for email step) email address shown in email footer for data requests
  • SENDER_NAME — (optional) sender display name in emails (default: Edoardo di WebSeed)
  • CLAUDE_TIMEOUT_GENERATE — (optional) Claude CLI timeout in seconds for generation (default: 120)
  • CLAUDE_TIMEOUT_TEST — (optional) Claude CLI timeout in seconds for testing (default: 120)
  • CLAUDE_TIMEOUT_EMAIL — (optional) Claude CLI timeout in seconds for email gen (default: 180)
  • VERCEL_PROJECT_NAME — (optional) Vercel project name for deployments (default: webseed)

Auth Notes

  • Claude Code CLI: Used for all AI steps (site generation, visual testing, HTML fixes, email generation). Handles its own auth — no API key needed
  • Gmail API: OAuth2 desktop app flow. First run of email step opens browser for consent → saves token.json. Scopes: gmail.compose, gmail.labels, gmail.modify
  • Gmail setup: GCP Console → Enable Gmail API → OAuth consent screen → Credentials → Desktop app → Download credentials.json

State Management

  • TinyDB (webseed.json): local JSON database, one document per business with all fields + status
  • Blacklist: dual — blacklist.txt (local file, one place_id per line) + DB entries with opted_out status
  • Deduplication: by place_id across runs. Existing businesses get their info updated (rating, reviews) but are not regenerated
  • Error tracking: an error status such as error_deploy plus an error_detail field holding the message
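The deduplication rule above (refresh info for known businesses, keep their pipeline state) can be illustrated with a plain dictionary standing in for TinyDB. This is only a sketch: `upsert_business`, `REFRESHABLE_FIELDS`, and the field name `review_count` are hypothetical, and the real store.py works against TinyDB documents.

```python
from typing import Any

# Fields refreshed for businesses already in the store (assumed set).
REFRESHABLE_FIELDS = ("rating", "review_count")


def upsert_business(store: dict[str, dict[str, Any]], found: dict[str, Any]) -> bool:
    """Insert a new business or refresh an existing one.

    Returns True when the business is new. `store` maps place_id to the
    business document and is a stand-in for the TinyDB table.
    """
    place_id = found["place_id"]
    existing = store.get(place_id)
    if existing is None:
        store[place_id] = {**found, "status": "searched"}
        return True
    # Known business: update volatile info only, leave status untouched
    # so that nothing is regenerated.
    for field in REFRESHABLE_FIELDS:
        if field in found:
            existing[field] = found[field]
    return False
```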

Test Flow

  1. code_review() — Claude Code CLI analyzes index.html source code against QA checklist (text-only, no browser)
  2. (optional) visual_test() — Claude Code CLI + Playwright MCP navigates local file, takes screenshots, inspects DOM. Enabled with --playwright
  3. Fix loop (if test fails) — fix_html() sends issues + current HTML to Claude Code CLI, rewrites index.html, retests. Max iterations configurable via --max-fix-iterations (default 3)
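The control flow of the steps above can be sketched as a bounded loop. The example takes the test and fix operations as plain callables so it runs without the Claude CLI. `run_test_fix_loop` and its parameters are hypothetical names, not the functions in tester.py.

```python
from collections.abc import Callable


def run_test_fix_loop(
    html: str,
    run_test: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_fix_iterations: int = 3,
) -> tuple[str, bool]:
    """Test `html`, then fix and retest until clean or out of iterations.

    `run_test` returns a list of issues (empty means pass); `fix` returns
    rewritten HTML. Returns the final HTML and whether it passed.
    """
    issues = run_test(html)
    for _ in range(max_fix_iterations):
        if not issues:
            break
        html = fix(html, issues)
        issues = run_test(html)  # every fix is followed by a retest
    return html, not issues
```

With the default of 3 this makes at most four test calls and three fix calls per business.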

Deploy Flow

  1. deploy() — all sites deploy under a single webseed Vercel project (no --prod). Each deployment gets a unique permanent public URL
  2. capture_email_screenshot() — 1280x600 above-the-fold screenshot for email via Python Playwright (non-fatal on failure)
  3. The public deployment URL is saved in the DB and used in outreach emails

Email Flow

  1. Claude generates personalized Italian email (subject + body_html) per business
  2. Email includes: greeting, compliment on reviews, site link, pricing (€299 + €9/mo), CTA, minimal legal footer
  3. Gmail draft created with inline above-the-fold screenshot, labeled webseed-queue
  4. User reviews drafts in Gmail and sends manually
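Step 3 above involves building a MIME message whose HTML body references an inline image, then handing it to Gmail as a base64url string. The sketch below uses only the standard library to build that payload. `build_draft_body`, the `screenshot` content ID, and the plain-text fallback are illustrative assumptions, and the actual Gmail API call and labeling are left out.

```python
import base64
from email.message import EmailMessage


def build_draft_body(
    to_addr: str,
    subject: str,
    body_html: str,
    screenshot_png: bytes,
    cid: str = "screenshot",
) -> dict[str, dict[str, str]]:
    """Build a request body of the shape the Gmail drafts endpoint expects.

    `body_html` is expected to embed the image as <img src="cid:screenshot">.
    """
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content("Versione HTML non disponibile.")  # plain-text fallback
    msg.add_alternative(body_html, subtype="html")
    # Attach the screenshot as a part related to the HTML alternative.
    html_part = msg.get_payload()[1]
    html_part.add_related(
        screenshot_png, maintype="image", subtype="png", cid=f"<{cid}>"
    )
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"message": {"raw": raw}}
```

The returned dictionary would be passed as the request body when creating the draft; since nothing is sent, the manual review in step 4 remains the final gate.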

Search Behavior

  • Stage 1 only (cheap): search discovers candidates via Nearby + Text Search on a grid. No Place Details calls — enrichment is a separate step
  • Pre-scoring: candidates ranked by _compute_pre_score() using Stage 1 fields (rating, review count, business status, category tier). Max 60 points. Filter with --min-score
  • Grid tiling: --grid-size 3 (default) divides the area into 9 cells with ~20% overlap for broader coverage
  • --limit counts only new businesses: businesses already in the DB or blacklist are skipped and don't count toward the limit
  • Duplicates are removed by place_id within a run; place_ids that are already known are skipped
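Grid tiling as described above can be sketched as splitting a bounding box into grid_size × grid_size cells and widening each one. This is an illustration under assumptions: `Cell`, `tile_bounds`, and the interpretation of "~20% overlap" as each cell being 20% wider and taller than a plain tile are not taken from maps.py, and the sketch ignores areas that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    south: float
    west: float
    north: float
    east: float


def tile_bounds(
    south: float,
    west: float,
    north: float,
    east: float,
    grid_size: int = 3,
    overlap: float = 0.2,
) -> list[Cell]:
    """Split a lat/lng bounding box into overlapping grid cells."""
    lat_step = (north - south) / grid_size
    lng_step = (east - west) / grid_size
    # Grow each cell by half the overlap on every side.
    lat_pad = lat_step * overlap / 2
    lng_pad = lng_step * overlap / 2
    cells: list[Cell] = []
    for row in range(grid_size):
        for col in range(grid_size):
            cells.append(
                Cell(
                    south=south + row * lat_step - lat_pad,
                    west=west + col * lng_step - lng_pad,
                    north=south + (row + 1) * lat_step + lat_pad,
                    east=west + (col + 1) * lng_step + lng_pad,
                )
            )
    return cells
```

Because neighboring cells overlap, the same place can be returned by more than one cell, which is why deduplication by place_id within a run matters.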

Enrich Behavior

  • Place Details ($0.025/call): fetches phone, photos, reviews, opening hours, editorial summary, price level, payment options
  • Photo download: downloads up to 3 Google Maps photos to results/<name>/img/
  • Lead scoring: full _compute_lead_score() (0-100) on 8 signals including enrichment-only data (price level, opening hours, review recency, photos)
  • Website double-check: if Place Details reveals a website, business is flagged and skipped
  • --only-media: skip Place Details call, only download photos (useful for re-downloading)

Output

  • webseed.json — TinyDB database with all business data and pipeline state
  • results/<business_name>/index.html, vercel.json, img/ with downloaded photos
  • results/screenshots/ — smoke test screenshots + email preview screenshots

Code Conventions

  • Language: Python, snake_case functions, UPPERCASE constants
  • Package uses absolute imports (from webseed import maps, from webseed.maps import safe_name)
  • UI text and prompt templates are in Italian
  • BusinessData dataclass (defined in src/webseed/maps.py) is the shared data model across modules
  • safe_name() (public, in src/webseed/maps.py) is the shared slug function — used by generator.py, pipeline.py, and maps.py
  • Photo download falls back to Unsplash when Maps photos are unavailable; fallback_unsplash_url, photo_paths, and has_photos are stored in DB
  • All prompts are externalized in src/webseed/prompts/ as .txt files — no hardcoded prompt text in Python code
  • Prompts are loaded via _load_prompt() in pipeline.py and passed as parameters to modules
  • Generated HTML strips markdown code fences that Claude may add
  • Error handling: try/except per business in each step, failures logged but don't stop the batch
  • Pyright strict mode enabled (pyproject.toml) — all code must pass strict type checking
  • Legacy Places API field names: use photo not photos, type not types
  • Identifier resolution: most commands accept place_ids or partial business names (case-insensitive substring match via store.resolve_identifier()). Ambiguous matches prompt the user to be more specific
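The identifier resolution rule in the last bullet can be sketched as an exact place_id match followed by a case-insensitive substring match on the name. The function below is an assumption-laden illustration: the real store.resolve_identifier() has an unknown signature and prompts the user on ambiguity, whereas this version raises `LookupError`.

```python
def resolve_identifier(identifier: str, businesses: list[dict[str, str]]) -> dict[str, str]:
    """Resolve a place_id or partial name to exactly one business.

    Each business is a dict with 'place_id' and 'name' keys (assumed shape).
    """
    # An exact place_id match always wins.
    for business in businesses:
        if business["place_id"] == identifier:
            return business
    # Otherwise fall back to a case-insensitive substring match on the name.
    needle = identifier.casefold()
    matches = [b for b in businesses if needle in b["name"].casefold()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no business matches {identifier!r}")
    names = ", ".join(b["name"] for b in matches)
    raise LookupError(f"{identifier!r} is ambiguous: {names}")
```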

Testing

The test step runs locally on generated HTML (no deployment needed):

  1. Code review (default) — Claude Code CLI analyzes HTML source against QA checklist. Text-only, no browser.
  2. Playwright visual test (with --playwright) — Claude Code CLI + Playwright MCP opens local file, takes screenshots, inspects DOM, checks console errors.

Both report issues as structured JSON with severity levels (critical/major/minor). If issues are found, Claude Code CLI fixes the HTML and retests (up to --max-fix-iterations cycles, default 3).
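Handling a structured issue report like the one described above usually means stripping any markdown fence around the output and then grouping issues by severity. The sketch below assumes a payload of the form `{"issues": [{"severity": ..., "description": ...}]}`. That shape, and the names `strip_code_fences` and `parse_issues`, are assumptions rather than the format webseed actually uses.

```python
import json
import re
from typing import Any

SEVERITIES = ("critical", "major", "minor")

# Matches output wrapped in a single fence such as ```json ... ```.
_FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n```\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return `text` without a surrounding markdown code fence, if any."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1) if match else stripped


def parse_issues(cli_output: str) -> dict[str, list[dict[str, Any]]]:
    """Group reported issues by severity (assumed JSON shape, see above)."""
    data = json.loads(strip_code_fences(cli_output))
    grouped: dict[str, list[dict[str, Any]]] = {s: [] for s in SEVERITIES}
    for issue in data.get("issues", []):
        severity = issue.get("severity", "minor")
        # Unknown severities are kept under their own key rather than dropped.
        grouped.setdefault(severity, []).append(issue)
    return grouped
```

A caller could then decide, for example, to enter the fix loop only when the critical or major lists are non-empty.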

Email screenshots (1280x600 above-the-fold) are captured during the deploy step via Python Playwright.

Cost

  • Search: ~$0 (Stage 1 only, included in basic Places API quota)
  • Enrich: ~$0.025 per business (Place Details call) + negligible photo download
  • Generation: ~$0.07 per site (via Claude Code CLI)
  • Visual test: ~$0.05-0.10 per test call (Sonnet via Claude Code CLI)
  • Fix: ~$0.03-0.05 per fix call
  • Worst case per business (with 3 test-fix cycles): ~$0.43-0.73
  • With --no-test: ~$0.10 per site (enrich + generation only)
  • Email: ~$0.03 per email (via Claude Code CLI)