Enable Playwright by default for all scrape sources #101
Migration 019 sets `use_playwright=True` for all existing sources and changes the column default so new sources also use Playwright. Most modern job sites use JavaScript rendering, and without Playwright the scraper only gets the initial HTML before any JS executes, missing dynamically loaded job listings. This was causing ~50% of scraping failures.

Changes:
- Add migration 019_enable_playwright_by_default.py
- Update CLAUDE_STATUS.md with new default behavior
- Update scraper guide to clarify Playwright is enabled by default

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
op.alter_column(
    'scrape_sources',
    'use_playwright',
    server_default=sa.text('1'),  # MySQL uses 1 for True
    # ...
)
```
Align ORM default with Playwright migration
The migration sets a server default of TRUE for scrape_sources.use_playwright, but the SQLAlchemy model ScrapeSource.use_playwright still declares default=False (backend/app/models/scrape_source.py). Because the admin create/import paths instantiate ScrapeSource without setting this field, SQLAlchemy will send False on insert, overriding the new server default. New sources will therefore continue to have Playwright disabled despite the intent to enable it by default. Update the model (or omit the Python default) so inserts inherit the new default.
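The override described in the review can be reproduced without SQLAlchemy. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and a hypothetical, trimmed-down `scrape_sources` table. It shows that a column-level `DEFAULT` only applies when the INSERT omits the column, so an explicit False (what an ORM-side `default=False` sends) always wins over the server default.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the column under discussion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scrape_sources ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " use_playwright BOOLEAN NOT NULL DEFAULT 1)"  # server default = True
)

# INSERT that omits the column: the server default applies.
conn.execute("INSERT INTO scrape_sources (name) VALUES ('omitted')")

# INSERT that sends an explicit False, as an ORM with default=False would:
# the explicit value overrides the server default.
conn.execute(
    "INSERT INTO scrape_sources (name, use_playwright) VALUES (?, ?)",
    ("explicit_false", False),
)

rows = dict(conn.execute("SELECT name, use_playwright FROM scrape_sources"))
print(rows)  # {'omitted': 1, 'explicit_false': 0}
```

This is why changing (or dropping) the Python-side default matters as much as the migration itself.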
The previous commit only set the DB default; the ORM default was still False, and runner.py hardcoded True, ignoring the database setting entirely.

Fixes:
- Change ORM default from False to True in scrape_source.py
- Runner now reads `source.use_playwright` (with True fallback for NULL)
- Update/add tests to verify the default behavior

This ensures:
1. New sources created via admin/CSV have `use_playwright=True`
2. The admin toggle can actually disable Playwright for rare httpx-only cases
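A minimal sketch of the runner-side lookup, assuming a hypothetical `Source` dataclass in place of the real `ScrapeSource` model and a hypothetical `resolve_use_playwright` helper; the actual code in runner.py may differ. The key design point is checking `is None` explicitly rather than writing `source.use_playwright or True`, which would turn an explicit False back into True and defeat the admin toggle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for the ScrapeSource model."""
    name: str
    use_playwright: Optional[bool] = True  # mirrors the new ORM default


def resolve_use_playwright(source: Source) -> bool:
    """Honor the stored flag; treat NULL (None) as the new default, True."""
    # `source.use_playwright or True` would be a bug here: it maps an
    # explicit False to True, so httpx-only sources could never be set.
    if source.use_playwright is None:
        return True
    return bool(source.use_playwright)


print(resolve_use_playwright(Source("default")))            # True
print(resolve_use_playwright(Source("legacy", None)))       # True
print(resolve_use_playwright(Source("httpx_only", False)))  # False
```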
Prevents the Configure Source form from silently resetting `use_playwright` to False on every save. The checkbox is checked by default for new sources and preserves the existing value for existing sources.
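One plausible mechanism for the reset, sketched under assumptions: HTML forms leave unchecked checkboxes out of the submitted data entirely, so a handler that checks for the field's presence cannot tell "unchecked" from "checkbox never rendered". The helpers `initial_checkbox_state` and `parse_use_playwright` are hypothetical and only illustrate the behavior the commit describes (checked by default for new sources, existing value preserved).

```python
from collections.abc import Mapping
from typing import Optional


def initial_checkbox_state(current_value: Optional[bool]) -> bool:
    """Pre-checked state for a hypothetical use_playwright checkbox.

    current_value is None for a brand-new source (no stored value yet).
    """
    if current_value is None:
        return True  # new sources default to Playwright
    return bool(current_value)  # existing sources keep their stored value


def parse_use_playwright(form: Mapping) -> bool:
    # Browsers omit unchecked checkboxes from the submitted form data, so a
    # missing key means "unchecked". If the form never renders the checkbox
    # at all, every save looks unchecked and the flag is reset to False.
    return "use_playwright" in form


print(initial_checkbox_state(None))   # True
print(initial_checkbox_state(False))  # False
print(parse_use_playwright({"name": "Acme", "use_playwright": "on"}))  # True
print(parse_use_playwright({"name": "Acme"}))  # False
```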
Display "Using Playwright (headless browser)" or "Using httpx (direct HTTP)" in the scrape modal while the scrape is running, so admins can confirm which fetch method is being used without checking logs.
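A minimal sketch of choosing the modal text, assuming the label is derived from the same resolved flag the runner uses. The helper `fetch_method_label` is hypothetical; the real implementation may do this in the template or in JavaScript. Reusing the NULL-to-True fallback keeps the indicator consistent with the fetch method that actually runs.

```python
from typing import Optional

# Label strings copied from the commit message.
PLAYWRIGHT_LABEL = "Using Playwright (headless browser)"
HTTPX_LABEL = "Using httpx (direct HTTP)"


def fetch_method_label(use_playwright: Optional[bool]) -> str:
    # Apply the same NULL -> True fallback as the runner so the modal
    # reports the fetch method that will actually be used.
    resolved = True if use_playwright is None else bool(use_playwright)
    return PLAYWRIGHT_LABEL if resolved else HTTPX_LABEL


print(fetch_method_label(None))   # Using Playwright (headless browser)
print(fetch_method_label(False))  # Using httpx (direct HTTP)
```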
Defensive coding to handle the case where the Playwright text element is not found in the DOM.
Previously, a scrape was marked "Failed" if there were ANY errors, even if jobs were successfully found. This was too strict.

Changes:
- `last_scrape_success` is now True if jobs were found OR there were no errors
- Auto-enable now triggers when jobs are found (ignoring warnings)

This fixes sources staying in "Needs Configuration" and showing "Failed" status even when they successfully scraped jobs.
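The new success and auto-enable rules reduce to two boolean expressions. This sketch assumes a hypothetical `ScrapeOutcome` record holding the job count and error list; the real code may track these differently. Under the rule as stated, a run with zero jobs and no errors still counts as a success, but does not trigger auto-enable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScrapeOutcome:
    """Hypothetical summary of one scrape run."""
    jobs_found: int
    errors: List[str] = field(default_factory=list)


def last_scrape_success(outcome: ScrapeOutcome) -> bool:
    # Success if any jobs were found (errors notwithstanding),
    # or if the run finished with no errors at all.
    return outcome.jobs_found > 0 or not outcome.errors


def should_auto_enable(outcome: ScrapeOutcome) -> bool:
    # Auto-enable whenever jobs were found; errors/warnings are ignored.
    return outcome.jobs_found > 0


partial = ScrapeOutcome(jobs_found=12, errors=["timeout on page 3"])
print(last_scrape_success(partial), should_auto_enable(partial))  # True True

broken = ScrapeOutcome(jobs_found=0, errors=["selector matched nothing"])
print(last_scrape_success(broken), should_auto_enable(broken))  # False False
```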
Resolve conflicts by taking main's simpler Playwright indicator implementation.
Summary
- Set `use_playwright=True` for all existing sources and change the column default

Changes
- backend/alembic/versions/019_enable_playwright_by_default.py - Migration to enable Playwright globally
- CLAUDE_STATUS.md - Document new default behavior
- backend/app/templates/admin/scraper_guide.html - Clarify Playwright is enabled by default

Test plan
- `use_playwright=True`