Enable Playwright by default for all scrape sources by mbuckingham74 · Pull Request #101 · mbuckingham74/Far-Reach-Jobs

mbuckingham74 · 2025-12-02T02:22:38Z

Summary

- Adds migration 019 to set use_playwright=True for all existing sources and change the column default
- Most modern job sites render their listings with JavaScript; without Playwright, the scraper misses dynamically loaded job listings
- Sources configured without Playwright accounted for approximately 50% of scraping failures

Changes

- backend/alembic/versions/019_enable_playwright_by_default.py - Migration to enable Playwright globally
- CLAUDE_STATUS.md - Document new default behavior
- backend/app/templates/admin/scraper_guide.html - Clarify Playwright is enabled by default

Test plan

- Run migration on production: migration will update all existing sources
- Verify new sources default to use_playwright=True
- Test scraping on a previously-failing source (e.g., Copper River Native Association)
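The first test-plan step could be checked with a query along these lines. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for the production MySQL database. The table and column names (`scrape_sources`, `use_playwright`) are the ones the migration touches; the helper names, the `name` column, the sample rows, and the assumption that the migration's data step is a plain `UPDATE` are all hypothetical.

```python
import sqlite3


def count_sources_without_playwright(conn: sqlite3.Connection) -> int:
    """Return how many sources still have Playwright disabled or unset."""
    row = conn.execute(
        "SELECT COUNT(*) FROM scrape_sources "
        "WHERE use_playwright IS NULL OR use_playwright = 0"
    ).fetchone()
    return row[0]


def backfill_playwright(conn: sqlite3.Connection) -> None:
    """Assumed equivalent of the migration's data step: enable Playwright everywhere."""
    conn.execute("UPDATE scrape_sources SET use_playwright = 1")
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database with hypothetical pre-migration rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scrape_sources ("
        "id INTEGER PRIMARY KEY, name TEXT, use_playwright INTEGER)"
    )
    conn.executemany(
        "INSERT INTO scrape_sources (name, use_playwright) VALUES (?, ?)",
        [("source-a", 0), ("source-b", 1), ("source-c", None)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print("before:", count_sources_without_playwright(conn))
    backfill_playwright(conn)
    print("after:", count_sources_without_playwright(conn))
```

After a successful migration the count should be zero; any non-zero result points at rows the backfill missed.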

🤖 Generated with Claude Code

Migration 019 sets use_playwright=True for all existing sources and changes the column default so new sources also use Playwright.

Most modern job sites use JavaScript rendering, and without Playwright the scraper only gets the initial HTML before JS executes, missing dynamically loaded job listings. This was causing ~50% of scraping failures.

Changes:
- Add migration 019_enable_playwright_by_default.py
- Update CLAUDE_STATUS.md with new default behavior
- Update scraper guide to clarify Playwright is enabled by default

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-12-02T02:24:35Z

backend/alembic/versions/019_enable_playwright_by_default.py

+    op.alter_column(
+        'scrape_sources',
+        'use_playwright',
+        server_default=sa.text('1'),  # MySQL uses 1 for True


Align ORM default with Playwright migration

The migration sets a server default of TRUE for scrape_sources.use_playwright, but the SQLAlchemy model ScrapeSource.use_playwright still declares default=False (backend/app/models/scrape_source.py). Because the admin create/import paths instantiate ScrapeSource without setting this field, SQLAlchemy will send False on insert, overriding the new server default. New sources will therefore continue to have Playwright disabled despite the intent to enable it by default. Update the model (or omit the Python default) so inserts inherit the new default.
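The behavior described in this review comment can be reproduced at the SQL level. The sketch below uses SQLite from the standard library as a stand-in for MySQL and SQLAlchemy, so it illustrates the principle rather than the project's actual code: a column default applies only when the INSERT omits the column, and an explicitly sent value (which is what a Python-side `default=False` produces) wins over it. The helper names and the `name` column are hypothetical.

```python
import sqlite3


def make_sources_table() -> sqlite3.Connection:
    """Create a table whose use_playwright column defaults to 1 (True)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scrape_sources ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "use_playwright INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def insert_with_explicit_flag(conn: sqlite3.Connection, name: str, flag: bool) -> int:
    """Mimic an ORM that always sends its own Python-side default."""
    cur = conn.execute(
        "INSERT INTO scrape_sources (name, use_playwright) VALUES (?, ?)",
        (name, int(flag)),
    )
    return cur.lastrowid


def insert_without_flag(conn: sqlite3.Connection, name: str) -> int:
    """Omit the column so the database default is applied."""
    cur = conn.execute("INSERT INTO scrape_sources (name) VALUES (?)", (name,))
    return cur.lastrowid


def stored_flag(conn: sqlite3.Connection, row_id: int) -> int:
    """Read back the stored use_playwright value for one row."""
    row = conn.execute(
        "SELECT use_playwright FROM scrape_sources WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_sources_table()
    explicit_id = insert_with_explicit_flag(conn, "explicit-false", False)
    omitted_id = insert_without_flag(conn, "column-omitted")
    print(stored_flag(conn, explicit_id), stored_flag(conn, omitted_id))
```

Only the second insert picks up the database default, which is why the review suggests changing or dropping the Python-side default in the model.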

Previous commit only set DB default but ORM default was still False, and runner.py hardcoded True ignoring the database setting entirely.

Fixes:
- Change ORM default from False to True in scrape_source.py
- Runner now reads source.use_playwright (with True fallback for NULL)
- Update/add tests to verify the default behavior

This ensures:
1. New sources created via admin/CSV have use_playwright=True
2. The admin toggle can actually disable Playwright for rare httpx-only cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
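The "True fallback for NULL" rule in this commit might look roughly like the following. This is a sketch under assumptions, not the code in runner.py: the `Source` dataclass is a hypothetical stand-in for the ScrapeSource model, and `should_use_playwright` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for the ScrapeSource model."""
    name: str
    use_playwright: Optional[bool] = True  # None models a NULL column value


def should_use_playwright(source: Source) -> bool:
    """Honor the stored setting; treat NULL as 'use Playwright'."""
    if source.use_playwright is None:
        return True
    return bool(source.use_playwright)


if __name__ == "__main__":
    for src in (Source("unset", None), Source("disabled", False), Source("enabled")):
        print(src.name, should_use_playwright(src))
```

The explicit `is None` check matters here: a shortcut such as `source.use_playwright or True` would always evaluate to True and make the admin toggle ineffective again.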

Prevents the Configure Source form from silently resetting use_playwright to False on every save. The checkbox is checked by default for new sources and preserves the existing value for existing sources.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Display "Using Playwright (headless browser)" or "Using httpx (direct HTTP)" in the scrape modal while the scrape is running, so admins can confirm which fetch method is being used without checking logs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Defensive coding to handle the case where the Playwright text element might not be found in the DOM.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Previously, a scrape was marked "Failed" if there were ANY errors, even if jobs were successfully found. This was too strict.

Changes:
- last_scrape_success is now True if jobs were found OR no errors
- Auto-enable now triggers when jobs are found (ignores warnings)

This fixes sources staying in "Needs Configuration" and showing "Failed" status even when they successfully scraped jobs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
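The relaxed success rule from this commit can be written as two small predicates. This is a minimal sketch: `ScrapeResult`, `scrape_succeeded`, and `should_auto_enable` are hypothetical names, and only the boolean logic (jobs found OR no errors; auto-enable when jobs are found) comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScrapeResult:
    """Hypothetical summary of one scrape run."""
    jobs_found: int = 0
    errors: List[str] = field(default_factory=list)


def scrape_succeeded(result: ScrapeResult) -> bool:
    """Success if any jobs were found, or if the run produced no errors."""
    return result.jobs_found > 0 or not result.errors


def should_auto_enable(result: ScrapeResult) -> bool:
    """Auto-enable a source once it yields jobs, regardless of warnings."""
    return result.jobs_found > 0


if __name__ == "__main__":
    partial = ScrapeResult(jobs_found=3, errors=["timeout on one page"])
    print(scrape_succeeded(partial), should_auto_enable(partial))
```

Under the old rule as described, the `partial` run above would have been marked "Failed" because its error list is non-empty.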

Resolve conflicts by taking main's simpler Playwright indicator implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector bot reviewed Dec 2, 2025

mbuckingham74 and others added 6 commits December 1, 2025 21:37

Add null check for Playwright text element in scrape modal

afffdfe

Merge origin/main into fix/playwright-default

7fbcd5f

mbuckingham74 merged commit ad95d9b into main Dec 2, 2025

mbuckingham74 deleted the fix/playwright-default branch December 2, 2025 03:33

mbuckingham74 merged 7 commits into main from fix/playwright-default