feat(engie-scraper): switch from PDF to HelloWatt web scraping #67

m4dm4rtig4n · 2025-12-05T07:53:36Z

Summary

Replaces the Engie PDF scraper with scraping of the HelloWatt comparison site to obtain more accurate, up-to-date pricing. The scraper now supports 34 Engie offers (Référence 3 ans and Tranquillité, each in BASE and HC/HP options). This PR also fixes a React crash in AdminOffers.tsx that occurred after logout because of corrupted cache data.

Changes

Engie scraper: HTML parsing with BeautifulSoup instead of PDF extraction
AdminOffers.tsx: defensive array check to prevent crashes after logout
Updated fallback pricing to December 2025
Updated documentation
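The PR text does not show the new parsing code. To illustrate extracting a pricing table from HTML, here is a minimal sketch that turns every table row into a list of cell strings. It uses the standard library's `html.parser.HTMLParser` as a dependency-free stand-in for BeautifulSoup. `PricingTableParser` and `parse_pricing_table` are hypothetical names, and the sample markup is invented, not HelloWatt's actual page.

```python
from __future__ import annotations

from html.parser import HTMLParser


class PricingTableParser(HTMLParser):
    """Collect the whitespace-normalised text of each <th>/<td> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the cell being read

    def _end_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse newlines, tabs and non-breaking spaces into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_pricing_table(html: str) -> list[list[str]]:
    """Return the rows of every table in `html` as lists of cell strings."""
    parser = PricingTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = "<table><tr><th>Puissance</th><th>Prix du kWh</th></tr><tr><td>6 kVA</td><td>0,2124&nbsp;€</td></tr></table>"
print(parse_pricing_table(sample))  # [['Puissance', 'Prix du kWh'], ['6 kVA', '0,2124 €']]
```

With BeautifulSoup, the same extraction is roughly `[[c.get_text(" ", strip=True) for c in tr.find_all(["th", "td"])] for tr in soup.find_all("tr")]`. Keeping cells as raw strings defers number parsing (French decimal commas, `€` signs) to a separate step.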

Testing

Visit /admin/offers and click "Prévisualiser" on the Engie provider to verify all 34 offers are detected correctly.

🤖 Generated with Claude Code

- Replace Engie PDF scraper with HelloWatt comparison site scraping
- Add HTML parsing with BeautifulSoup for pricing tables
- Support 34 Engie offers: Référence 3 ans + Tranquillité (BASE + HC/HP)
- Update fallback pricing data to December 2025
- Fix AdminOffers.tsx: defensive array check to prevent crashes after logout

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>


Change label from "Elec Référence 1 an (PDF officiel)" to "Tarifs Engie (HelloWatt)"

Copilot

Pull request overview

This PR refactors the Engie scraper from PDF parsing to HTML web scraping via HelloWatt, increasing coverage from 17 to 34 offers. It also adds a defensive fix in AdminOffers.tsx to prevent crashes from corrupted cache data, and updates documentation to reflect the new scraping approach and pricing data as of December 2025.

Key Changes

Engie scraper refactored: Replaced PDF extraction with BeautifulSoup HTML parsing from HelloWatt comparison site
Offer coverage doubled: now scrapes 34 offers (Référence 3 ans and Tranquillité, both in BASE and HC/HP variants), up from 17
Frontend crash fix: Added defensive Array.isArray() check in AdminOffers.tsx to handle corrupted cache scenarios
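The jump from 17 to 34 offers fits a simple breakdown, although the PR does not spell it out. Assume one offer entry per subscribed power level, with BASE sold at nine standard French residential levels (3 to 36 kVA) and HC/HP at eight of them (from 6 kVA). Each commercial offer then yields 9 + 8 = 17 variants, and two offers yield 34. The power lists and the `expand_offers` helper below are assumptions for illustration only.

```python
# Assumed power levels (kVA); the PR does not list them.
BASE_POWERS_KVA = [3, 6, 9, 12, 15, 18, 24, 30, 36]  # 9 levels
HCHP_POWERS_KVA = [6, 9, 12, 15, 18, 24, 30, 36]     # 8 levels (no 3 kVA)


def expand_offers(offer_names: list[str]) -> list[tuple[str, str, int]]:
    """Hypothetical helper: one (offer, option, power) entry per priced variant."""
    variants: list[tuple[str, str, int]] = []
    for name in offer_names:
        variants += [(name, "BASE", kva) for kva in BASE_POWERS_KVA]
        variants += [(name, "HC/HP", kva) for kva in HCHP_POWERS_KVA]
    return variants


print(len(expand_offers(["Elec Référence 1 an"])))              # 17
print(len(expand_offers(["Référence 3 ans", "Tranquillité"])))  # 34
```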

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

| File | Description |
|---|---|
| `apps/api/src/services/price_scrapers/engie_scraper.py` | Complete rewrite: removed PDF parsing, added HTML scraping with BeautifulSoup, new table/header parsing methods, updated fallback prices to December 2025 |
| `apps/web/src/pages/AdminOffers.tsx` | Added a defensive array check to prevent crashes when `offersData` is corrupted or undefined |
| `docs/features-spec/energy-providers-scrapers.md` | Updated documentation: source changed to HelloWatt, offer count increased from 17 to 34, updated pricing mechanism details |
| `docs/pages/admin-offers.md` | Updated total offer count from ~236 to ~253 and the data source description |
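The file summary mentions new table/header parsing methods. One common way to make such parsing robust is to locate columns by header text rather than by fixed position, then convert French-formatted amounts (`0,2124 €`) to floats. Here is a minimal sketch under that assumption. The keywords in `COLUMN_KEYWORDS` and all function names are hypothetical, not taken from `engie_scraper.py`.

```python
import re

# Hypothetical column keywords (lower-case); HelloWatt's real headers may differ.
COLUMN_KEYWORDS = {
    "power_kva": "puissance",
    "subscription_eur_per_month": "abonnement",
    "kwh_price_eur": "kwh",
}
# First number in a cell; decimal comma or point. Thousands separators are not handled.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def parse_fr_number(text: str) -> float:
    """Parse the first number in a French-formatted cell, e.g. '0,2124 €' -> 0.2124."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return float(match.group().replace(",", "."))


def map_headers(header_cells: list[str]) -> dict[str, int]:
    """Find each field's column index by keyword instead of assuming a column order."""
    indices: dict[str, int] = {}
    for field, keyword in COLUMN_KEYWORDS.items():
        matches = [i for i, cell in enumerate(header_cells) if keyword in cell.lower()]
        if not matches:
            raise ValueError(f"no column for {field!r} in {header_cells!r}")
        indices[field] = matches[0]
    return indices


def parse_row(header_cells: list[str], row_cells: list[str]) -> dict[str, float]:
    """Convert one table row into named float values using the header mapping."""
    return {field: parse_fr_number(row_cells[i]) for field, i in map_headers(header_cells).items()}


print(parse_fr_number("0,2124 €"))  # 0.2124
```

Matching on header keywords means a reordered column on the comparison site does not silently shift prices into the wrong field, and a missing column raises instead of yielding a wrong value.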


Copilot · 2025-12-05T08:05:55Z

docs/features-spec/energy-providers-scrapers.md

+**Prix TTC** (décembre 2025):
+- Référence 3 ans BASE: 0.2124€/kWh (3-6 kVA), 0.2109€/kWh (9+ kVA)
+- Tranquillité BASE: 0.2612€/kWh (3-6 kVA), 0.2597€/kWh (9+ kVA)
+
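The December 2025 BASE prices above map naturally to a small fallback table keyed by offer and power band. Here is a minimal sketch, assuming the "3-6 kVA" band covers 3 and 6 kVA and "9+ kVA" covers every higher standard level. The dict layout and `fallback_kwh_price` are hypothetical; only the four prices come from the doc.

```python
# BASE kWh prices TTC (€/kWh), décembre 2025, as listed in the doc diff.
FALLBACK_BASE_KWH_TTC = {
    "Référence 3 ans": {"3-6": 0.2124, "9+": 0.2109},
    "Tranquillité": {"3-6": 0.2612, "9+": 0.2597},
}


def fallback_kwh_price(offer: str, power_kva: int) -> float:
    """Return the documented BASE kWh price for an offer and subscribed power.

    Assumes standard power levels (3, 6, 9, 12, ... kVA): anything above 6 kVA
    falls in the "9+" band.
    """
    if power_kva < 3:
        raise ValueError("subscribed power must be at least 3 kVA")
    band = "3-6" if power_kva <= 6 else "9+"
    return FALLBACK_BASE_KWH_TTC[offer][band]


print(fallback_kwh_price("Tranquillité", 9))  # 0.2597
```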


[nitpick] The documentation states "Prix TTC (décembre 2025)", which implies these are the current prices as of December 2025. Since the current date is December 5, 2025, it should be clarified whether these prices date from the beginning of December or are expected to change later in the month. Consider giving a more specific date (e.g., "début décembre 2025" or the exact date).

Copilot · 2025-12-05T08:05:55Z