fix(scraper): fix Enercoop PDF parser to extract real prices #63
The Enercoop PDF parser was using fallback values because the parsing logic didn't match the actual PDF structure. This fix:
- Corrects subscription price extraction: the PDF lists 36 HTT prices (index 0-35) then 36 TTC prices (index 36-71), not interleaved pairs
- Corrects kWh price extraction: the PDF has 4 values (HTT old, TTC old, HTT current, TTC current); we want index 3, not index 1
- Adds proper section markers for "Flexi Watt - nuit & week-end" and "2 saisons"
- Updates SEASONAL fallback prices to match the current PDF (winter HP/HC values)
- Renames the WEEKEND offer type to HC_NUIT_WEEKEND for consistency
- Now correctly extracts 33 offers from the PDF instead of using fallback values

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
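The sequential subscription layout described in the PR description can be illustrated with a minimal sketch. It is not the code from `enercoop_scraper.py`: the names `N_PRICES`, `split_subscription_prices`, and `ttc_subscription` are hypothetical, and the sketch assumes the 72 numeric values have already been extracted from the PDF text into a flat list.

```python
# Hypothetical sketch, not the PR's implementation.
# Assumption: `values` is a flat list of 72 floats already extracted from the
# PDF, laid out as 36 HTT prices (index 0-35) followed by 36 TTC prices
# (index 36-71), as stated in the PR description.

N_PRICES = 36  # number of subscription prices per tax mode


def split_subscription_prices(values):
    """Split the flat list into (htt, ttc) lists of 36 prices each."""
    if len(values) != 2 * N_PRICES:
        raise ValueError(
            f"expected {2 * N_PRICES} subscription prices, got {len(values)}"
        )
    return list(values[:N_PRICES]), list(values[N_PRICES:])


def ttc_subscription(values, i):
    """Return the TTC price paired with the i-th HTT price."""
    _, ttc = split_subscription_prices(values)
    return ttc[i]
```

Under this layout the TTC price for entry `i` sits at index `36 + i`. Reading the list as interleaved pairs would instead look at index `2 * i + 1`, which lands on an unrelated HTT value for most entries.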
6e2cbd6 to
171b550
Compare
Pull request overview
This PR fixes the Enercoop PDF parser to properly extract real prices instead of relying on fallback values. The fix addresses incorrect parsing logic that didn't match the actual PDF structure, enabling extraction of all 33 offers (9 BASE, 8 HC_HP, 8 HC_NUIT_WEEKEND, 8 SEASONAL) from the current PDF.
Key changes:
- Corrected subscription price extraction logic to handle sequential HTT/TTC layout (36 HTT prices followed by 36 TTC prices)
- Fixed kWh price extraction to select the correct index (3) from 4 available values (HTT old, TTC old, HTT current, TTC current)
- Renamed offer type from WEEKEND to HC_NUIT_WEEKEND for consistency across the codebase
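The kWh index fix can be sketched as follows. This is an illustration under stated assumptions rather than the PR's code: the function name `current_ttc_kwh_price`, the regular expression, and the sample line are hypothetical, and the sketch assumes prices appear in French decimal notation (comma as decimal separator) in the order given above.

```python
import re

# Hypothetical sketch, not the PR's implementation.
# Assumption: a kWh price line contains four decimal numbers in French
# notation, ordered HTT old, TTC old, HTT current, TTC current.
_PRICE_RE = re.compile(r"\d+,\d+")

CURRENT_TTC_INDEX = 3  # index 1 would be the old TTC price


def current_ttc_kwh_price(line: str) -> float:
    """Return the current TTC kWh price found on a price line."""
    raw = _PRICE_RE.findall(line)
    if len(raw) < 4:
        raise ValueError(f"expected 4 prices, found {len(raw)}: {line!r}")
    return float(raw[CURRENT_TTC_INDEX].replace(",", "."))
```

Raising on a short match list, instead of silently defaulting, makes it easier to notice when the PDF layout changes again and the parser would otherwise drop back to fallback prices.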
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| apps/api/src/services/price_scrapers/enercoop_scraper.py | Rewrites PDF parsing logic with proper extraction methods for all offer types, adds STANDARD_POWERS constant, renames WEEKEND to HC_NUIT_WEEKEND, updates validation to handle SEASONAL offers |
| apps/api/pyproject.toml | Adds duplicate [dependency-groups] section with problematic ruff version specification |
| apps/api/uv.lock | Adds duplicate package.dev-dependencies and package.metadata.requires-dev sections |
# Extract Flexi Watt - Nuit & Week-end offers
weekend_offers = self._parse_flexi_watt_weekend_section(text, valid_from)
offers.extend(weekend_offers)

# Extract Flexi Watt - 2 saisons offers
seasonal_offers = self._parse_flexi_watt_seasonal_section(text, valid_from)
offers.extend(seasonal_offers)
Copilot AI, Dec 4, 2025
The new offer types HC_NUIT_WEEKEND and SEASONAL introduced in this PR are not covered by tests. The test file test_enercoop_scraper.py only verifies BASE and HC_HP offers. Since the repository has comprehensive test coverage for other scrapers, tests should be added to verify:
- HC_NUIT_WEEKEND offers are correctly generated (8 power levels: 6-36 kVA)
- SEASONAL offers are correctly generated (8 power levels: 6-36 kVA)
- The new parsing functions correctly extract prices from the PDF structure
- Validation logic properly handles the new offer types
Example test structure:
# Check HC_NUIT_WEEKEND offers
weekend_offers = [o for o in offers if o.offer_type == "HC_NUIT_WEEKEND"]
assert len(weekend_offers) == 8 # 8 power levels (6-36)
# Check SEASONAL offers
seasonal_offers = [o for o in offers if o.offer_type == "SEASONAL"]
assert len(seasonal_offers) == 8 # 8 power levels (6-36)
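The count checks above could be folded into a single helper covering all four offer types. The following is a self-contained sketch only: the `Offer` dataclass and `offer_count_mismatches` are hypothetical stand-ins, not the project's actual model or test code, and the expected counts are taken from the pull request overview (9 BASE, 8 HC_HP, 8 HC_NUIT_WEEKEND, 8 SEASONAL).

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-in for the scraper's offer model.
@dataclass
class Offer:
    offer_type: str
    power_kva: int


# Expected counts as stated in the pull request overview (33 offers in total).
EXPECTED_COUNTS = {
    "BASE": 9,
    "HC_HP": 8,
    "HC_NUIT_WEEKEND": 8,
    "SEASONAL": 8,
}


def offer_count_mismatches(offers):
    """Return {offer_type: (actual, expected)} for every type whose count is off."""
    counts = Counter(o.offer_type for o in offers)
    return {
        offer_type: (counts.get(offer_type, 0), expected)
        for offer_type, expected in EXPECTED_COUNTS.items()
        if counts.get(offer_type, 0) != expected
    }
```

In a test, asserting `offer_count_mismatches(offers) == {}` reports every offer type with a wrong count in one failure message, rather than stopping at the first failing `assert len(...)`.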
Summary
The Enercoop PDF parser was using fallback values because the parsing logic didn't match the actual PDF structure. This fix corrects the PDF parsing to properly extract all 33 offers (9 BASE, 8 HC_HP, 8 HC_NUIT_WEEKEND, 8 SEASONAL) from the current PDF instead of relying on fallback data.
Changes
Testing
The scraper now successfully extracts pricing from the Enercoop PDF and can be tested at /admin/offers in the web interface.