
1714 add urllib to masssuperct #1891

Open
Luis-manzur wants to merge 6 commits into main from 1714-add-urllib-to-masssuperct

Conversation

@Luis-manzur
Contributor

  • Fix the masssuperct scraper by switching from the JSON API to scraping the HTML page, and by using urllib to bypass Cloudflare TLS fingerprinting
  • Abstract the urllib download_content logic into AbstractSite._download_content_urllib(), so scrapers that set use_urllib = True no longer need to override download_content individually
  • Remove the duplicated download_content overrides from lactapp_3 and masssuperct
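
The second bullet describes moving the urllib download into the base class behind an opt-in flag. The following is a minimal Python sketch of that pattern, not the juriscraper code itself. Only AbstractSite, use_urllib, download_content and _download_content_urllib come from the PR description; request_headers, request_timeout, _download_content_requests and the Site subclass are hypothetical, and the real method signatures may differ.

```python
import urllib.request
from typing import Dict


class AbstractSite:
    """Hypothetical, heavily simplified stand-in for juriscraper's AbstractSite."""

    # Opt-in flag: subclasses set this to True to route downloads through urllib.
    use_urllib: bool = False
    url: str = ""
    # Hypothetical knobs for this sketch; the real class may expose these differently.
    request_headers: Dict[str, str] = {}
    request_timeout: float = 60.0

    def download_content(self) -> bytes:
        """Single dispatch point, so subclasses no longer override this method."""
        if self.use_urllib:
            return self._download_content_urllib()
        return self._download_content_requests()

    def _download_content_urllib(self) -> bytes:
        """Fetch self.url with the standard-library urllib client.

        Per the PR description, going through urllib is what gets past
        Cloudflare's TLS fingerprinting for this court; this sketch does not
        reproduce or verify that behavior.
        """
        # Copy the headers so the shared class-level dict is never mutated.
        request = urllib.request.Request(self.url, headers=dict(self.request_headers))
        with urllib.request.urlopen(request, timeout=self.request_timeout) as response:
            return response.read()

    def _download_content_requests(self) -> bytes:
        # Hypothetical placeholder for the existing default download path.
        raise NotImplementedError("default download path omitted from this sketch")


class Site(AbstractSite):
    """Hypothetical masssuperct-style scraper: opting in is one class attribute."""

    use_urllib = True
    url = "https://example.com/opinions"  # placeholder, not the real court URL
```

To exercise the urllib path without network access, url can be pointed at a data: URL such as data:text/html,<p>opinion</p>, which urllib.request serves through its built-in DataHandler. Keeping one flag and one dispatch point in the base class is what lets scrapers like lactapp_3 and masssuperct drop their duplicated download_content overrides.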

@Luis-manzur moved this to PRs to Review in Sprint (Case Law) on Mar 26, 2026
Contributor

@grossir left a comment


This looks good.

I tested the lactapp_3 and kanctapp scrapers, both of which are affected by these changes.

This backscraper also works.

@grossir enabled auto-merge (squash) on April 1, 2026 01:13
@grossir disabled auto-merge on April 1, 2026 01:57
@grossir enabled auto-merge on April 1, 2026 01:58

Labels

None yet

Projects

Status: PRs to Review


2 participants