feat(Clips4Sale): reimplement Clips4Sale scraper in python #2631
Signed-off-by: xantror <[email protected]>
Signed-off-by: xantror <[email protected]>
…base64 options Signed-off-by: xantror <[email protected]>
Signed-off-by: xantror <[email protected]>
45d9baf to 00ff0c0
feederbox826 left a comment
A lot of really odd and inconsistent style choices
| text = str(text) | ||
| # Unescape HTML entities first (e.g. &lt;br&gt; -> <br>) | ||
| text = html.unescape(text) | ||
| text = re.sub(r'data-\w+="[^"]*"\s*', "", text) |
why strip data attributes if you're just going to get_text()?
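The question can be checked with a small sketch: once markup is reduced to plain text, attributes never reach the output, so removing `data-*` attributes beforehand changes nothing. The PR's `get_text()` is presumably BeautifulSoup's; this sketch substitutes the standard library's `html.parser` so it runs without dependencies, and `extract_text` is a hypothetical helper, not the PR's `clean_text`.

```python
import html
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only text nodes; tag names and attributes are discarded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(markup: str) -> str:
    # Hypothetical stand-in for a get_text()-style call.
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


# Escaped markup carrying data-* attributes (made-up sample input).
raw = '&lt;p data-start="0" data-end="12"&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;'
unescaped = html.unescape(raw)

# Same regex as in the snippet under review.
stripped = re.sub(r'data-\w+="[^"]*"\s*', "", unescaped)

with_strip = extract_text(stripped)
without_strip = extract_text(unescaped)
```

Both variables end up holding the same text, which suggests the regex pass could be dropped when the only goal is text extraction.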
| url = f"https://www.clips4sale.com/studio/{studio_id}/{studio_slug}?_data=routes%2F%28%24lang%29.studio.%24id_.%24studioSlug.%24" | ||
| try: | ||
| response = scraper.get(url) | ||
| if response.status_code == 200: |
Suggested change:
- if response.status_code == 200:
+ if response.ok:
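One thing worth keeping in mind with this suggestion: in `requests`, `Response.ok` is true for any status code below 400, so it is slightly broader than `== 200`. The sketch below illustrates the difference with a hypothetical `FakeResponse` stand-in that mirrors that rule; whether `scraper` in this PR returns a requests-compatible response object is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a requests-style response object."""

    status_code: int

    @property
    def ok(self) -> bool:
        # Mirrors the documented requests rule: below 400 counts as ok.
        return self.status_code < 400


def compare(status_code: int) -> tuple[bool, bool]:
    """Return (strict == 200 check, .ok check) for a status code."""
    response = FakeResponse(status_code)
    return (response.status_code == 200, response.ok)


results = {code: compare(code) for code in (200, 204, 404, 500)}
```

The two checks only disagree for non-200 success and redirect codes such as 204, which is usually harmless for a scraper but is a behavior change rather than a pure style fix.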
| url = f"https://www.clips4sale.com/performers/{performer_id}/{encoded_slug}?_data=routes%2F%28%24lang%29.performers.%24performerId.%28%24performerSlug%29" | ||
| try: | ||
| response = scraper.get(url) | ||
| if response.status_code == 200: |
Suggested change:
- if response.status_code == 200:
+ if response.ok:
| if p_tags := p_data.get("tags"): | ||
| performer["tags"] = p_tags | ||
| else: | ||
| performer["urls"] = [performer_url] |
performer["urls"] is set in both branches, why nest it?
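A minimal sketch of the flattened version, assuming the `if` branch (not fully visible in the excerpt) also assigns the same URL list. `build_performer` is a hypothetical wrapper; the variable names mirror the snippet under review.

```python
def build_performer(p_data: dict, performer_url: str) -> dict:
    """Hypothetical helper showing the assignment hoisted out of the branches."""
    # Set unconditionally, since both branches assigned it anyway.
    performer = {"urls": [performer_url]}

    # Only the genuinely optional field stays behind a condition.
    if p_tags := p_data.get("tags"):
        performer["tags"] = p_tags

    return performer
```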
| if lower not in unique_tags: | ||
| unique_tags[lower] = original | ||
| else: | ||
| # Prefer the version with more uppercase letters |
why?
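For context on what is being questioned: the snippet deduplicates tags case-insensitively and, on a collision, keeps the spelling with more uppercase letters. The sketch below puts that rule next to the simpler first-seen rule. `dedupe_tags` and its `prefer_upper` flag are hypothetical, and the PR's exact tie-breaking is not visible in the excerpt.

```python
def _upper_count(text: str) -> int:
    """Number of uppercase characters in text."""
    return sum(1 for char in text if char.isupper())


def dedupe_tags(tags: list[str], prefer_upper: bool = False) -> list[str]:
    """Case-insensitive dedup; hypothetical illustration of both rules."""
    unique: dict[str, str] = {}
    for original in tags:
        lower = original.lower()
        if lower not in unique:
            # First spelling seen for this tag wins by default.
            unique[lower] = original
        elif prefer_upper and _upper_count(original) > _upper_count(unique[lower]):
            # Rule under review: prefer the spelling with more capitals.
            unique[lower] = original
    return list(unique.values())


first_seen = dedupe_tags(["pov", "POV", "Pov"])
most_upper = dedupe_tags(["pov", "POV", "Pov"], prefer_upper=True)
```

The first-seen rule is shorter and order-stable; the uppercase preference only matters if the source data mixes casings for the same tag and the capitalized form is considered canonical.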
| res["tags"] = tags | ||
| if performers: | ||
| res["performers"] = performers | ||
| return res |
why switch up the nullish check style? Stash handles it just fine if it's given None
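If empty fields should still be omitted, one way to keep the style consistent is to do the filtering in a single place instead of mixing walrus guards with plain `if` checks. This is only a sketch: `compact` is a hypothetical helper, and the field names are illustrative.

```python
def compact(fields: dict) -> dict:
    """Drop keys whose values are falsy (hypothetical helper).

    Note that this also drops 0 and False, which is fine for the string
    and list fields used here but would need care for numeric fields.
    """
    return {key: value for key, value in fields.items() if value}


res = compact(
    {
        "title": "Example",
        "tags": [],
        "performers": None,
        "details": "text",
    }
)
```

Alternatively, per the comment above, the values can simply be assigned directly and left as `None`.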
| if details := clean_text(dig(clip, "description")): | ||
| scene["details"] = details | ||
| scene["studio"] = studio_obj |
see comment above on nullish typings
| log.debug(f"Searching URL: {url}") | ||
| try: | ||
| response = scraper.get(url) | ||
| if response.status_code != 200: |
Suggested change:
- if response.status_code != 200:
+ if not response.ok:
| try: | ||
| response = scraper.get(data_url) | ||
| if response.status_code == 200: |
Suggested change:
- if response.status_code == 200:
+ if response.ok:
| log.debug(f"Searching Performer URL: {url}") | ||
| try: | ||
| response = scraper.get(url) | ||
| if response.status_code != 200: |
Suggested change:
- if response.status_code != 200:
+ if not response.ok:
Scraper type(s)
Short description
Merge after: #2628