@xantror (Contributor) commented Dec 28, 2025

Scraper type(s)

  • performerByName
  • performerByFragment
  • performerByURL
  • sceneByName
  • sceneByQueryFragment
  • sceneByFragment
  • sceneByURL

Short description

  • Reimplemented the Clips4Sale scraper in Python. The update was needed to handle new fields (tags bleeding into performers, etc.).
  • Added support for scraping performers.
  • Discussion on Discord

Merge after: #2628

@xantror force-pushed the xantror/update-clips4sale branch from 45d9baf to 00ff0c0 on December 28, 2025 14:46
@DogmaDragon

This comment was marked as resolved.

@feederbox826 (Collaborator) left a comment

A lot of really odd and inconsistent style choices

text = str(text)
# Unescape HTML entities first (e.g. &lt;br&gt; -> <br>)
text = html.unescape(text)
text = re.sub(r'data-\w+="[^"]*"\s*', "", text)
Collaborator

why strip data attributes if you're just going to get_text()?
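
To make the question concrete, here is a minimal sketch, assuming `get_text()` refers to BeautifulSoup's method. It uses the standard library's `html.parser` as a stand-in (the `TextOnly` class and `extract_text` helper are hypothetical, not from the PR) to show that attributes never reach the extracted text, so the `data-*` regex should not change the result.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only text nodes, roughly what a get_text() call returns."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Attributes are never passed here, only text between tags
        self.parts.append(data)


def extract_text(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


raw = '<p data-id="42" data-role="intro">Hello <b data-x="1">world</b></p>'
# Same regex as in the excerpt above
stripped = re.sub(r'data-\w+="[^"]*"\s*', "", raw)

same_text = extract_text(raw) == extract_text(stripped)
```

If the extracted text is identical with and without the regex, the extra pass can probably be dropped.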

url = f"https://www.clips4sale.com/studio/{studio_id}/{studio_slug}?_data=routes%2F%28%24lang%29.studio.%24id_.%24studioSlug.%24"
try:
    response = scraper.get(url)
    if response.status_code == 200:
Collaborator

Suggested change
- if response.status_code == 200:
+ if response.ok:
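
One thing worth keeping in mind with this suggestion: the two checks are not strictly equivalent. The sketch below assumes `scraper` returns a requests-compatible response; `FakeResponse` is a hypothetical stand-in that mimics how `requests.Response.ok` behaves (false only for 4xx and 5xx statuses), so it runs without any network access.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a requests-style response object."""

    status_code: int

    @property
    def ok(self) -> bool:
        # requests raises for 4xx/5xx in raise_for_status();
        # `ok` is True whenever that call would not raise
        return not (400 <= self.status_code < 600)


def compare(code: int) -> tuple:
    """Return (strict_200_check, ok_check) for a status code."""
    response = FakeResponse(code)
    return (response.status_code == 200, response.ok)


for code in (200, 204, 301, 404, 500):
    strict, ok = compare(code)
    print(f"{code}: status_code == 200 -> {strict}, ok -> {ok}")
```

For this scraper the broader check is likely fine, since redirects are normally followed before the status is inspected, but a 204 with an empty body would now pass the guard.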

url = f"https://www.clips4sale.com/performers/{performer_id}/{encoded_slug}?_data=routes%2F%28%24lang%29.performers.%24performerId.%28%24performerSlug%29"
try:
    response = scraper.get(url)
    if response.status_code == 200:
Collaborator

Suggested change
- if response.status_code == 200:
+ if response.ok:

if p_tags := p_data.get("tags"):
performer["tags"] = p_tags
else:
performer["urls"] = [performer_url]
Collaborator

performer["urls"] is set in both branches, why nest it?
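
A rough sketch of the flattened version, assuming the surrounding code (not visible in this excerpt) assigns `performer["urls"]` in both the success and fallback paths. The `build_performer` function and the `name` field are hypothetical; `performer`, `performer_url`, `p_data`, and `p_tags` follow the excerpt.

```python
def build_performer(performer_url, p_data=None):
    """Hypothetical helper: set the shared field once, then add optional ones."""
    # The URL is known up front, so it does not need to live in either branch
    performer = {"urls": [performer_url]}

    if p_data:
        if p_name := p_data.get("name"):
            performer["name"] = p_name
        if p_tags := p_data.get("tags"):
            performer["tags"] = p_tags

    return performer
```

With the shared assignment hoisted, the `else` branch disappears entirely.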

if lower not in unique_tags:
    unique_tags[lower] = original
else:
    # Prefer the version with more uppercase letters
Collaborator

why?
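
For context, this is roughly what the rule under discussion would do. The body of the `else` branch is not shown in the excerpt, so the uppercase-count comparison below is an assumption, and `dedupe_tags` is a hypothetical wrapper name.

```python
def dedupe_tags(tags):
    """Case-insensitive dedupe that keeps the most-capitalized spelling."""
    unique_tags = {}
    for original in tags:
        lower = original.lower()
        if lower not in unique_tags:
            unique_tags[lower] = original
        else:
            # Assumed tie-break: prefer the version with more uppercase letters
            current = unique_tags[lower]
            if sum(c.isupper() for c in original) > sum(c.isupper() for c in current):
                unique_tags[lower] = original
    return list(unique_tags.values())


print(dedupe_tags(["pov", "POV", "Pov", "Blonde"]))
```

A simpler alternative would be to keep whichever spelling is seen first, which drops the `else` branch altogether.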

res["tags"] = tags
if performers:
    res["performers"] = performers
return res
Collaborator

why switch up nullish check style? stash handles it just fine if it's given NoneType
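
Two consistent ways this could look, sketched under the assumption (taken from the comment above) that Stash tolerates `None` values in the result. The `build_result` and `drop_empty` helpers and the `title` field are hypothetical, not part of the PR.

```python
def build_result(title, tags, performers):
    """Option 1: assign every field unconditionally and let None pass through."""
    return {"title": title, "tags": tags, "performers": performers}


def drop_empty(result):
    """Option 2: filter falsy values in one place instead of per-field checks."""
    return {key: value for key, value in result.items() if value}


res = build_result("Example clip", None, [])
print(res)
print(drop_empty(res))
```

Either option keeps the style uniform; mixing unconditional assignment, `if x:`, and walrus checks in the same function is what reads as inconsistent.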

if details := clean_text(dig(clip, "description")):
    scene["details"] = details

scene["studio"] = studio_obj
Collaborator

see comment above on nullish typings

log.debug(f"Searching URL: {url}")
try:
    response = scraper.get(url)
    if response.status_code != 200:
Collaborator

Suggested change
- if response.status_code != 200:
+ if not response.ok:


try:
    response = scraper.get(data_url)
    if response.status_code == 200:
Collaborator

Suggested change
- if response.status_code == 200:
+ if response.ok:

log.debug(f"Searching Performer URL: {url}")
try:
    response = scraper.get(url)
    if response.status_code != 200:
Collaborator

Suggested change
- if response.status_code != 200:
+ if not response.ok:
