Bug: Failed to get all the existing links when have too many duplicate links #79

@PawsFunctions

Description

In this function, when a full query/page's worth of the response consists of duplicate links, pagination stops prematurely. We assumed that an empty response means all links have been fetched, but the loop also stops when every link on a page is a duplicate. As a result, the links beyond that page are never fetched and end up being duplicated again and again.

while True:
    try:
        logger.debug(f"Fetching links from cursor {cursor} for collection {collection_id}")
        response = requests.get(
            url,
            params={"collectionId": collection_id, "cursor": cursor, "sort": 1},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        links = data.get("response", [])
        logger.debug(f"Fetched {len(links)} links from cursor {cursor}")
        new_links = [link["url"] for link in links if link["url"] not in seen_links]
        if not new_links:
            logger.info(f"No new links found from cursor {cursor}. Stopping pagination.")
            break
        seen_links.update(new_links)
        yield from new_links
        if not links:
            break
        cursor = links[-1].get("id")
    except requests.RequestException as e:
        logger.error(f"Error fetching links from cursor {cursor}: {str(e)}")
        if hasattr(e, "response") and e.response is not None:
            logger.error(f"Response status code: {e.response.status_code}")
            logger.error(f"Response content: {e.response.text}")
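One possible fix is to stop only on an empty response and keep advancing the cursor even when a page contains no new links. The sketch below is a minimal, self-contained illustration of that stop condition, not the project's actual code: the `fetch_page` callable and the fake `pages` table stand in for the HTTP request, logging, and error handling above.

```python
from typing import Callable, Iterator, Optional


def iter_new_links(fetch_page: Callable[[Optional[int]], list]) -> Iterator[str]:
    """Yield unseen link URLs across pages.

    Unlike the buggy loop, a page made up entirely of duplicates does
    NOT end pagination; only an empty response does.
    """
    seen_links: set = set()
    cursor: Optional[int] = None
    while True:
        links = fetch_page(cursor)
        if not links:  # empty page is the only stop condition
            break
        for link in links:
            url = link["url"]
            if url not in seen_links:
                seen_links.add(url)
                yield url
        # Advance the cursor even if every link on this page was a duplicate.
        cursor = links[-1]["id"]


# Fake paginated source (hypothetical data): page 2 is all duplicates of
# page 1, and page 3 holds the link the buggy loop would have missed.
pages = {
    None: [{"id": 1, "url": "a"}, {"id": 2, "url": "b"}],
    2:    [{"id": 3, "url": "a"}, {"id": 4, "url": "b"}],
    4:    [{"id": 5, "url": "c"}],
    5:    [],
}

result = list(iter_new_links(lambda cursor: pages[cursor]))
# result == ["a", "b", "c"]
```

With the original code, the duplicate-only page at cursor 2 would trigger the `if not new_links: break` branch and `"c"` would never be fetched.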
