Bug: Failed to get all the existing links when have too many duplicate links #79

@PawsFunctions

Description

In this function, when a full query/page's worth of the response consists of duplicate links, pagination stops prematurely. We assumed that an empty response means all links have been fetched, but the loop also stops when every link on a page is a duplicate. As a result, the links beyond that page are never fetched and end up being duplicated again and again.

while True:
    try:
        logger.debug(f"Fetching links from cursor {cursor} for collection {collection_id}")
        response = requests.get(
            url,
            params={"collectionId": collection_id, "cursor": cursor, "sort": 1},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        links = data.get("response", [])
        logger.debug(f"Fetched {len(links)} links from cursor {cursor}")
        new_links = [link["url"] for link in links if link["url"] not in seen_links]
        if not new_links:
            logger.info(f"No new links found from cursor {cursor}. Stopping pagination.")
            break
        seen_links.update(new_links)
        yield from new_links
        if not links:
            break
        cursor = links[-1].get("id")
    except requests.RequestException as e:
        logger.error(f"Error fetching links from cursor {cursor}: {str(e)}")
        if hasattr(e, "response") and e.response is not None:
            logger.error(f"Response status code: {e.response.status_code}")
            logger.error(f"Response content: {e.response.text}")
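One possible fix is to stop only on an empty response and keep advancing the cursor even when a page contains no new links. The sketch below is a minimal, self-contained illustration of that stop condition, not the project's actual code: the `fetch_page` callable and the fake `pages` table stand in for the HTTP request, logging, and error handling above.

```python
from typing import Callable, Iterator, Optional


def iter_new_links(fetch_page: Callable[[Optional[int]], list]) -> Iterator[str]:
    """Yield unseen link URLs across pages.

    Unlike the buggy loop, a page made up entirely of duplicates does
    NOT end pagination; only an empty response does.
    """
    seen_links: set = set()
    cursor: Optional[int] = None
    while True:
        links = fetch_page(cursor)
        if not links:  # empty page is the only stop condition
            break
        for link in links:
            url = link["url"]
            if url not in seen_links:
                seen_links.add(url)
                yield url
        # Advance the cursor even if every link on this page was a duplicate.
        cursor = links[-1]["id"]


# Fake paginated source (hypothetical data): page 2 is all duplicates of
# page 1, and page 3 holds the link the buggy loop would have missed.
pages = {
    None: [{"id": 1, "url": "a"}, {"id": 2, "url": "b"}],
    2:    [{"id": 3, "url": "a"}, {"id": 4, "url": "b"}],
    4:    [{"id": 5, "url": "c"}],
    5:    [],
}

result = list(iter_new_links(lambda cursor: pages[cursor]))
# result == ["a", "b", "c"]
```

With the original code, the duplicate-only page at cursor 2 would trigger the `if not new_links: break` branch and `"c"` would never be fetched.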
