Bug fix: #79 and delete all existing duplicate links #82
Open
PawsFunctions wants to merge 2 commits into rtuszik:main
Conversation
Bug fix: #79 - Failed to get all the existing links when entire response is filled with duplicates.
Edit: updated get_existing_links to use the new search API and changed cursor
logic to use nextCursor from the response.
ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
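As a sketch of what a `delete_links` helper along the lines described above might look like: this is a hypothetical implementation, not the PR's actual code. The endpoint path (`/api/v1/links/{id}`) and the injected `session` parameter are assumptions for illustration.

```python
# Hypothetical sketch of a delete_links helper: remove every queued
# duplicate link by ID. The DELETE endpoint path is an assumption,
# not confirmed against the Linkwarden API documentation.
import logging

logger = logging.getLogger("starwarden")

def delete_links(session, base_url, link_ids):
    """Delete the given link IDs one by one; return the IDs actually deleted."""
    deleted = []
    for link_id in link_ids:
        # Assumed endpoint: DELETE {base_url}/api/v1/links/{id}
        resp = session.delete(f"{base_url}/api/v1/links/{link_id}")
        if resp.status_code == 200:
            deleted.append(link_id)
        else:
            logger.warning(
                "Failed to delete link %s (HTTP %s)", link_id, resp.status_code
            )
    return deleted
```

Passing the HTTP session in explicitly keeps the helper easy to exercise in tests with a stub, without touching a live Linkwarden instance.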
Hey - I've found 1 issue and left some high-level feedback:
- Consider separating the duplicate deletion logic from `get_existing_links` into a dedicated function so that `get_existing_links` focuses purely on pagination/iteration and side effects like deletions are handled explicitly by the caller.
- Logging the full `duplicate_link_ids` list can become very large and may expose internal IDs; it would be safer to log only the count and perhaps a small sample instead of the entire list.
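One way the suggested separation could look: a generator that only paginates and yields, with duplicate detection living in a separate function chosen by the caller. This is a sketch under assumed names (`iter_links`, `find_duplicate_ids`), not the repository's actual code.

```python
# Sketch of the suggested separation of concerns: iteration stays
# side-effect free, and the caller decides whether to act on duplicates.
def iter_links(pages):
    """Yield (url, id) pairs from already-fetched pages; no side effects."""
    for page in pages:
        for link in page:
            yield link["url"], link["id"]

def find_duplicate_ids(links):
    """Return IDs of links whose URL already appeared earlier in the stream."""
    seen, duplicates = set(), []
    for url, link_id in links:
        if url in seen:
            duplicates.append(link_id)
        else:
            seen.add(url)
    return duplicates
```

With this split, deletion becomes an explicit caller decision (e.g. pass the returned IDs to a delete routine), rather than a hidden side effect of iterating.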
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider separating the duplicate deletion logic from `get_existing_links` into a dedicated function so that `get_existing_links` focuses purely on pagination/iteration and side effects like deletions are handled explicitly by the caller.
- Logging the full `duplicate_link_ids` list can become very large and may expose internal IDs; it would be safer to log only the count and perhaps a small sample instead of the entire list.
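The safer logging the review suggests could be as small as a one-line summary helper; `summarize_duplicates` and its `sample_size` parameter are hypothetical names for illustration.

```python
# Minimal sketch: log a count plus a bounded sample of duplicate IDs
# instead of the entire duplicate_link_ids list.
def summarize_duplicates(duplicate_link_ids, sample_size=5):
    """Return a short log message with the count and a small ID sample."""
    sample = duplicate_link_ids[:sample_size]
    return (
        f"Found {len(duplicate_link_ids)} duplicate links "
        f"(sample of IDs: {sample})"
    )
```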
## Individual Comments
### Comment 1
<location> `starwarden/linkwarden_api.py:43-56` </location>
<code_context>
- if not links:
+ total_links_processed += len(links)
+
+ for link in links:
+ link_url = link["url"]
+ link_id = link["id"]
+
+ if link_url in seen_urls:
+ # Found a duplicate
+ logger.debug(f"Found duplicate link: {link_url} (ID: {link_id})")
+ duplicate_link_ids.append(link_id)
+ else:
+ seen_urls.add(link_url)
+
+ yield link_url
+
+ if next_cursor is None:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider not yielding URLs that are detected as duplicates if they are going to be deleted.
`get_existing_links` currently yields `link_url` for every record, including those already in `seen_urls` whose `link_id` is queued in `duplicate_link_ids` for deletion. A caller expecting a stream of unique, current links will still receive these soon-to-be-deleted duplicates. If the goal (especially with `delete_duplicate=True`) is to present a deduplicated view, you could avoid yielding when `link_url in seen_urls` and only record the `link_id` for deletion. If some callers depend on the existing "yield everything" behavior, consider making that distinction explicit via a parameter or clearer naming.
```suggestion
total_links_processed += len(links)
for link in links:
link_url = link["url"]
link_id = link["id"]
if link_url in seen_urls:
# Found a duplicate
logger.debug(f"Found duplicate link: {link_url} (ID: {link_id})")
duplicate_link_ids.append(link_id)
# Do not yield duplicates since they are queued for deletion
continue
else:
seen_urls.add(link_url)
# Only yield URLs that are not marked as duplicates
yield link_url
```
</issue_to_address>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Owner
I can't merge a PR that has failing tests.