
Bug fix: #79 and delete all existing duplicate links#82

Open
PawsFunctions wants to merge 2 commits into rtuszik:main from PawsFunctions:main

Conversation

@PawsFunctions
Contributor

Bug fix: #79 - Failed to get all the existing links when the entire response is filled with duplicates.

Edit: updated get_existing_links to use the new search API and changed the cursor logic to use nextCursor from the response.
ADD: delete_links function to delete all duplicate links (OPT_DELETE_DUPLICATE).
ADD: DEBUG environment variable to replicate -d.
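The cursor change described above can be sketched as a loop that follows `nextCursor` until the API returns a null cursor. This is a minimal illustration, assuming a search-style response of the shape `{"links": [...], "nextCursor": <id-or-null>}`; the `fetch_page` stub and the fake fetcher stand in for the real HTTP call and are not part of the PR.

```python
def paginate_links(fetch_page):
    """Yield every link dict across pages, following nextCursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        for link in page.get("links", []):
            yield link
        cursor = page.get("nextCursor")
        if cursor is None:  # the API signals the last page with a null cursor
            break

# Tiny in-memory stand-in for the HTTP call, for demonstration only.
def make_fake_fetcher(pages):
    def fetch_page(cursor):
        index = 0 if cursor is None else cursor
        links, next_cursor = pages[index]
        return {"links": links, "nextCursor": next_cursor}
    return fetch_page

fetch = make_fake_fetcher([
    ([{"id": 1, "url": "https://a"}], 1),
    ([{"id": 2, "url": "https://a"}], None),  # duplicate URL on page 2
])
urls = [link["url"] for link in paginate_links(fetch)]
```

Unlike offset-based paging, a `nextCursor` loop keeps returning results even when every record on a page is a duplicate, which is what bug #79 describes.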


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • Consider separating the duplicate deletion logic from get_existing_links into a dedicated function so that get_existing_links focuses purely on pagination/iteration and side effects like deletions are handled explicitly by the caller.
  • Logging the full duplicate_link_ids list can become very large and may expose internal IDs; it would be safer to log only the count and perhaps a small sample instead of the entire list.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider separating the duplicate deletion logic from `get_existing_links` into a dedicated function so that `get_existing_links` focuses purely on pagination/iteration and side effects like deletions are handled explicitly by the caller.
- Logging the full `duplicate_link_ids` list can become very large and may expose internal IDs; it would be safer to log only the count and perhaps a small sample instead of the entire list.
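Both overall comments can be sketched together: a pure duplicate-finder with no deletion side effects, and a logger that emits only a count plus a bounded sample of IDs. The function names echo the PR's vocabulary, but the bodies are illustrative assumptions, not the actual `starwarden` implementation.

```python
import logging

logger = logging.getLogger("starwarden")  # assumed logger name

def find_duplicate_ids(links):
    """Return IDs of links whose URL was already seen (pure, no deletion)."""
    seen_urls = set()
    duplicate_link_ids = []
    for link in links:
        if link["url"] in seen_urls:
            duplicate_link_ids.append(link["id"])
        else:
            seen_urls.add(link["url"])
    return duplicate_link_ids

def log_duplicates(duplicate_link_ids, sample_size=5):
    # Avoid dumping the whole ID list: log a count plus a small sample.
    logger.info(
        "Found %d duplicate links (sample IDs: %s)",
        len(duplicate_link_ids),
        duplicate_link_ids[:sample_size],
    )

dups = find_duplicate_ids([
    {"id": 1, "url": "https://a"},
    {"id": 2, "url": "https://a"},
    {"id": 3, "url": "https://b"},
])
log_duplicates(dups)
```

With this split, the caller decides explicitly whether to pass the returned IDs to a deletion routine, rather than having deletion happen as a side effect of iteration.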

## Individual Comments

### Comment 1
<location> `starwarden/linkwarden_api.py:43-56` </location>
<code_context>
-            if not links:
+            total_links_processed += len(links)
+            
+            for link in links:
+                link_url = link["url"]
+                link_id = link["id"]
+                
+                if link_url in seen_urls:
+                    # Found a duplicate
+                    logger.debug(f"Found duplicate link: {link_url} (ID: {link_id})")
+                    duplicate_link_ids.append(link_id)
+                else:
+                    seen_urls.add(link_url)
+                
+                yield link_url
+
+            if next_cursor is None:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider not yielding URLs that are detected as duplicates if they are going to be deleted.

`get_existing_links` currently yields `link_url` for every record, including those already in `seen_urls` whose `link_id` is queued in `duplicate_link_ids` for deletion. A caller expecting a stream of unique, current links will still receive these soon-to-be-deleted duplicates. If the goal (especially with `delete_duplicate=True`) is to present a deduplicated view, you could avoid yielding when `link_url in seen_urls` and only record the `link_id` for deletion. If some callers depend on the existing "yield everything" behavior, consider making that distinction explicit via a parameter or clearer naming.

```suggestion
            total_links_processed += len(links)

            for link in links:
                link_url = link["url"]
                link_id = link["id"]

                if link_url in seen_urls:
                    # Found a duplicate
                    logger.debug(f"Found duplicate link: {link_url} (ID: {link_id})")
                    duplicate_link_ids.append(link_id)
                    # Do not yield duplicates since they are queued for deletion
                    continue
                else:
                    seen_urls.add(link_url)
                    # Only yield URLs that are not marked as duplicates
                    yield link_url
```
</issue_to_address>
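The suggested change above is a diff fragment; a self-contained version of the same idea looks like the generator below, which yields each URL only the first time it is seen and queues later occurrences for deletion. This is a sketch under the comment's assumptions; the real method lives on an API class and pages over HTTP.

```python
def iter_unique_urls(links, duplicate_link_ids):
    """Yield each URL once; append the IDs of repeat occurrences."""
    seen_urls = set()
    for link in links:
        if link["url"] in seen_urls:
            duplicate_link_ids.append(link["id"])
            continue  # do not yield soon-to-be-deleted duplicates
        seen_urls.add(link["url"])
        yield link["url"]

duplicate_link_ids = []
urls = list(iter_unique_urls(
    [
        {"id": 1, "url": "https://a"},
        {"id": 2, "url": "https://a"},  # duplicate, queued for deletion
        {"id": 3, "url": "https://b"},
    ],
    duplicate_link_ids,
))
```

Callers that depend on the old "yield everything" behavior would need the explicit flag or renamed function the reviewer mentions.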


Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@rtuszik
Owner

rtuszik commented Jan 31, 2026

I can't merge a PR that has failing tests.


Development

Successfully merging this pull request may close these issues.

Bug: Failed to get all the existing links when have too many duplicate links
