Add deduplication docs to user guide #3210

Merged
ikreymer merged 6 commits into main from dedupe-docs on Mar 10, 2026

Conversation

Member

@tw4l tw4l commented Mar 5, 2026

Fixes #2933

Also available in Google Doc form internally if it's easier to review and make suggestions there.

@tw4l tw4l requested review from emma-sg and ikreymer March 5, 2026 16:19

Deduplication is facilitated by a _deduplication index_ on the collection, which contains information for every resource and URL in the collection’s archived items. Detailed technical information about how deduplication is implemented in Browsertrix is available in the [crawler’s developer documentation](https://crawler.docs.browsertrix.com/develop/dedupe/).

## Tradeoffs and Considerations
Member

Maybe this section should be at the end, after how to enable / manage?

Member Author

The tradeoffs and considerations section? I think it would be good to foreground it so people are prompted to better understand what they're enabling before they enable it.

@tw4l tw4l requested a review from ikreymer March 10, 2026 22:09
Member

@ikreymer ikreymer left a comment

Looks good! Reworked the dedupe page a bit and also made it top-level, as we'll likely expand it later. Thanks for covering all the different parts of it!

@ikreymer ikreymer merged commit 8b357d6 into main Mar 10, 2026
29 checks passed
@ikreymer ikreymer deleted the dedupe-docs branch March 10, 2026 23:53
Development

Successfully merging this pull request may close these issues.

[Task]: Update deduplication section in user guide

2 participants