fix: clean and regenerate summary index on duplicate document upload (#36937) #36964

Open
shifang0511 wants to merge 2 commits into langgenius:main from shifang0511:fix/duplicate-document-summary-cleanup-36937

shifang0511 commented Jun 2, 2026

Summary

When a duplicate document is uploaded, _duplicate_document_indexing_task clears old segments and vectors but previously skipped cleaning summary index entries and never re-queued summary generation. This left stale summary data and prevented new summaries from being created after a duplicate upload.

Changes:

  • Delete old summary DB records and vectors alongside segment cleanup via SummaryIndexService.delete_summaries_for_segments
  • Re-queue summary index generation after duplicate indexing completes, matching the behavior of the normal document indexing task
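
The intended order of operations can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: only `SummaryIndexService.delete_summaries_for_segments` and the task name come from the description above. The in-memory `InMemoryStore`, the `reindex_duplicate_document` helper, the `summary_queue` list, and the method signature shown here are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the segment table, summary table and task queue."""

    segments: dict[str, list[str]] = field(default_factory=dict)  # document_id -> segment ids
    summaries: dict[str, str] = field(default_factory=dict)  # segment_id -> summary text
    summary_queue: list[str] = field(default_factory=list)  # document ids awaiting summaries


class SummaryIndexService:
    """Stand-in for the real service; its actual signature may differ."""

    @staticmethod
    def delete_summaries_for_segments(store: InMemoryStore, segment_ids: list[str]) -> None:
        # Drop every summary that belongs to one of the given segments.
        for segment_id in segment_ids:
            store.summaries.pop(segment_id, None)


def reindex_duplicate_document(
    store: InMemoryStore, document_id: str, new_segment_ids: list[str]
) -> None:
    """Hypothetical, simplified version of the duplicate-indexing flow."""
    old_segment_ids = store.segments.get(document_id, [])

    # 1. Remove summaries tied to the old segments while their ids are still known.
    SummaryIndexService.delete_summaries_for_segments(store, old_segment_ids)

    # 2. Replace the old segments with the freshly indexed ones.
    store.segments[document_id] = list(new_segment_ids)

    # 3. Re-queue summary generation so the new segments get summaries.
    store.summary_queue.append(document_id)


store = InMemoryStore(
    segments={"doc-1": ["seg-a", "seg-b"]},
    summaries={"seg-a": "old summary A", "seg-b": "old summary B"},
)
reindex_duplicate_document(store, "doc-1", ["seg-c"])
print(store.summaries)      # {}
print(store.summary_queue)  # ['doc-1']
```

Deleting summaries before the segments are replaced matters in this sketch because the old segment ids are the only key to the stale summary rows.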

Closes #36937

Test plan

  • New unit test verifies summary deletion and re-queue on duplicate upload
  • All 12 existing tests pass
  • Ruff linting passes
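
For readers unfamiliar with this style of test, a check of "deletion and re-queue" could look roughly like the sketch below. It is not the test added in this PR: `duplicate_indexing`, `summary_service`, and `enqueue_summary` are hypothetical names, and the real task is replaced by a simplified stand-in so the example runs on its own with `unittest.mock`.

```python
from unittest.mock import MagicMock


def duplicate_indexing(document_id, segment_ids, summary_service, enqueue_summary):
    """Hypothetical, simplified stand-in for the task body under test."""
    # Clean stale summaries for the old segments, then ask for new ones.
    summary_service.delete_summaries_for_segments(segment_ids)
    enqueue_summary(document_id)


def test_duplicate_upload_cleans_and_requeues_summaries():
    summary_service = MagicMock()
    enqueue_summary = MagicMock()

    duplicate_indexing("doc-1", ["seg-a", "seg-b"], summary_service, enqueue_summary)

    # Both side effects must happen exactly once with the expected arguments.
    summary_service.delete_summaries_for_segments.assert_called_once_with(["seg-a", "seg-b"])
    enqueue_summary.assert_called_once_with("doc-1")


test_duplicate_upload_cleans_and_requeues_summaries()
```

Injecting the collaborators as arguments is a choice made for the sketch; a test against the real task would more likely patch the service and the queueing call where the task module imports them.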

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dosubot Bot added the size:M label (this PR changes 30-99 lines, ignoring generated files) Jun 2, 2026
…anggenius#36937)

When a duplicate document is uploaded, _duplicate_document_indexing_task
clears old segments and vectors but previously skipped cleaning summary
index entries and never re-queued summary generation. This left stale
summary data and prevented new summaries from being created.

- Delete old summary DB records and vectors alongside segment cleanup
- Re-queue summary index generation after duplicate indexing completes,
  matching the behavior of normal document indexing task

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shifang0511 force-pushed the fix/duplicate-document-summary-cleanup-36937 branch from 80236f5 to 2dfdf42 Compare June 3, 2026 04:32

github-actions Bot commented Jun 3, 2026

Pyrefly Type Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Type coverage | 46.02% | 46.02% | -0.00% |
| Strict coverage | 45.52% | 45.52% | -0.00% |
| Typed symbols | 24,989 | 24,991 | +2 |
| Untyped symbols | 29,635 | 29,639 | +4 |
| Modules | 2790 | 2790 | 0 |

Development

Successfully merging this pull request may close these issues.

Summary Index: data does not be cleared when upload an exists document
