Summary Index: data is not cleared when uploading an existing document #36937

@Yanchongyun

Description

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.14.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Steps:

  1. Upload a file with [Enable Summary] turned on. The chunk summaries are created successfully.
  2. Upload the same file to the same dataset. (From this point on, the Summary setting is no longer shown.)
  3. The summary is not regenerated at all, but the old summary data is still in the database. (The vector data appears to be the same.)
  4. Regenerate the summary using the button at the top of the page. A new summary is generated, but the old data is still in the database.

Position:
When the same file is uploaded, knowledge_config.duplicate is True and processing only goes through DuplicateDocumentIndexingTask.

    if document_ids:
        DocumentIndexingTaskProxy(dataset.tenant_id, dataset.id, document_ids).delay()
    if duplicate_document_ids:
        DuplicateDocumentIndexingTaskProxy(
            dataset.tenant_id, dataset.id, duplicate_document_ids
        ).delay()

✔️ Expected Behavior

  • Generate a new summary when the same file is uploaded again.
  • Clear the old summary data in PostgreSQL.
  • Clear the old summary vector data.
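
To make the expected behavior concrete, here is a minimal sketch of what I would expect the duplicate path to do. It is not Dify code: SummaryStores, summarize and reindex_duplicate are hypothetical names, and the two dictionaries only stand in for the PostgreSQL summary rows and the summary vectors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SummaryStores:
    """Hypothetical stand-ins for the summary table and the vector index."""

    pg_rows: dict[str, list[str]] = field(default_factory=dict)     # document_id -> summary texts
    vector_ids: dict[str, list[str]] = field(default_factory=dict)  # document_id -> vector ids


def summarize(chunk: str) -> str:
    """Placeholder for the real summary generation (assumption: one summary per chunk)."""
    return chunk[:20]


def reindex_duplicate(
    stores: SummaryStores,
    document_id: str,
    chunks: list[str],
    summary_enabled: bool,
) -> None:
    """Re-index a re-uploaded document without leaving stale summary data behind."""
    # 1. Clear the old summary rows and the old summary vectors first.
    stores.pg_rows.pop(document_id, None)
    stores.vector_ids.pop(document_id, None)

    # 2. Regenerate only if the summary setting is still enabled.
    if not summary_enabled:
        return
    summaries = [summarize(chunk) for chunk in chunks]
    stores.pg_rows[document_id] = summaries
    stores.vector_ids[document_id] = [
        f"{document_id}-summary-{i}" for i in range(len(summaries))
    ]
```

With this ordering, a second upload of the same document replaces the previous summaries instead of adding to them, and turning the setting off leaves no orphaned rows or vectors.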

❌ Actual Behavior

No response
