@ChinmayBansal
Contributor

Related Issues

Proposed Changes:

Added delete_by_filter() and update_by_filter() methods to OpenSearchDocumentStore to enable bulk operations on filtered documents.

Implementation details:

  • delete_by_filter(): Uses OpenSearch's delete_by_query API to efficiently delete documents matching filter criteria without retrieving them first
  • update_by_filter(): Uses OpenSearch's update_by_query API with Painless scripts to update metadata fields for matching documents
  • Both methods return the number of affected documents, so callers can log or verify the effect of each operation
  • Both methods come in sync and async versions, for consistency with existing document store methods
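
As a rough illustration of the delete path described above, the sketch below sends a delete_by_query request and reads the count from the response. `FakeClient`, `delete_by_query_count`, the index name, and the query are all hypothetical stand-ins so the snippet runs without a cluster; this is not the code from the PR, where the query is produced by normalize_filters().

```python
from typing import Any


class FakeClient:
    """Hypothetical stand-in for an OpenSearch client.

    It records the request it receives and returns a canned response
    shaped like a delete_by_query result.
    """

    def __init__(self, matching: int) -> None:
        self.matching = matching
        self.last_call: dict[str, Any] = {}

    def delete_by_query(self, index: str, body: dict[str, Any]) -> dict[str, Any]:
        self.last_call = {"index": index, "body": body}
        return {"deleted": self.matching, "failures": []}


def delete_by_query_count(client: Any, index: str, query: dict[str, Any]) -> int:
    """Delete every document matching `query` and return how many were removed."""
    result = client.delete_by_query(index=index, body={"query": query})
    # Fall back to 0 if the response carries no "deleted" key.
    return result.get("deleted", 0)


client = FakeClient(matching=3)
# Hypothetical query and index name, for illustration only.
query = {"term": {"category": "obsolete"}}
deleted = delete_by_query_count(client, "documents", query)
print(deleted)
```

The documents are never fetched by the caller: only the query goes out and only the count comes back.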

Use cases:

  • Bulk deletion based on metadata criteria (e.g., delete all documents older than a specific date)
  • Bulk metadata updates (e.g., mark all documents in a category as "reviewed" or "published")
  • Efficient document management without retrieving full documents first
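
The use cases above could be expressed with Haystack-style filter dictionaries, roughly as sketched below. The helper functions and the metadata field names (`meta.created_at`, `meta.category`) are hypothetical, and the parameter names in the commented calls are assumptions rather than the confirmed signatures from this PR.

```python
from typing import Any


def comparison(field: str, operator: str, value: Any) -> dict[str, Any]:
    """Build a single comparison condition in the Haystack filter format."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Combine conditions so that a document must satisfy every one of them."""
    return {"operator": "AND", "conditions": list(conditions)}


# Hypothetical metadata fields, chosen only to mirror the use cases.
stale = comparison("meta.created_at", "<", "2024-01-01T00:00:00")
in_category = comparison("meta.category", "==", "news")
stale_news = all_of(stale, in_category)

# With a connected document store, the calls might look like this
# (parameter names are assumptions):
# deleted = document_store.delete_by_filter(filters=stale)
# updated = document_store.update_by_filter(filters=in_category, meta={"status": "reviewed"})
```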

Files changed:

How did you test it?

  • Integration tests: Added 4 new integration tests:
    • test_delete_by_filter: Verifies selective deletion based on metadata filters
    • test_update_by_filter: Verifies metadata updates for filtered documents
    • test_delete_by_filter_async: Async version of the delete test
    • test_update_by_filter_async: Async version of the update test

Notes for the reviewer

  • This is a new feature: the first implementation of delete_by_filter() and update_by_filter()
  • Uses OpenSearch's native delete_by_query and update_by_query APIs for efficiency
  • The update_by_filter() method uses the Painless scripting language to update metadata: ctx._source.metadata.{key} = params.{key}
  • Filter normalization uses the existing normalize_filters() function, for consistency with the filter_documents() method
  • Both methods return the affected document count from the OpenSearch API response: result.get("deleted") and result.get("updated"), respectively
  • Tests call time.sleep(2) after each operation so that changes are visible before asserting (OpenSearch is near-real-time: changes become searchable only after an index refresh)
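
To make the Painless pattern from the notes concrete, here is a minimal sketch of how an update_by_query body could be assembled from a metadata dictionary. `build_update_body`, the sample query, and the sample metadata are hypothetical; the actual construction in the PR may differ. The sketch also assumes every metadata key is a plain identifier, since keys are interpolated into the script source.

```python
from typing import Any


def build_update_body(query: dict[str, Any], meta: dict[str, Any]) -> dict[str, Any]:
    """Build a request body that sets each metadata key on every matching document."""
    # One assignment per key, following ctx._source.metadata.{key} = params.{key}.
    # Values travel in "params" rather than being interpolated into the script.
    source = " ".join(f"ctx._source.metadata.{key} = params.{key};" for key in meta)
    return {
        "query": query,
        "script": {"source": source, "params": meta, "lang": "painless"},
    }


# Hypothetical query and metadata, for illustration only.
body = build_update_body(
    {"term": {"category": "news"}},
    {"status": "reviewed", "priority": 1},
)
print(body["script"]["source"])

# Reading the count from a response shaped like an update_by_query result.
response = {"updated": 2}
updated = response.get("updated", 0)
```

Passing values through `params` keeps the script source the same for different values of the same keys, and it avoids quoting problems with string values.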

Checklist

@ChinmayBansal ChinmayBansal requested a review from a team as a code owner October 20, 2025 06:24
@ChinmayBansal ChinmayBansal requested review from anakin87 and removed request for a team October 20, 2025 06:24
@github-actions github-actions bot added integration:opensearch type:documentation Improvements or additions to documentation labels Oct 20, 2025
@anakin87 anakin87 requested review from davidsbatista and removed request for anakin87 October 20, 2025 07:16
@ChinmayBansal ChinmayBansal changed the title add delete by filter and update by filer to OpenSearchDocumentStore feat: add delete by filter and update by filer to OpenSearchDocumentStore Oct 21, 2025
Development

Successfully merging this pull request may close these issues.

add update_by_filter() and delete_by_filter() operations to OpenSearchDocumentStore
