fix: add type=table filter to prioritize table retrieval #828

Open
octo-patch wants to merge 1 commit into Cinnamon:main from octo-patch:fix/issue-752-prioritize-table-type-filter

@octo-patch
Fixes #752

Problem

When the "Prioritize Table" option is enabled, DocumentRetrievalPipeline retrieves extra documents from pages that contain relevant content. However, the metadata filter only matched by file_name and page_label, which caused it to fetch all document chunks on those pages — including non-table documents like regular text paragraphs.

As a result, the table documents that should be prioritized were not being reliably retrieved: the secondary retrieval was pulling in all document types, not just tables.

Solution

Added {"type": {"$eq": "table"}} to the metadata filter in the extra-table retrieval query. This ensures only documents with type="table" are fetched in the secondary retrieval pass, aligning with the intent of the "Prioritize Table" feature.
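For reviewers, the shape of the tightened filter can be sketched as follows. This is a minimal illustration, not the code in the diff: the helper name `build_table_filter`, the `pages_by_file` argument, and the `$and` / `$in` / `$or` operators are assumptions about how the per-page clauses are combined. Only the `file_name`, `page_label`, and `{"type": {"$eq": "table"}}` parts come from the description above.

```python
from __future__ import annotations

from typing import Any, Optional


def build_table_filter(pages_by_file: dict[str, list[str]]) -> Optional[dict[str, Any]]:
    """Build a metadata filter that matches only table chunks on the given pages.

    Hypothetical helper: `pages_by_file` maps a file name to the page labels
    that the first retrieval pass found relevant.
    """
    clauses = [
        {
            "$and": [
                {"file_name": {"$eq": file_name}},
                {"page_label": {"$in": page_labels}},
                # The clause added by this PR: restrict to table documents.
                {"type": {"$eq": "table"}},
            ]
        }
        for file_name, page_labels in pages_by_file.items()
    ]
    if not clauses:
        # Nothing to look up, so the caller can skip the secondary retrieval.
        return None
    # A single clause needs no "$or" wrapper (assumed filter convention).
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}
```

Without the third clause, the same filter matches every chunk on the listed pages, which is the behavior reported in #752.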

Testing

  • Verified the metadata filter structure matches how tables are stored (e.g., doc.metadata.get("type", "") == "table" used in reasoning/react.py, reasoning/rewoo.py, and qa/format_context.py)
  • The fix is a minimal, targeted change with no side effects on the main retrieval path
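A check along these lines could back up the points above in an automated test. It is only a sketch: `Doc` is a hypothetical stand-in for a retrieved chunk, and `non_table_docs` is an illustrative helper. The comparison itself mirrors the `doc.metadata.get("type", "") == "table"` convention cited above.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""

    text: str
    metadata: dict = field(default_factory=dict)


def is_table(doc: Doc) -> bool:
    # Same comparison the description cites from the reasoning and QA modules.
    return doc.metadata.get("type", "") == "table"


def non_table_docs(docs: list) -> list:
    """Return the chunks that the secondary retrieval should not have fetched."""
    return [doc for doc in docs if not is_table(doc)]


extra_docs = [
    Doc("| a | b |", {"type": "table", "page_label": "3"}),
    Doc("Plain paragraph on the same page.", {"page_label": "3"}),
]
leaked = non_table_docs(extra_docs)
```

With the new filter in place, `non_table_docs` applied to the extra-table results should come back empty. In the sample above, the plain paragraph is the kind of chunk the old filter would have let through.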

…mon#752)

The extra document retrieval in DocumentRetrievalPipeline was fetching
all documents on matching pages, not just table documents. Adding the
type='table' metadata filter ensures only table documents are returned
when the 'Prioritize Table' option is enabled.

Development

Successfully merging this pull request may close these issues.

[BUG] Retrieve pipeline with Prioritize Table enabled