Exclude document content field by default to optimize context window usage #45

Merged
baruchiro merged 9 commits into main from copilot/fix-unnecessary-document-content-transfer
Jan 20, 2026

Conversation

Copilot AI commented Jan 18, 2026

Implementation Complete ✅

Simplified approach per feedback:

  • Remove code comments
  • Remove fields parameter complexity (no FieldFilterOptions, no field filtering logic)
  • Simply exclude content from all document responses by default
  • Add new get_document_content tool to retrieve content when needed
  • Update changeset to reflect simpler approach
  • Update tool descriptions to reference the new workflow
  • Refactor to use two map operations for better clarity

Changes Made

1. documentEnhancer.ts

  • Modified enhanceDocumentsArray() to use two separate map operations:
    • First map: Removes content field using destructuring
    • Second map: Adds enhanced fields (correspondent, document_type, tags, custom_fields with names)
  • Updated return type to Promise<Omit<EnhancedDocument, 'content'>[]>
  • Clean, simple implementation with no comments
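
The two-map shape described above can be sketched roughly as follows. This is a simplified illustration, not the merged code: the `RawDocument` and `NamedRef` shapes, the `withName` helper, and the prefetched lookup maps are assumptions made for the example, and the sketch is synchronous whereas the real `enhanceDocumentsArray()` returns a `Promise`.

```typescript
// Hypothetical, trimmed-down shapes; the real Paperless types carry more fields.
interface RawDocument {
  id: number;
  title: string;
  content: string;
  correspondent: number | null;
  tags: number[];
}

type NamedRef = { id: number; name: string };

interface EnhancedDocument extends Omit<RawDocument, "correspondent" | "tags"> {
  correspondent: NamedRef | null;
  tags: NamedRef[];
}

// Assumed helper: resolves an ID against a prefetched id -> name lookup,
// falling back to the stringified ID when the name is unknown.
function withName(id: number, names: Map<number, string>): NamedRef {
  return { id, name: names.get(id) ?? String(id) };
}

function enhanceDocumentsArray(
  documents: RawDocument[],
  correspondents: Map<number, string>,
  tags: Map<number, string>,
): Omit<EnhancedDocument, "content">[] {
  return documents
    // First map: drop the heavy `content` field via destructuring.
    .map(({ content, ...docWithoutContent }) => docWithoutContent)
    // Second map: replace raw IDs with { id, name } objects.
    .map((doc) => ({
      ...doc,
      correspondent:
        doc.correspondent === null
          ? null
          : withName(doc.correspondent, correspondents),
      tags: doc.tags.map((tagId) => withName(tagId, tags)),
    }));
}
```

Keeping the removal and the enrichment in separate maps means each step can be read, and changed, without touching the other.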

2. documents.ts

  • Simplified get_document to just call convertDocsWithNames(doc, api)
  • Added new get_document_content tool that returns only id, title, and content
  • Updated tool descriptions for list_documents, get_document, and search_documents to mention content exclusion and reference get_document_content
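
As a rough illustration of what `get_document_content` returns, the projection below keeps only `id`, `title`, and `content`. The `PaperlessDocument` shape and the `toDocumentContent` name are hypothetical; the actual tool registration and API call in `documents.ts` are not shown here.

```typescript
// Hypothetical minimal document shape; the real API response has more fields.
interface PaperlessDocument {
  id: number;
  title: string;
  content: string;
  correspondent: number | null;
  created: string;
}

// The response shape described for get_document_content.
type DocumentContent = Pick<PaperlessDocument, "id" | "title" | "content">;

// Assumed helper name: projects a full document down to the three fields
// the content tool is described as returning.
function toDocumentContent(doc: PaperlessDocument): DocumentContent {
  const { id, title, content } = doc;
  return { id, title, content };
}
```

A caller that needs the text would first locate the document with `list_documents` or `search_documents`, then request the content for that one ID.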

3. Changeset

  • Updated to describe the new simpler approach

Usage

Default behavior (content excluded):

list_documents({ correspondent: 5678 })
get_document({ id: 123 })
// Both exclude content field

Get content when needed:

get_document_content({ id: 123 })
// Returns: { id, title, content }
Original prompt

This section details the original issue you should resolve

<issue_title>Unnecessary Transfer of Document Content in MCP Queries</issue_title>
<issue_description>## Problem Summary

The current MCP functions list_documents and get_document always return the complete text content (content) of documents by default. This leads to significant performance and efficiency issues, as this information is often not needed but still:

  • slows down AI processing
  • unnecessarily fills the context window
  • inflates data transfer

Problem Description with Example

When merging similar correspondents, the workflow proceeds as follows:

  1. All correspondents are listed and the AI suggests merges (e.g., "Amazon" (ID: 1234) with "Amazon GmbH" (ID: 5678))
  2. An MCP query searches for all documents with correspondent ID 5678
  3. Problem occurs here: The result set contains all document information including content, which can comprise several thousand words
  4. The found documents are then assigned the new correspondent (ID: 1234)

In step 3, the text content is not needed, but it massively fills the context window and significantly slows down processing.
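
The workflow above only needs document IDs, which a small sketch can make concrete. Everything here is an assumption for illustration: the in-memory `store` stands in for the Paperless API, and `listByCorrespondent` and `mergeCorrespondent` are hypothetical names, not MCP tools.

```typescript
interface DocumentSummary {
  id: number;
  title: string;
  correspondent: number | null;
}

interface StoredDocument extends DocumentSummary {
  content: string; // potentially several thousand words
}

// Step 2 and 3: find the documents of one correspondent, without their content.
function listByCorrespondent(
  store: StoredDocument[],
  correspondentId: number,
): DocumentSummary[] {
  return store
    .filter((doc) => doc.correspondent === correspondentId)
    .map(({ content, ...summary }) => summary);
}

// Step 4: reassign the found documents; only their IDs are required.
function mergeCorrespondent(
  store: StoredDocument[],
  fromId: number,
  toId: number,
): number[] {
  const ids = new Set(listByCorrespondent(store, fromId).map((doc) => doc.id));
  for (const doc of store) {
    if (ids.has(doc.id)) {
      doc.correspondent = toId;
    }
  }
  return Array.from(ids);
}
```

Calling `mergeCorrespondent(store, 5678, 1234)` with this sketch moves every document from "Amazon GmbH" to "Amazon" without the text of any document passing through the caller.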

Proposed Solution

Extend the MCP functions with optional parameters that allow specifying which fields from the Paperless API response should be returned. For example:

  • Add a fields or exclude_fields parameter to control the returned data fields
  • By default, content could be excluded or only included upon explicit request

This would significantly improve performance and make efficient use of the context window.</issue_description>

<agent_instructions>note: it would be best if content is not included by default</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@baruchiro OK, I see. But I want to hear what you prefer as a default:
  • exclude content by default and include it only when listed in fields?
  • include content by default and exclude it only when listed in exclude_fields?
@baruchiro [OpenHands](https://github.com/OpenHands/OpenHands) started fixing the issue! You can monitor the progress [here](https://github.com/baruchiro/paperless-mcp/actions/runs/19274374044).


Note

Optimizes document queries by removing heavy content payloads from default responses and providing an explicit retrieval path when needed.

  • Update documentEnhancer.ts to omit content from enhanced documents (type now Omit<EnhancedDocument, 'content'>[])
  • Add get_document_content tool returning { id, title, content }
  • Update tool descriptions for list_documents, get_document, and search_documents to note content exclusion
  • Changeset: minor release note documenting the behavioral change

Written by Cursor Bugbot for commit cde4cf3.

@changeset-bot

changeset-bot bot commented Jan 18, 2026

🦋 Changeset detected

Latest commit: b1fa80b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @baruchiro/paperless-mcp | Minor |

@coderabbitai

coderabbitai bot commented Jan 18, 2026

Important

Review skipped

Bot user detected.

Copilot AI and others added 5 commits January 18, 2026 13:41
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
…istency

Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix unnecessary transfer of document content in MCP queries" to "Exclude document content field by default to optimize context window usage" Jan 18, 2026
Copilot AI requested a review from baruchiro January 18, 2026 13:47
Owner

@baruchiro baruchiro left a comment

  • Remove code comments.
  • Let's take the author's suggestion: #27 (comment). Omit the content by default, and implement another tool to retrieve the content.
  • The future preparation with the options and fields properties is overkill.
  • I think excluding the content and supplying it via another tool is a feature change (but I'm not sure it is a breaking change).

Copilot AI and others added 2 commits January 18, 2026 16:26
…ent_content tool

Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>
Author

Copilot AI commented Jan 18, 2026

I've simplified the implementation per your suggestions:

  1. ✅ Removed code comments
  2. ✅ Removed the fields parameter complexity - now just excludes content by default
  3. ✅ Added get_document_content tool to retrieve content when needed
  4. ✅ Updated tool descriptions to reference the new workflow

The implementation is now much simpler with commits cde4cf3, a9d4a69.

Comment on lines 105 to 106
return documents.map((doc) => {
const { content, ...docWithoutContent } = doc;
Owner

I prefer to use map twice:
documents.map().map(etc)

Author

Refactored to use two map operations as requested in commit b1fa80b.

@baruchiro baruchiro marked this pull request as ready for review January 20, 2026 04:32
Copilot AI review requested due to automatic review settings January 20, 2026 04:32
Co-authored-by: baruchiro <17686879+baruchiro@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR optimizes context window usage by excluding the content field from document queries by default, addressing performance issues when document text content isn't needed (e.g., during correspondent merge workflows).

Changes:

  • Modified documentEnhancer.ts to exclude the content field from enhanced document responses using destructuring
  • Added get_document_content tool to retrieve document text content when specifically needed
  • Updated tool descriptions for list_documents, get_document, and search_documents to document the content exclusion behavior

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/api/documentEnhancer.ts | Modified enhanceDocumentsArray to destructure and exclude the content field from returned documents, updated return type to Omit<EnhancedDocument, 'content'>[] |
| src/tools/documents.ts | Added new get_document_content tool to retrieve document text when needed; updated tool descriptions for list_documents, get_document, and search_documents to note content exclusion |
| .changeset/exclude-content-by-default.md | Added changeset file marking this as a minor version change with appropriate description |

@baruchiro baruchiro merged commit f9291df into main Jan 20, 2026
2 checks passed
@baruchiro baruchiro deleted the copilot/fix-unnecessary-document-content-transfer branch January 20, 2026 04:42
Development

Successfully merging this pull request may close these issues.

Unnecessary Transfer of Document Content in MCP Queries

2 participants