This release introduces Search Resilience & Smart Fallback Strategies, making the agent much more proactive when initial search results are empty.
The search tools now include explicit "Retry Logic" for the LLM. If a narrow search (e.g., within a specific date range) yields no results, the agent is now instructed to:
- Relax Filters: Automatically try broader criteria, such as removing date boundaries or broadening the query.
- Contextual Retries: Inform the user if results are found in a different timeframe than requested.
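The relax-and-retry loop described above can be sketched as follows. This is an illustrative sketch, not the server's actual implementation; the `SearchFilters` shape, `relax`, and `search_with_fallback` names are assumptions for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class SearchFilters:
    """Hypothetical filter set an agent might pass to a search tool."""
    query: str
    date_from: Optional[str] = None
    date_to: Optional[str] = None
    correspondent_id: Optional[int] = None

def relax(filters: SearchFilters) -> Optional[SearchFilters]:
    """Drop the most restrictive filter first; return None when nothing is left to relax."""
    if filters.date_from or filters.date_to:
        return replace(filters, date_from=None, date_to=None)
    if filters.correspondent_id is not None:
        return replace(filters, correspondent_id=None)
    return None

def search_with_fallback(search_fn, filters: SearchFilters):
    """Retry with progressively broader filters until results appear or no filter remains."""
    current = filters
    while current is not None:
        results = search_fn(current)
        if results:
            # Returning the filters actually used lets the agent tell the
            # user when results came from a broader timeframe than requested.
            return results, current
        current = relax(current)
    return [], None
```

Returning the relaxed filter set alongside the results is what enables the "Contextual Retries" behavior: the agent can compare it against the original request and explain the difference to the user.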
Empty searches no longer return a silent "No results" message. The tools now return detailed context:
- Applied Filters: Lists all parameters (IDs, dates, query) that were used, helping the LLM reason about its next search attempt.
- Clear Guidance: Provides direct recommendations to the LLM on how to broaden the search.
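A minimal sketch of such a structured empty-result payload (the field names `applied_filters` and `guidance` are illustrative, not the exact tool schema):

```python
def empty_result_context(query, correspondent_id=None, date_from=None, date_to=None):
    """Assemble a structured 'no results' response the LLM can reason about."""
    applied = {"query": query}
    if correspondent_id is not None:
        applied["correspondent_id"] = correspondent_id
    if date_from or date_to:
        applied["date_range"] = [date_from, date_to]

    # Guidance hints mirror the applied filters, telling the LLM what to relax next.
    hints = []
    if "date_range" in applied:
        hints.append("Remove the date boundaries and retry.")
    if "correspondent_id" in applied:
        hints.append("Drop the correspondent filter or verify the ID via master data lookup.")
    hints.append("Broaden the query to fewer, more general keywords.")

    return {"results": [], "applied_filters": applied, "guidance": hints}
```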
To prevent the LLM from "hallucinating" or guessing correspondent/tag IDs:
- Mandatory Lookup: The `get_paperless_master_data` tool now explicitly warns against guessing IDs and requires using it for lookup first.
- Raw Protocol Checks: This release has been verified using the raw MCP protocol checker.
- Test Suite: Verified by the full suite of 18 automated tests.
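The lookup-first rule above can be sketched like this. The `master_data` shape and the `resolve_correspondent_id` helper are hypothetical; they only illustrate resolving a name to an ID instead of letting the LLM guess one.

```python
def resolve_correspondent_id(master_data: dict, name: str) -> int:
    """Resolve a correspondent name to its ID via master data instead of guessing.

    `master_data` mimics what a master-data lookup tool might return
    (illustrative shape, not the real response schema).
    """
    matches = [c["id"] for c in master_data.get("correspondents", [])
               if c["name"].lower() == name.lower()]
    if not matches:
        raise LookupError(f"Unknown correspondent {name!r} -- run the master data lookup first.")
    return matches[0]
```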
This release focuses on Stability and Large Document Support, ensuring that Searchless-ngx can handle even the most extensive document libraries without hitting API limits.
The Gemini Embedding API has a hard limit of 100 requests per batch. Searchless-ngx now automatically detects large documents and splits them into compliant batches. This prevents the INVALID_ARGUMENT errors previously encountered with documents exceeding 100 chunks.
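Splitting into compliant batches boils down to chunking the list at the API limit. A minimal sketch (the `batch_chunks` name is an assumption; the real code additionally sends each batch as a separate embedding request):

```python
def batch_chunks(chunks, batch_size=100):
    """Split a document's text chunks into batches no larger than the API limit.

    A 250-chunk document becomes three requests of 100, 100, and 50 chunks,
    avoiding INVALID_ARGUMENT errors from oversized batches.
    """
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
```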
To prevent resource exhaustion and ensure high-quality search results, we've introduced a configurable chunk limit:
- MAX_CHUNKS_PER_DOC: New environment variable (default: 100).
- Capacity: 100 chunks cover approximately 25 DIN-A4 pages of text.
- Graceful Handling: Documents exceeding this limit are truncated at the end, and a warning is logged.
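The graceful truncation described above might look like this sketch (logger name and `cap_chunks` helper are illustrative, not the production code):

```python
import logging

logger = logging.getLogger("searchless")

def cap_chunks(chunks, max_chunks=100):
    """Truncate a document's chunk list at the MAX_CHUNKS_PER_DOC limit.

    Chunks beyond the limit are dropped from the end, and a warning is
    logged so the truncation is visible to operators.
    """
    if len(chunks) > max_chunks:
        logger.warning("Document has %d chunks; truncating to %d.", len(chunks), max_chunks)
        return chunks[:max_chunks]
    return chunks
```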
- Automated Batching Tests: We've added a new mocked test suite to verify the batching logic without requiring internet access.
- Expanded Test Coverage: This release is verified by 18 automated tests, ensuring core stability.
We are proud to announce the Initial Release of Searchless-ngx (v0.1.3), the first production-ready version of our Agentic RAG MCP Server for Paperless-ngx.
Searchless-ngx transforms your Paperless-ngx instance from a static archive into an intelligent, conversational agent. By leveraging the Model Context Protocol (MCP) and Agentic RAG, it allows modern LLMs (like Gemini or GPT-4) to natively search, filter, and reason over your personal documents.
Search results are now presented as high-quality interactive cards designed specifically for Open WebUI:
- Linked Headers: Document titles link directly to the document detail view in Paperless.
- Dynamic Deep-Linking: Correspondent names and Tags are now clickable links that filter for related documents.
- Structural Preservation: OCR snippets now respect original line breaks and paragraph structure.
- Concise Layout: Strict 7-line snippet limits keep your chat clean and readable.
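As a toy illustration of how such a card could be assembled (this is not the production template; the `render_card` function and its fields are assumptions for the example):

```python
def render_card(title, doc_url, correspondent, snippet, max_lines=7):
    """Render a Markdown result card with a linked header and a capped snippet.

    splitlines() preserves the OCR text's original line breaks, and the
    slice enforces the strict snippet limit (7 lines by default).
    """
    lines = snippet.splitlines()[:max_lines]
    body = "\n".join(lines)
    return f"### [{title}]({doc_url})\n*{correspondent}*\n\n{body}"
```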
- Exact Metadata API: Leverage the full power of Paperless-ngx filtering (correspondents, tags, dates).
- Semantic Vector Search: Use ChromaDB and Gemini embeddings to find documents by meaning (e.g., "Find food receipts from Berlin").
- Custom Field Visibility: Custom fields are beautifully integrated and resolved into readable names.
- Paginated Cache: Efficiently handles large libraries with hundreds of tags and correspondents.
- Strict JSON Schema: 100% compatible with strict MCP parsers (no `anyOf` or `null` types).
- Comprehensive Verification: This release is verified by a full suite of automated tests covering all core components.
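To make the semantic search idea above concrete: the real server delegates storage and retrieval to ChromaDB with Gemini embeddings, but the core ranking principle is cosine similarity between a query embedding and document embeddings, as in this dependency-free toy stand-in:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_rank(query_vec, docs):
    """Rank documents by embedding similarity to the query (toy stand-in
    for the ChromaDB-backed vector search; `docs` shape is illustrative)."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
```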
For detailed installation and Open WebUI setup instructions, please refer to the README.md and WEBUI_SETUP.md.
Thank you for choosing Searchless-ngx!