Draft
Conversation
We now allow requesting the LLM directly through the OpenAI-compatible endpoints. This is useful when we want to interact with the LLM without relying on documents, e.g. for translation, rephrasing, etc. To use it, the client must simply omit the `model` field. Note that in this approach, no system prompt is used; it is entirely up to the client.
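A minimal sketch of what such a document-free request body might look like. This assumes the endpoint follows the standard OpenAI chat-completions shape; the field names below come from that convention, not from OpenRAG internals.

```python
import json

def build_chat_payload(messages: list[dict]) -> dict:
    # Deliberately no "model" key: per the feature above, omitting it routes
    # the request straight to the configured LLM, and no system prompt is
    # injected server-side -- the client controls the full conversation.
    return {"messages": messages}

body = json.dumps(build_chat_payload(
    [{"role": "user", "content": "Rephrase: the cat sat on the mat."}]
))
```

The client is then responsible for supplying its own system message if one is needed.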
docs: Documenting openrag's env variables
Removed the release job from the GitHub Actions workflow.
We improve disk file saving as follows:
- Do not load the whole file into memory; instead, stream the HTTP buffer and write in chunks
- Avoid blocking I/O, to allow parallel writes
- Add a random prefix to saved files, to avoid name collisions
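A minimal sketch of the chunked-write-plus-random-prefix idea (function and variable names are illustrative, not OpenRAG's actual code; the real implementation also offloads the blocking writes so the event loop stays free):

```python
import os
import uuid

CHUNK_SIZE = 64 * 1024  # stream in 64 KiB chunks instead of loading the whole body

def save_upload(stream, filename: str, dest_dir: str) -> str:
    # Random prefix avoids collisions when two clients upload the same filename.
    unique_name = f"{uuid.uuid4().hex}_{filename}"
    path = os.path.join(dest_dir, unique_name)
    with open(path, "wb") as out:
        # Read/write chunk by chunk: memory stays bounded regardless of file size.
        while chunk := stream.read(CHUNK_SIZE):
            out.write(chunk)
    return path
```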
Add a new `v1/tools` endpoint to allow custom tool execution. This is useful for running specific openRAG features, such as text extraction, without the semantic search functionalities.
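One common way to back such an endpoint is a name-to-callable registry. This is only a sketch of that pattern; the `extractText` body below is a stand-in, not OpenRAG's extractor.

```python
from typing import Any, Callable

# Registry a `v1/tools` endpoint could dispatch to.
TOOLS: dict[str, Callable[..., Any]] = {}

def register_tool(name: str):
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("extractText")
def extract_text(raw: bytes) -> str:
    # Real extraction would parse PDFs, DOCX, etc.; this just decodes bytes.
    return raw.decode("utf-8", errors="replace")

def execute_tool(name: str, **kwargs) -> Any:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Unknown tool names raise `KeyError`, which the endpoint can map to an HTTP 404.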
Merge for release v1.1.5
Remove `max_tokens` default value to avoid cutting mid-generation.
new release 1.1.7
docs: add documentation for the spoken-style answer prompt
Expose read-only MCP search tools over streamable HTTP while reusing OpenRAG's high-level indexer search flow and enforcing existing token-based partition access.
Introduce components/app/ with an abstract interface (OpenRAGApiInterface) and a concrete implementation (OpenRAGApplicationService) that wraps both SearchToolService and IndexationService. The service exposes every operation needed by the API routers and the MCP server under a single, testable surface:
- search (documents / partition / file)
- indexation catalog (list/get/delete files, chunks, partitions, users)
- file ingestion (add, replace, copy, update metadata, index URL)
- task management (status, error, logs, cancel)
- Ray actor management (list, restart)
- OpenAI-compatible endpoints (models, chat/completion)
- tools (extractText)
RagPipeline is instantiated lazily and cached on the service instance to avoid recreating it on every request. Includes full unit-test coverage (29 tests) in test_service.py.
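The lazy-instantiation-with-caching described above can be sketched as follows. This is illustrative only: the class and property names mirror the description, the bodies are stand-ins.

```python
class ApplicationServiceSketch:
    def __init__(self, pipeline_factory):
        # The factory stands in for something like `RagPipeline`; construction
        # is deferred until the first request that actually needs it.
        self._pipeline_factory = pipeline_factory
        self._pipeline = None

    @property
    def pipeline(self):
        # Created on first access, then cached on the instance so subsequent
        # requests reuse the same object instead of rebuilding it.
        if self._pipeline is None:
            self._pipeline = self._pipeline_factory()
        return self._pipeline
```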
Replace the separate search_service + indexation_service construction in mcp_server.py with a single OpenRAGApplicationService instance. All tool handlers now call app_service directly. Backward-compatible module-level aliases (search_service, indexation_service) are kept so existing tests that monkeypatch those names continue to work without modification.
…vice
Replace direct Ray actor calls and ad-hoc business logic in every API
router with delegated calls to a module-level OpenRAGApplicationService
instance. No endpoint behaviour changes.
Per-router highlights:
- actors.py – list_ray_actors / restart_actor delegate to service;
removes inline ray.kill / actor recreation code
- extract.py – get_chunk_by_id delegates to service; maps KeyError →
404 and PermissionError → 403
- indexer.py – add/replace/delete/patch/copy file, task status/error/
logs/cancel all delegate to service; removes unused json
and ray imports
- openai.py – list_models and chat/completion pipelines delegate to
service; removes unused consts import
- partition.py – all partition CRUD and membership operations delegate
to service
- queue.py – get_queue_info and list_my_tasks delegate to service;
removes unused Counter import and dead _format_pool_info
helper
- search.py – all three search endpoints delegate to service; removes
unused get_indexer dependency from function signatures
- tools.py – list_tools and execute_tool delegate to service
- users.py – all user CRUD operations delegate to service
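The KeyError → 404 / PermissionError → 403 mapping mentioned for extract.py can be sketched framework-free as a decorator (here returning `(status, detail)` tuples instead of raising HTTP exceptions; the real routers would use their web framework's error types):

```python
def map_service_errors(fn):
    # Translate service-layer exceptions into HTTP-style status codes.
    def wrapper(*args, **kwargs):
        try:
            return 200, fn(*args, **kwargs)
        except KeyError as exc:       # e.g. unknown chunk id
            return 404, str(exc)
        except PermissionError as exc:  # e.g. caller lacks partition access
            return 403, str(exc)
    return wrapper
```

This keeps routers thin: they delegate to the service and only decide how failures surface over HTTP.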
mcp_server.py now exposes a single app_service (OpenRAGApplicationService) instead of separate search_service and indexation_service objects. Update the patched_services fixture to build a composite MagicMock that forwards every method to the appropriate fake service, then monkeypatches mcp_mod.app_service with it. The module-level search_service and indexation_service aliases are also patched with the same composite to keep any path that still references them working.
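A composite MagicMock that forwards to two fake services, as the fixture above describes, might be built like this (the helper name and the forwarding-by-`dir()` approach are assumptions, not the actual fixture code):

```python
from unittest.mock import MagicMock

def make_composite(search_fake, indexation_fake) -> MagicMock:
    # Copy every public method from each fake onto one mock, so code that
    # expects a single app_service finds all operations in one place.
    composite = MagicMock()
    for fake in (search_fake, indexation_fake):
        for name in dir(fake):
            if not name.startswith("_"):
                setattr(composite, name, getattr(fake, name))
    return composite
```

Because the result is still a MagicMock, any method not explicitly forwarded is auto-created, which keeps unrelated call sites from blowing up mid-test.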
…dget
- Add LLM_CONTEXT_WINDOW config (default 8192) so the context budget is derived from the actual model limit rather than top_k × chunk_size
- Reserve 2048 tokens for system-prompt + chat-history overhead; the remaining tokens become max_context_tokens for retrieved documents
- Replace the ChatOpenAI/tiktoken tokenizer in format_context() with a conservative char-based estimator (4 chars/token) that works correctly across all LLM backends, including Mistral (tiktoken systematically undercounts Mistral tokens, causing context overflows)
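The budget arithmetic and the 4-chars-per-token estimator can be sketched as below. The constants come from the description above; `fit_chunks` is a hypothetical illustration of spending the remaining budget on retrieved documents, not the actual `format_context()` code.

```python
CHARS_PER_TOKEN = 4       # conservative heuristic; real tokenizers vary by model
LLM_CONTEXT_WINDOW = 8192
RESERVED_TOKENS = 2048    # system prompt + chat history overhead
MAX_CONTEXT_TOKENS = LLM_CONTEXT_WINDOW - RESERVED_TOKENS  # 6144 for documents

def estimate_tokens(text: str) -> int:
    # Ceiling division; even an empty string costs at least one token.
    return max(1, -(-len(text) // CHARS_PER_TOKEN))

def fit_chunks(chunks: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    # Keep whole chunks in order until the next one would overflow the budget.
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Overestimating per-token cost errs on the safe side: better to drop a chunk than to overflow a Mistral context window.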
…ualization
- Add VLM_CONTEXT_WINDOW config (default 8192), mirroring LLM_CONTEXT_WINDOW
- ChunkContextualizer now reads context_window from llm_config and truncates first_chunks / prev_chunks / current_chunk content before building the user message, guaranteeing the total input never exceeds the model's context limit
- Replace ChatOpenAI().get_num_tokens (tiktoken, which underestimates non-GPT models) with a char-based estimator in BaseChunker._length_function; remove the now-unused self.llm ChatOpenAI instance from BaseChunker.__init__
The get_file_chunks tool was returning all chunks of a file at once.
For large files this easily exceeds the MCP client LLM's context window
(observed: 36,877 tokens sent to an 8192-token model).
Add offset/limit pagination (default limit=10) so the model receives a
safe-sized page per call. The response now includes total_chunks,
offset, limit, and has_more so the model knows when to keep paging.
The REST route GET /{partition}/file/{file_id} only uses chunk IDs for
link generation (never returns content), so it passes limit=100_000 to
retrieve all IDs in a single call unaffected by pagination.
…etch Replace the magic large number with -1 as the canonical sentinel for 'no limit' in get_file_chunks. The REST route that fetches all chunk IDs for link generation now passes limit=-1 explicitly.
Image-captioned chunks can be ~1250 tokens each, making limit=10 exceed an 8192-token context window before the conversation history is even counted. Drop the default to 3 and document the constraint in the tool description so models know to keep limit small.
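Putting the pagination commits together, the resulting behaviour might look like this sketch (field names `total_chunks` / `offset` / `limit` / `has_more`, the default `limit=3`, and the `-1` sentinel are from the descriptions above; the function body is illustrative):

```python
def get_file_chunks(chunks: list[dict], offset: int = 0, limit: int = 3) -> dict:
    # limit=-1 is the canonical sentinel for "no limit", used by the REST
    # link-generation route that only needs chunk IDs.
    page = chunks[offset:] if limit == -1 else chunks[offset:offset + limit]
    return {
        "chunks": page,
        "total_chunks": len(chunks),
        "offset": offset,
        "limit": limit,
        # Tells the MCP client's model whether to request another page.
        "has_more": limit != -1 and offset + limit < len(chunks),
    }
```

A small default page plus an explicit `has_more` flag lets the model walk a large file incrementally instead of blowing its context window in one call.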
…exist When a user attempts to index a file into a non-existent partition via the MCP index_url tool, the request was failing with 'Access denied for partition' because _enforce_partition_access only checks membership, not existence. Add _ensure_partition_exists to IndexationService, called before the access check in index_url: if the partition is absent from both the DB and the caller's membership list, it is created with the caller as owner, mirroring the silent-allow behaviour already present in the REST API path (routers/utils.py:ensure_partition_role).
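The order of operations matters here: the existence check must run before the access check. A toy sketch of that silent-allow flow, using a plain dict in place of the DB (names and shapes are illustrative, not IndexationService's code):

```python
def ensure_partition_exists(db: dict, partition: str, user: str) -> None:
    # Silent-allow: a missing partition is created with the caller as owner,
    # mirroring the REST path's ensure_partition_role behaviour.
    if partition not in db:
        db[partition] = {"owner": user, "members": {user}}

def index_url(db: dict, partition: str, user: str, url: str) -> str:
    ensure_partition_exists(db, partition, user)  # must precede the access check
    if user not in db[partition]["members"]:
        raise PermissionError(f"Access denied for partition {partition}")
    return f"queued {url} into {partition}"
```

With the checks in the old order, a brand-new partition always failed the membership test, producing the misleading "Access denied" error described above.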
… check index_url was calling indexer.add_file.remote without the user argument, causing set_details to store user_id=None. Any subsequent call to get_indexation_task_status would then raise PermissionError because the authenticated user's id did not match the None stored in the task details.
This feature should implement #272