
Feature/mcp http acl#273

Draft
EnjoyBacon7 wants to merge 42 commits into dev from feature/mcp-http-acl

Conversation

@EnjoyBacon7
Collaborator

No description provided.

EnjoyBacon7 and others added 30 commits October 22, 2025 16:51
We now allow direct requests to the LLM through the OpenAI endpoints.
This is useful when we want to interact with the LLM without relying on
documents, e.g. for translation, rephrasing, etc.
To use it, the client must simply not specify the `model`.

Note that with this approach, no system prompt is used; it is
entirely up to the client.
docs: Documenting openrag's env variables
Removed the release job from the GitHub Actions workflow.
We improve disk file saving by doing the following:
- Do not load the whole file into memory; stream the HTTP buffer and write in chunks
- Avoid blocking I/O, to allow parallel writes
- Add a random prefix to saved files to avoid name collisions
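The three points above can be sketched as follows; the function name and data shapes are illustrative, not OpenRAG's actual implementation:

```python
import asyncio
import os
import uuid

async def save_upload(chunks, dest_dir: str, filename: str) -> str:
    """Write an async iterable of byte chunks to disk without loading the whole file."""
    # Random prefix avoids collisions when two uploads share a filename.
    path = os.path.join(dest_dir, f"{uuid.uuid4().hex[:8]}_{filename}")
    with open(path, "wb") as f:
        async for chunk in chunks:
            # Offload the blocking write to a thread so other requests can proceed in parallel.
            await asyncio.to_thread(f.write, chunk)
    return path
```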
Add a new `v1/tools` endpoint to allow custom tool execution.
This is useful for executing specific OpenRAG features, such as text extraction,
without the semantic search functionalities
Merge for release v1.1.5
Remove `max_tokens` default value to avoid cutting mid-generation.
docs: adding doc spoken style answer prompt
Expose read-only MCP search tools over streamable HTTP while reusing OpenRAG's high-level indexer search flow and enforcing existing token-based partition access.
Introduce components/app/ with an abstract interface (OpenRAGApiInterface)
and a concrete implementation (OpenRAGApplicationService) that wraps both
SearchToolService and IndexationService.  The service exposes every
operation needed by the API routers and the MCP server under a single,
testable surface:

- search (documents / partition / file)
- indexation catalog (list/get/delete files, chunks, partitions, users)
- file ingestion (add, replace, copy, update metadata, index URL)
- task management (status, error, logs, cancel)
- Ray actor management (list, restart)
- OpenAI-compatible endpoints (models, chat/completion)
- Tools (extractText)

RagPipeline is instantiated lazily and cached on the service instance to
avoid recreating it on every request.
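The lazy-and-cached pattern can be sketched with `functools.cached_property`; the class names here are stand-ins, not the real OpenRAG types:

```python
from functools import cached_property

class FakePipeline:
    """Stand-in for RagPipeline, whose construction is expensive."""
    instances = 0
    def __init__(self):
        FakePipeline.instances += 1

class ApplicationService:
    @cached_property
    def pipeline(self) -> FakePipeline:
        # Built on first access, then reused for every later request.
        return FakePipeline()
```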

Includes full unit-test coverage (29 tests) in test_service.py.
Replace the separate search_service + indexation_service construction in
mcp_server.py with a single OpenRAGApplicationService instance.

All tool handlers now call app_service directly.  Backward-compatible
module-level aliases (search_service, indexation_service) are kept so
existing tests that monkeypatch those names continue to work without
modification.
…vice

Replace direct Ray actor calls and ad-hoc business logic in every API
router with delegated calls to a module-level OpenRAGApplicationService
instance.  No endpoint behaviour changes.

Per-router highlights:
- actors.py   – list_ray_actors / restart_actor delegate to service;
                removes inline ray.kill / actor recreation code
- extract.py  – get_chunk_by_id delegates to service; maps KeyError →
                404 and PermissionError → 403
- indexer.py  – add/replace/delete/patch/copy file, task status/error/
                logs/cancel all delegate to service; removes unused json
                and ray imports
- openai.py   – list_models and chat/completion pipelines delegate to
                service; removes unused consts import
- partition.py – all partition CRUD and membership operations delegate
                to service
- queue.py    – get_queue_info and list_my_tasks delegate to service;
                removes unused Counter import and dead _format_pool_info
                helper
- search.py   – all three search endpoints delegate to service; removes
                unused get_indexer dependency from function signatures
- tools.py    – list_tools and execute_tool delegate to service
- users.py    – all user CRUD operations delegate to service
mcp_server.py now exposes a single app_service (OpenRAGApplicationService)
instead of separate search_service and indexation_service objects.

Update the patched_services fixture to build a composite MagicMock that
forwards every method to the appropriate fake service, then monkeypatches
mcp_mod.app_service with it.  The module-level search_service and
indexation_service aliases are also patched with the same composite to
keep any path that still references them working.
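One way to build such a forwarding composite with `unittest.mock` (the fake services and method names are illustrative, not the fixture's exact code):

```python
from unittest.mock import MagicMock

class FakeSearchService:
    def search(self, query: str) -> list:
        return [f"hit for {query}"]

class FakeIndexationService:
    def list_files(self, partition: str) -> list:
        return [f"{partition}/doc.pdf"]

# Composite mock: each attribute is itself a MagicMock, so side_effect
# forwards the call to the right fake while call assertions still work.
app_service = MagicMock()
app_service.search.side_effect = FakeSearchService().search
app_service.list_files.side_effect = FakeIndexationService().list_files
```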
…dget

- Add LLM_CONTEXT_WINDOW config (default 8192) so the context budget is
  derived from the actual model limit rather than top_k × chunk_size
- Reserve 2048 tokens for system prompt + chat history overhead; the
  remaining tokens become max_context_tokens for retrieved documents
- Replace ChatOpenAI/tiktoken tokenizer in format_context() with a
  conservative char-based estimator (4 chars/token) that works correctly
  across all LLM backends including Mistral (tiktoken systematically
  undercounts Mistral tokens, causing context overflows)
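The budget arithmetic, under the defaults named above; the function shape is a sketch, not the real `format_context` signature:

```python
LLM_CONTEXT_WINDOW = 8192   # config default from above
RESERVED_TOKENS = 2048      # system prompt + chat history overhead
CHARS_PER_TOKEN = 4         # conservative cross-backend estimate

def estimate_tokens(text: str) -> int:
    # Ceil-divide so short strings still count as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)

def format_context(chunks: list[str]) -> str:
    budget = LLM_CONTEXT_WINDOW - RESERVED_TOKENS  # tokens left for documents
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break  # stop before the retrieved documents overflow the window
        kept.append(chunk)
        used += cost
    return "\n\n".join(kept)
```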
…ualization

- Add VLM_CONTEXT_WINDOW config (default 8192) mirroring LLM_CONTEXT_WINDOW
- ChunkContextualizer now reads context_window from llm_config and truncates
  first_chunks / prev_chunks / current_chunk content before building the user
  message, guaranteeing the total input never exceeds the model's context limit
- Replace ChatOpenAI().get_num_tokens (tiktoken, underestimates non-GPT models)
  with a char-based estimator in BaseChunker._length_function; remove the now
  unused self.llm ChatOpenAI instance from BaseChunker.__init__
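A sketch of the truncation step, assuming an even split of the document budget across the three sections (the split policy and reserve are assumptions, not necessarily what ChunkContextualizer does):

```python
VLM_CONTEXT_WINDOW = 8192
RESERVED = 2048          # instructions + response headroom (assumed)
CHARS_PER_TOKEN = 4      # same char-based estimator as the chunker

def truncate_to(text: str, max_tokens: int) -> str:
    return text[: max_tokens * CHARS_PER_TOKEN]

def build_user_message(first_chunks: str, prev_chunks: str, current_chunk: str) -> str:
    # Split the remaining budget evenly so the total never exceeds the window.
    per_section = (VLM_CONTEXT_WINDOW - RESERVED) // 3
    parts = (truncate_to(t, per_section) for t in (first_chunks, prev_chunks, current_chunk))
    return "\n\n".join(parts)
```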
The get_file_chunks tool was returning all chunks of a file at once.
For large files this easily exceeds the MCP client LLM's context window
(observed: 36,877 tokens sent to an 8192-token model).

Add offset/limit pagination (default limit=10) so the model receives a
safe-sized page per call. The response now includes total_chunks,
offset, limit, and has_more so the model knows when to keep paging.

The REST route GET /{partition}/file/{file_id} only uses chunk IDs for
link generation (never returns content), so it passes limit=100_000 to
retrieve all IDs in a single call unaffected by pagination.
…etch

Replace the magic large number with -1 as the canonical sentinel for
'no limit' in get_file_chunks. The REST route that fetches all chunk IDs
for link generation now passes limit=-1 explicitly.
Image-captioned chunks can be ~1250 tokens each, making limit=10 exceed
an 8192-token context window before the conversation history is even
counted. Drop the default to 3 and document the constraint in the tool
description so models know to keep limit small.
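Putting these pagination commits together, the tool can be sketched as follows; the field names follow the descriptions above, while the function body is illustrative:

```python
DEFAULT_LIMIT = 3  # image-captioned chunks can be ~1250 tokens each, so keep pages small

def get_file_chunks(chunks: list[dict], offset: int = 0, limit: int = DEFAULT_LIMIT) -> dict:
    """Paginated chunk view; limit=-1 is the canonical 'no limit' sentinel."""
    total = len(chunks)
    page = chunks[offset:] if limit == -1 else chunks[offset:offset + limit]
    return {
        "chunks": page,
        "total_chunks": total,
        "offset": offset,
        "limit": limit,
        # has_more tells the model when to keep paging.
        "has_more": limit != -1 and offset + limit < total,
    }
```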
…exist

When a user attempts to index a file into a non-existent partition via the
MCP index_url tool, the request was failing with 'Access denied for partition'
because _enforce_partition_access only checks membership, not existence.

Add _ensure_partition_exists to IndexationService, called before the access
check in index_url: if the partition is absent from both the DB and the
caller's membership list, it is created with the caller as owner, mirroring
the silent-allow behaviour already present in the REST API path
(routers/utils.py:ensure_partition_role).
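The silent-allow check can be sketched roughly like this; the data shapes are simplified stand-ins for the real DB and membership store:

```python
def ensure_partition_exists(db: dict, memberships: dict, partition: str, user: str) -> None:
    """Create the partition with the caller as owner if it exists nowhere."""
    if partition in db or partition in memberships.get(user, set()):
        return  # already exists, or the caller is already a member
    db[partition] = {"owner": user}
    memberships.setdefault(user, set()).add(partition)
```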
… check

index_url was calling indexer.add_file.remote without the user argument,
causing set_details to store user_id=None. Any subsequent call to
get_indexation_task_status would then raise PermissionError because the
authenticated user's id did not match the None stored in the task details.
@coderabbitai

coderabbitai bot commented Mar 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ac89f5a5-e62b-40fd-b2bc-9a877e120452

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@EnjoyBacon7
Collaborator Author

This feature should implement #272
