diff --git a/docs/mkdocs/docs/contribute_tools.md b/docs/mkdocs/docs/contribute_tools.md
index c404118b..b51092e1 100644
--- a/docs/mkdocs/docs/contribute_tools.md
+++ b/docs/mkdocs/docs/contribute_tools.md
@@ -62,13 +62,7 @@ sub_agents:
   - tool-audio
   - new-tool-name # 👈 Add your new tool here
 ...
-```
-
-
-## Examples
-- `tool-reasoning` – reasoning utilities
-- `tool-image-video` – visual understanding
-- `new-tool-name` – your custom tool
+```
 
 ---
 
diff --git a/docs/mkdocs/docs/tool_reasoning.md b/docs/mkdocs/docs/tool_reasoning.md
index a73c3ad0..74f54da1 100644
--- a/docs/mkdocs/docs/tool_reasoning.md
+++ b/docs/mkdocs/docs/tool_reasoning.md
@@ -1,7 +1,41 @@
+# Reasoning Tools (`reasoning_mcp_server.py`)
 
-# - Coming Soon -
+The Reasoning MCP Server provides a **pure text-based reasoning engine**. It supports logical analysis, problem solving, and planning, using LLM backends (OpenAI or Anthropic) with retry and exponential backoff for robustness.
+## Environment Variables
+!!! warning "Where to Modify"
+    `reasoning_mcp_server.py` reads environment variables passed through the `tool-reasoning.yaml` configuration file, not directly from the `.env` file.
+- OpenAI Configuration:
+    - `OPENAI_API_KEY`
+    - `OPENAI_BASE_URL` : default = `https://api.openai.com/v1`
+    - `OPENAI_MODEL_NAME` : default = `o3`
+
+- Anthropic Configuration:
+    - `ANTHROPIC_API_KEY`
+    - `ANTHROPIC_BASE_URL` : default = `https://api.anthropic.com`
+    - `ANTHROPIC_MODEL_NAME` : default = `claude-3-7-sonnet-20250219`
 
 ---
+
+## `reasoning(question: str)`
+Perform step-by-step reasoning, analysis, and planning over a **text-only input**. This tool is specialized for **complex thinking tasks**.
+
+**Parameters**
+
+- `question`: A detailed, complex question or problem statement that includes all necessary information. The tool will not fetch external data or context.
+
+**Returns**
+
+- `str`: A structured, step-by-step reasoned answer.
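The intro notes that backend calls are wrapped in retry with exponential backoff. A minimal, self-contained sketch of that pattern, not the server's actual code; the delay schedule and jitter here are assumptions:

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff.

    Delays grow as base_delay * 2**attempt, plus a little jitter.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a flaky call that only succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "answer"
```

With a small `base_delay`, `with_backoff(flaky, base_delay=0.01)` absorbs the two transient failures and returns the final answer.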
+
+**Features**
+
+- Runs on OpenAI or Anthropic models, depending on which API keys are available.
+- Exponential backoff retry logic (up to 5 attempts).
+- For Anthropic, uses **Thinking mode** with a token budget (21k tokens max, 19k for thinking).
+- Ensures non-empty responses, with fallback error reporting.
+
+---
+
 
 **Last Updated:** Sep 2025
 **Doc Contributor:** Team @ MiroMind AI
\ No newline at end of file
diff --git a/docs/mkdocs/docs/tool_vqa.md b/docs/mkdocs/docs/tool_vqa.md
index a73c3ad0..21e3e69c 100644
--- a/docs/mkdocs/docs/tool_vqa.md
+++ b/docs/mkdocs/docs/tool_vqa.md
@@ -1,7 +1,75 @@
+# Vision Tools (`vision_mcp_server.py`)
 
-# - Coming Soon -
+The Vision MCP Server enables OCR and Visual Question Answering (VQA) over images, plus multimodal understanding of YouTube videos, with pluggable backends (Anthropic, OpenAI, Google Gemini).
+
+---
+
+## Environment Variables
+!!! warning "Where to Modify"
+    `vision_mcp_server.py` reads environment variables passed through the `tool-image-video.yaml` configuration file, not directly from the `.env` file.
+- Vision Backend Control:
+    - `ENABLE_CLAUDE_VISION`: `"true"` to allow the Anthropic Vision backend.
+    - `ENABLE_OPENAI_VISION`: `"true"` to allow the OpenAI Vision backend.
+- Anthropic Configuration:
+    - `ANTHROPIC_API_KEY`
+    - `ANTHROPIC_BASE_URL` : default = `https://api.anthropic.com`
+    - `ANTHROPIC_MODEL_NAME` : default = `claude-3-7-sonnet-20250219`
+- OpenAI Configuration:
+    - `OPENAI_API_KEY`
+    - `OPENAI_BASE_URL` : default = `https://api.openai.com/v1`
+    - `OPENAI_MODEL_NAME` : default = `gpt-4o`
+- Gemini Configuration:
+    - `GEMINI_API_KEY`
+    - `GEMINI_MODEL_NAME` : default = `gemini-2.5-pro`
 
 ---
+
+## `visual_question_answering(image_path_or_url: str, question: str)`
+Ask questions about an image. Runs **two passes**:
+
+1. **OCR pass** using the selected vision backend with a meticulous extraction prompt.
+
+2. **VQA pass** that analyzes the image and cross-checks the answer against the OCR text.
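The two passes above can be sketched as follows; `ask_backend` is a hypothetical stand-in for whichever vision backend is selected, and the prompts are illustrative only:

```python
def visual_question_answering_sketch(image, question, ask_backend):
    """Two-pass flow: OCR extraction first, then VQA grounded in the OCR text."""
    # Pass 1: meticulous text extraction.
    ocr_text = ask_backend("Meticulously pull out all text visible in the image.", image)
    # Pass 2: answer the question, cross-checking against the OCR output.
    vqa_answer = ask_backend(
        f"Question: {question}\nCross-check your answer against this OCR text:\n{ocr_text}",
        image,
    )
    return f"OCR results: {ocr_text}\nVQA result: {vqa_answer}"

# Demo with a canned fake backend in place of a real vision model.
def fake_backend(prompt, image):
    return "TOTAL: 42" if "pull out" in prompt else "The receipt total is 42."
```

The concatenated return value mirrors the `OCR results: ...` / `VQA result: ...` format documented below.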
+
+**Parameters**
+
+- `image_path_or_url`: Local path (accessible to the server) or web URL. HTTP URLs are auto-upgraded/validated to HTTPS for some backends.
+- `question`: The user's question about the image.
+
+**Returns**
+
+- `str`: Concatenated text with:
+    - `OCR results: ...`
+    - `VQA result: ...`
+
+**Features**
+
+- Automatic MIME detection: reads magic bytes, falls back to the file extension, and defaults to `image/jpeg`.
+
+---
+
+## `visual_audio_youtube_analyzing(url: str, question: str = "", provide_transcribe: bool = False)`
+Analyze **public YouTube videos** (audio + visual). Supports watch pages, Shorts, and Live VODs.
+
+- Accepted URL patterns: `youtube.com/watch`, `youtube.com/shorts`, `youtube.com/live`.
+
+**Parameters**
+
+- `url`: YouTube video URL (publicly accessible).
+- `question` (optional): A specific question about the video. You can scope it by time using `MM:SS` or `MM:SS-MM:SS` (e.g., `01:45`, `03:20-03:45`).
+- `provide_transcribe` (optional, default `False`): If `True`, returns a **timestamped transcription** including salient events and brief visual descriptions.
+
+**Returns**
+
+- `str`: The transcription of the video (if requested) and the answer to the question.
+
+**Features**
+
+- **Gemini-powered** video analysis (requires `GEMINI_API_KEY`).
+- Flexible modes: full transcript, targeted Q&A, or both.
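A sketch of the URL gate implied by the accepted patterns above. The server's real validation may be looser or stricter (for example, whether `youtu.be` short links are accepted is not stated here, so this sketch rejects them):

```python
import re

# Accepted patterns per the docs: watch pages, Shorts, and Live VODs.
_SUPPORTED_YOUTUBE_URL = re.compile(
    r"^https?://(?:www\.)?youtube\.com/(?:watch\?|shorts/|live/)"
)

def is_supported_youtube_url(url: str) -> bool:
    """Return True when the URL matches one of the documented patterns."""
    return bool(_SUPPORTED_YOUTUBE_URL.match(url))
```

For instance, `is_supported_youtube_url("https://www.youtube.com/shorts/abc123")` is `True`, while a non-YouTube URL is rejected.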
+
+---
+
 **Last Updated:** Sep 2025
-**Doc Contributor:** Team @ MiroMind AI
\ No newline at end of file
+**Doc Contributor:** Team @ MiroMind AI
diff --git a/docs/mkdocs/mkdocs.yml b/docs/mkdocs/mkdocs.yml
index 62124064..0959b28b 100644
--- a/docs/mkdocs/mkdocs.yml
+++ b/docs/mkdocs/mkdocs.yml
@@ -59,7 +59,7 @@ nav:
   - Overview: tool_overview.md
   - Tools:
     - tool-reasoning: tool_reasoning.md
-    - tool-vqa: tool_vqa.md
+    - tool-image-video: tool_vqa.md
     - tool-searching: tool_searching.md
     - tool-python: tool_python.md
   - Advanced Features:
diff --git a/scripts/run_prepare_benchmark.sh b/scripts/run_prepare_benchmark.sh
index 7574ed3e..eb005af0 100644
--- a/scripts/run_prepare_benchmark.sh
+++ b/scripts/run_prepare_benchmark.sh
@@ -21,4 +21,4 @@ uv run main.py prepare-benchmark get browsecomp-test
 uv run main.py prepare-benchmark get browsecomp-zh-test
 uv run main.py prepare-benchmark get hle
 uv run main.py prepare-benchmark get xbench-ds
-uv run main.py prepare-benchmark get futurex
\ No newline at end of file
+uv run main.py prepare-benchmark get futurex