Merged
8 changes: 1 addition & 7 deletions docs/mkdocs/docs/contribute_tools.md
@@ -62,13 +62,7 @@ sub_agents:
- tool-audio
- new-tool-name # 👈 Add your new tool here
...
```


## Examples
- `tool-reasoning` – reasoning utilities
- `tool-image-video` – visual understanding
- `new-tool-name` – your custom tool
```

---

36 changes: 35 additions & 1 deletion docs/mkdocs/docs/tool_reasoning.md
@@ -1,7 +1,41 @@
# Reasoning Tools (`reasoning_mcp_server.py`)

# - Coming Soon -
The Reasoning MCP Server provides a **pure text-based reasoning engine**. It supports logical analysis, problem solving, and planning, using LLM backends (OpenAI or Anthropic) with retry and exponential backoff for robustness.

## Environment Variables
!!! warning "Where to Modify"
    `reasoning_mcp_server.py` reads environment variables passed through the `tool-reasoning.yaml` configuration file, not directly from the `.env` file.
- OpenAI Configuration:
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL` : default = `https://api.openai.com/v1`
- `OPENAI_MODEL_NAME` : default = `o3`

- Anthropic Configuration:
- `ANTHROPIC_API_KEY`
- `ANTHROPIC_BASE_URL` : default = `https://api.anthropic.com`
- `ANTHROPIC_MODEL_NAME` : default = `claude-3-7-sonnet-20250219`
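
As an illustration, the variables above could be passed through the agent configuration like this (a hypothetical `tool-reasoning.yaml` fragment; the actual schema and key names in your setup may differ):

```yaml
# Hypothetical tool-reasoning.yaml fragment -- the real schema may differ.
env:
  OPENAI_API_KEY: ${OPENAI_API_KEY}        # forwarded from your shell or .env
  OPENAI_BASE_URL: https://api.openai.com/v1
  OPENAI_MODEL_NAME: o3
```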

---

## `reasoning(question: str)`
Perform step-by-step reasoning, analysis, and planning over a **text-only input**. This tool is specialized for **complex thinking tasks**.

**Parameters**

- `question`: A detailed, complex question or problem statement that includes all necessary information. The tool will not fetch external data or context.

**Returns**

- `str`: A structured, step-by-step reasoned answer.

**Features**

- Runs on OpenAI or Anthropic models, depending on available API keys.
- Exponential backoff retry logic (up to 5 attempts).
- For Anthropic, uses **Thinking mode** with a token budget (21k max tokens, 19k for thinking).
- Ensures non-empty responses with fallback error reporting.
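
The retry behavior described above can be sketched as follows (a minimal illustration; `call_with_backoff` is a hypothetical helper, and the server's actual delays and error handling may differ):

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn with exponential backoff and ensure a non-empty result.

    Sketch only: the real server's retry parameters may differ.
    """
    for attempt in range(max_attempts):
        try:
            result = fn()
            if result:  # reject empty responses so the caller never gets ""
                return result
            raise ValueError("empty response")
        except Exception:
            if attempt == max_attempts - 1:
                # fallback error reporting after the final attempt
                return "Error: no valid response after retries"
            # 1s, 2s, 4s, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```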

---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI
72 changes: 70 additions & 2 deletions docs/mkdocs/docs/tool_vqa.md
@@ -1,7 +1,75 @@
# Vision Tools (`vision_mcp_server.py`)

# - Coming Soon -
The Vision MCP Server enables OCR and Visual Question Answering (VQA) over images, plus multimodal understanding of YouTube videos, with pluggable backends (Anthropic, OpenAI, Google Gemini).

---

## Environment Variables
!!! warning "Where to Modify"
    `vision_mcp_server.py` reads environment variables passed through the `tool-image-video.yaml` configuration file, not directly from the `.env` file.
- Vision Backend Control:
- `ENABLE_CLAUDE_VISION`: `"true"` to allow Anthropic Vision backend.
- `ENABLE_OPENAI_VISION`: `"true"` to allow OpenAI Vision backend.
- Anthropic Configuration:
- `ANTHROPIC_API_KEY`
- `ANTHROPIC_BASE_URL` : default = `https://api.anthropic.com`
- `ANTHROPIC_MODEL_NAME` : default = `claude-3-7-sonnet-20250219`
- OpenAI Configuration:
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL` : default = `https://api.openai.com/v1`
- `OPENAI_MODEL_NAME` : default = `gpt-4o`
- Gemini Configuration:
- `GEMINI_API_KEY`
- `GEMINI_MODEL_NAME` : default = `gemini-2.5-pro`
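
For example, the backend toggles and keys above could be wired through the agent configuration like this (a hypothetical `tool-image-video.yaml` fragment; the actual schema may differ):

```yaml
# Hypothetical tool-image-video.yaml fragment -- the real schema may differ.
env:
  ENABLE_CLAUDE_VISION: "false"
  ENABLE_OPENAI_VISION: "true"
  OPENAI_API_KEY: ${OPENAI_API_KEY}
  OPENAI_MODEL_NAME: gpt-4o
  GEMINI_API_KEY: ${GEMINI_API_KEY}    # required for YouTube analysis
```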


---

## `visual_question_answering(image_path_or_url: str, question: str)`
Ask questions about an image. Runs **two passes**:

1. **OCR pass** using the selected vision backend with a meticulous extraction prompt.

2. **VQA pass** that analyzes the image and cross-checks against OCR text.

**Parameters**

- `image_path_or_url`: Local path (accessible to the server) or web URL. For some backends, HTTP URLs are automatically upgraded to and validated as HTTPS.
- `question`: The user’s question about the image.

**Returns**

- `str`: Concatenated text with:
- `OCR results: ...`
- `VQA result: ...`

**Features**

- Automatic MIME detection: reads magic bytes, falls back to the file extension, and defaults to `image/jpeg`.
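
The fallback chain above can be sketched like this (a simplified illustration; `guess_mime` is a hypothetical helper and the signature table is not exhaustive):

```python
import mimetypes

# Magic-byte signatures for common image formats (simplified sketch).
_MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF8": "image/gif",
}


def guess_mime(path: str, head: bytes) -> str:
    # 1) magic bytes from the start of the file
    for sig, mime in _MAGIC.items():
        if head.startswith(sig):
            return mime
    # 2) fall back to the file extension
    guessed, _ = mimetypes.guess_type(path)
    if guessed:
        return guessed
    # 3) final default
    return "image/jpeg"
```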

---

## `visual_audio_youtube_analyzing(url: str, question: str = "", provide_transcribe: bool = False)`
Analyze **public YouTube videos** (audio + visual). Supports watch pages, Shorts, and Live VODs.

- Accepted URL patterns: `youtube.com/watch`, `youtube.com/shorts`, `youtube.com/live`.
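
A URL check matching the accepted patterns above might look like this (a sketch; `is_supported_youtube_url` is hypothetical, and forms such as `youtu.be` short links are deliberately not handled here):

```python
import re

# watch pages, Shorts, and Live VODs, per the accepted patterns above
_YT = re.compile(r"https?://(www\.)?youtube\.com/(watch|shorts|live)\b")


def is_supported_youtube_url(url: str) -> bool:
    return bool(_YT.match(url))
```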

**Parameters**

- `url`: YouTube video URL (publicly accessible).
- `question` (optional): A specific question about the video. You can scope by time using `MM:SS` or `MM:SS-MM:SS` (e.g., `01:45`, `03:20-03:45`).
- `provide_transcribe` (optional, default `False`): If `True`, returns a **timestamped transcription** including salient events and brief visual descriptions.
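
The `MM:SS` / `MM:SS-MM:SS` time scoping could be parsed as follows (a sketch of the documented format; `parse_time_scope` is a hypothetical helper, and the server's own parsing may differ):

```python
def parse_time_scope(scope: str) -> tuple[int, int]:
    """Parse 'MM:SS' or 'MM:SS-MM:SS' into (start, end) seconds."""
    def to_seconds(t: str) -> int:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + int(seconds)

    if "-" in scope:
        start, end = scope.split("-")
        return to_seconds(start), to_seconds(end)
    point = to_seconds(scope)  # a single timestamp scopes to one moment
    return point, point
```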

**Returns**

- `str`: the video transcription (if requested) and/or the answer to the question.

**Features**

- **Gemini-powered** video analysis (requires `GEMINI_API_KEY`).
- Flexible modes: full transcript, targeted Q&A, or both.

---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI
2 changes: 1 addition & 1 deletion docs/mkdocs/mkdocs.yml
@@ -59,7 +59,7 @@ nav:
- Overview: tool_overview.md
- Tools:
- tool-reasoning: tool_reasoning.md
- tool-vqa: tool_vqa.md
- tool-image-video: tool_vqa.md
- tool-searching: tool_searching.md
- tool-python: tool_python.md
- Advanced Features:
2 changes: 1 addition & 1 deletion scripts/run_prepare_benchmark.sh
@@ -21,4 +21,4 @@ uv run main.py prepare-benchmark get browsecomp-test
uv run main.py prepare-benchmark get browsecomp-zh-test
uv run main.py prepare-benchmark get hle
uv run main.py prepare-benchmark get xbench-ds
uv run main.py prepare-benchmark get futurex
uv run main.py prepare-benchmark get futurex