📝 Update multimodal tool description

Phinease · web-flow · commit 05dab1ff0cce · 2025-11-29T15:49:22.000+08:00
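
The tool-selection guidance this commit adds to the docs can be sketched roughly as follows. This is a minimal illustration, not Nexent's implementation: `pick_tool`, `within_recommended_size`, and the extension sets are hypothetical names and assumptions. Only the tool names (`analyze_text_file`, `analyze_image`), the S3/HTTP/HTTPS URL schemes, and the 10 MB recommendation come from the documentation changed below.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumption: illustrative extension sets; the docs only name JPG, PNG, GIF for images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
TEXT_EXTENSIONS = {".txt", ".md", ".pdf", ".docx"}

# URL schemes the docs list for both multimodal tools.
SUPPORTED_SCHEMES = {"s3", "http", "https"}

# The docs recommend that a single uploaded file not exceed 10 MB.
RECOMMENDED_MAX_BYTES = 10 * 1024 * 1024


def pick_tool(file_url: str) -> Optional[str]:
    """Return the documented tool name for a file URL, or None if nothing matches.

    Hypothetical helper: routing by file extension is an assumption made for
    illustration only.
    """
    parsed = urlparse(file_url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        return None
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "analyze_image"
    if suffix in TEXT_EXTENSIONS:
        return "analyze_text_file"
    return None


def within_recommended_size(size_bytes: int) -> bool:
    """Check a file size against the 10 MB recommendation from the docs."""
    return size_bytes <= RECOMMENDED_MAX_BYTES
```

In this sketch a URL with an unsupported scheme or an unknown extension maps to `None`, which a caller could treat as "no multimodal tool applies".
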
diff --git a/doc/docs/en/sdk/core/tools.md b/doc/docs/en/sdk/core/tools.md
@@ -28,6 +28,10 @@ The current SDK includes the following tool types:
 - **GetEmailTool**: Email retrieval tool via IMAP
 - **SendEmailTool**: Email sending tool via SMTP
 
+### Multimodal Tools
+- **AnalyzeTextFileTool**: A document question-answering tool based on data processing and large language models
+- **AnalyzeImageTool**: An image question-answering tool based on visual language models
+
 ## 🔧 Common Characteristics
 
 ### 1. Basic Architecture
diff --git a/doc/docs/en/user-guide/agent-development.md b/doc/docs/en/user-guide/agent-development.md
@@ -45,7 +45,12 @@ Agents can use various tools to complete tasks, such as knowledge base search, e
   <img src="./assets/agent-development/set-tool.png" style="width: 50%; height: auto;" />
 </div>
 
-> 📚 Want to learn about all the built-in local tools in Nexent? Please see [Local Tools Overview](./local-tools/index.md).
+> 💡 **Tips**:
+> 1. Select the `knowledge_base_search` tool to enable knowledge base search.
+> 2. Select the `analyze_text_file` tool to enable parsing of document and text files.
+> 3. Select the `analyze_image` tool to enable parsing of image files.
+> 
+> 📚 Want to learn about all the built-in local tools available in the system? Please refer to [Local Tools Overview](./local-tools/index.md).
 
 ### 🔌 Add MCP Tools
 
diff --git a/doc/docs/en/user-guide/local-tools/index.md b/doc/docs/en/user-guide/local-tools/index.md
@@ -30,6 +30,11 @@ Nexent preloads a set of reusable local tools grouped by capability: email, file
 - **tavily_search**: Uses the Tavily API to retrieve webpages, particularly strong for news and current events. Returns both text results and related image URLs, with optional image filtering. Request a free API key from [tavily.com](https://www.tavily.com/).
 - **linkup_search**: Uses the Linkup API to fetch text and images. In addition to regular webpages, it can return image-only results, making it useful when mixed media references are required. Register at [linkup.so](https://www.linkup.so/) to obtain a free API key.
 
+### 🖼️ Multimodal Tools
+
+- **analyze_text_file**: Takes a user query and the S3, HTTP, or HTTPS URL of a text file, parses the file, and uses a large language model to answer the question. An available large language model must be configured on the model management page.
+- **analyze_image**: Takes a user query and the S3, HTTP, or HTTPS URL of an image, then uses a visual language model to analyze the image and answer the question. An available visual language model must be configured on the model management page.
+
 ### 🖥️ Terminal Tool
 
 The **Terminal Tool** is one of Nexent's core local capabilities that provides a persistent SSH session. Agents can execute remote commands, perform system inspections, read logs, or deploy services. Refer to the dedicated [Terminal Tool guide](./terminal-tool) for detailed setup, parameters, and security guidance.
diff --git a/doc/docs/en/user-guide/start-chat.md b/doc/docs/en/user-guide/start-chat.md
@@ -66,9 +66,15 @@ Nexent supports voice input (make sure you have configured the speech model unde
 
 ### Upload Files for Chat
 
-You can upload files during a chat, allowing agents to assist you based on file content:
+You can upload files during a chat so the agent can reason over their content:
 
-1. **Choose File Upload Method**
+> ⚠️ **Important:**
+> 1. Multimodal file conversations require the corresponding parsing tool to be enabled for the agent during agent development.
+>    1. For document or text files, select the `analyze_text_file` tool.
+>    2. For image files, select the `analyze_image` tool.
+> 2. Each uploaded file should ideally be under 10 MB. Split large documents into multiple uploads.
+
+1. **Choose a File Upload Method**
    - Click the file upload button in the lower right corner of the input box
    - Or drag files directly into the chat area
 
@@ -78,17 +84,15 @@ You can upload files during a chat, allowing agents to assist you based on file
    - **Images:** JPG, PNG, GIF, and other common formats
 
 3. **File Processing Flow**
-   - The system will automatically process your uploaded files
-   - Extract file content and add it to the current chat context
-   - The agent will answer your questions based on the file content
+   - The platform stores the uploaded file in MinIO and returns an S3 URL
+   - It builds structured file metadata and injects it into the active conversation
+   - The agent then answers your questions based on both the prompt and file metadata
 
 4. **File-based Chat**
-   - After uploading a file, you can ask questions about its content
-   - The agent can analyze, summarize, or process information from the file
+   - After uploading a file, ask questions about its contents at any time
+   - The agent can call the relevant multimodal tools to analyze, summarize, or process the data
    - Multiple files can be uploaded and processed simultaneously
 
-> ⚠️ **Note:** There is a file size limit for uploads. It is recommended that a single file not exceed 10MB. For large documents, upload in batches.
-
 ## 📚 Manage Your Chat History
 
 The left sidebar provides complete chat history management:
@@ -162,7 +166,7 @@ The right sidebar provides two tabs: "Source" and "Images" to help you understan
 
 ### Image Processing
 
-Nexent supports image input and processing (requires configuration of a vision model):
+Nexent supports image input and processing (make sure a vision model **and** the `analyze_image` tool are configured):
 
 1. **Upload Images**
    - Drag image files directly into the chat area
diff --git a/doc/docs/zh/sdk/core/tools.md b/doc/docs/zh/sdk/core/tools.md
@@ -28,6 +28,10 @@
 - **GetEmailTool**: 通过 IMAP 的邮件获取工具
 - **SendEmailTool**: 通过 SMTP 的邮件发送工具
 
+### 多模态工具
+- **AnalyzeTextFileTool**: 基于数据处理和大语言模型的文档问答工具
+- **AnalyzeImageTool**: 基于视觉语言模型的图片问答工具
+
 ## 🔧 工具共性特征
 
 ### 1. 基础架构
diff --git a/doc/docs/zh/user-guide/agent-development.md b/doc/docs/zh/user-guide/agent-development.md
@@ -32,7 +32,7 @@
 
 ### 🛠️ 选择 Agent 的工具
 
-智能体可以使用各种工具来完成任务，如知识库检索、收发邮件、文件管理等本地工具，也可接入第三方 MCP 工具，或自定义工具。
+智能体可以使用各种工具来完成任务，如知识库检索、文件解析、图片解析、收发邮件、文件管理等本地工具，也可接入第三方 MCP 工具，或自定义工具。
 
 1. 在"选择 Agent 的工具"页签右侧，点击"刷新工具"来刷新可用工具列表
 2. 选择想要添加工具所在的分组
@@ -45,6 +45,11 @@
   <img src="./assets/agent-development/set-tool.png" style="width: 50%; height: auto;" />
 </div>
 
+> 💡 **小贴士**：
+> 1. 请选择 `knowledge_base_search` 工具，启用知识库的检索功能。
+> 2. 请选择 `analyze_text_file` 工具，启用文档类、文本类文件的解析功能。
+> 3. 请选择 `analyze_image` 工具，启用图片类文件的解析功能。
+> 
 > 📚 想了解系统已经内置的所有本地工具能力？请参阅 [本地工具概览](./local-tools/index.md)。
 
 ### 🔌 添加 MCP 工具
diff --git a/doc/docs/zh/user-guide/local-tools/index.md b/doc/docs/zh/user-guide/local-tools/index.md
@@ -4,7 +4,7 @@ Nexent平台提供了丰富的本地工具，帮助智能体完成各种系统
 
 ## 🛠️ 可用工具
 
-Nexent预置了一组可以直接复用的本地工具。它们按照能力分为邮件、文件、搜索三大类，Terminal 工具则作为远程 Shell 能力单独提供。下方列出各工具的名称与核心特性，方便在 Agent 中快速定位所需能力。
+Nexent预置了一组可以直接复用的本地工具。它们按照能力分为邮件、文件、搜索、多模态四大类，Terminal 工具则作为远程 Shell 能力单独提供。下方列出各工具的名称与核心特性，方便在 Agent 中快速定位所需能力。
 
 ### 📧 邮件工具（Email）
 
@@ -30,6 +30,12 @@ Nexent预置了一组可以直接复用的本地工具。它们按照能力分
 - **tavily_search**：基于 Tavily API 的网页搜索，擅长新闻、实时资讯查询，同时返回文本结果和相关图片 URL，同样支持可选的图片过滤能力，可在 [tavily.com](https://www.tavily.com/) 免费申请 API Key。
 - **linkup_search**：使用 Linkup API 获取文本与图片结果，除了普通网页内容，还能返回纯图片结果，适合需要图文混合参考的场景。访问 [linkup.so](https://www.linkup.so/) 注册获取免费的 API Key。
 
+
+### 🖼️ 多模态工具（Multimodal）
+
+- **analyze_text_file**：基于用户提问和文本文件的 S3 URL、HTTP URL 或 HTTPS URL，解析文件并使用大语言模型理解文件，回答用户问题。需要在模型管理页面配置可用的大语言模型。
+- **analyze_image**：基于用户提问和图片的 S3 URL、HTTP URL 或 HTTPS URL，使用视觉语言模型分析理解图像，回答用户问题。需要在模型管理页面配置可用的视觉语言模型。
+
 ### 🖥️ Terminal工具
 
 **Terminal工具** 是 Nexent 平台的核心本地工具之一，提供持久化 SSH 会话能力，可在 Agent 中执行远程命令、进行系统巡检、读取日志或部署服务。详细的部署、参数和安全指引请查看专门的 [Terminal 使用手册](./terminal-tool.md)。
diff --git a/doc/docs/zh/user-guide/start-chat.md b/doc/docs/zh/user-guide/start-chat.md
@@ -64,10 +64,16 @@ Nexent支持语音输入功能，让您可以通过语音与智能体交互。
 
 > 💡 **小贴士**：为了获得更好的语音识别效果，请确保在安静的环境中使用，并清晰地发音。
 
-### 上传文件进行对话
+### 上传多模态文件进行对话
 
 您可以在对话中上传文件，让智能体基于文件内容为您提供帮助：
 
+> ⚠️ **注意事项**：
+> 1. 多模态文件对话功能，需在智能体开发时，选择对应的多模态解析工具 
+>    1. 文档类、文本类文件需选择 `analyze_text_file` 工具
+>    2. 图片类文件需选择 `analyze_image` 工具
+> 2. 上传的文件大小有限制，建议单个文件不超过10MB。对于大型文档，建议分批上传
+
 1. **选择文件上传方式**
    - 点击输入框右下角的文件上传按钮
    - 或直接将文件拖拽到对话区域
@@ -78,17 +84,15 @@ Nexent支持语音输入功能，让您可以通过语音与智能体交互。
    - **图片类**：JPG、PNG、GIF 等常见图片格式
 
 3. **文件处理流程**
-   - 系统会自动处理您上传的文件
-   - 提取文件内容并添加到当前对话的上下文中
-   - 智能体会基于文件内容回答您的问题
+   - 系统会将您上传的文件存储至MinIO中，并返回S3 URL
+   - 构建文件元信息并添加到当前对话的上下文中
+   - 智能体会基于文件元信息回答您的问题
 
 4. **基于文件的对话**
    - 上传文件后，您可以询问关于文件内容的问题
-   - 智能体可以分析、总结或处理文件中的信息
+   - 智能体可以调用对应的多模态工具，分析、总结或处理文件中的信息
    - 支持多文件同时上传和处理
 
-> ⚠️ **注意事项**：上传的文件大小有限制，建议单个文件不超过10MB。对于大型文档，建议分批上传。
-
 ## 📚 管理您的对话历史
 
 左侧边栏提供了完整的对话历史管理功能：
@@ -152,6 +156,10 @@ Nexent支持语音输入功能，让您可以通过语音与智能体交互。
   - 点击“展开”可查看引用的详细内容
   - 点击网页标题可直接跳转到原始网页
 
+💡 **小贴士**：
+1. 智能体开发时，请选择 `knowledge_base_search` 工具，启用本地知识库检索功能。
+2. 智能体开发时，请选择 `exa_search`、 `tavily_search`、 `linkup_search` 工具，启用网络检索功能。
+
 ### 图片标签页
 
 - 展示从网络检索中获取的相关图片
@@ -167,15 +175,15 @@ Nexent支持语音输入功能，让您可以通过语音与智能体交互。
 
 ### 图像处理功能
 
-Nexent支持图像输入和处理（需要配置视觉模型）：
+Nexent支持图像输入和处理（需要配置视觉模型、图片解析工具`analyze_image`）：
 
 1. **上传图像**
    - 直接将图像文件拖拽到对话区域
    - 或点击上传按钮选择图像文件
    - 支持常见的图片格式（JPG、PNG、GIF等）
 
 2. **图像分析能力**
-   - 智能体会自动分析图像内容
+   - 智能体会调用图片解析工具，自动分析图像内容
    - 可以识别图像中的物体、文字、场景等元素
    - 基于图像内容回答您的问题