Commit 05dab1f

📝 Update multimodal tool description
2 parents d226323 + 8ca3d64 commit 05dab1f

8 files changed, with 63 additions and 22 deletions

doc/docs/en/sdk/core/tools.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -28,6 +28,10 @@ The current SDK includes the following tool types:
2828
- **GetEmailTool**: Email retrieval tool via IMAP
2929
- **SendEmailTool**: Email sending tool via SMTP
3030

31+
### Multimodal Tools
32+
- **AnalyzeTextFileTool**: A document question-answering tool based on data processing and large language models
33+
- **AnalyzeImageTool**: An image question-answering tool based on visual language models
34+
3135
## 🔧 Common Characteristics
3236

3337
### 1. Basic Architecture

doc/docs/en/user-guide/agent-development.md

Lines changed: 6 additions & 1 deletion
@@ -45,7 +45,12 @@ Agents can use various tools to complete tasks, such as knowledge base search, e
4545
<img src="./assets/agent-development/set-tool.png" style="width: 50%; height: auto;" />
4646
</div>
4747

48-
> 📚 Want to learn about all the built-in local tools in Nexent? Please see [Local Tools Overview](./local-tools/index.md).
48+
> 💡 **Tips**
49+
> 1. Please select the `knowledge_base_search` tool to enable the knowledge base search function.
50+
> 2. Please select the `analyze_text_file` tool to enable the parsing function for document and text files.
51+
> 3. Please select the `analyze_image` tool to enable the parsing function for image files.
52+
>
53+
> 📚 Want to learn about all the built-in local tools available in the system? Please refer to [Local Tools Overview](./local-tools/index.md).
4954
5055
### 🔌 Add MCP Tools
5156

doc/docs/en/user-guide/local-tools/index.md

Lines changed: 5 additions & 0 deletions
@@ -30,6 +30,11 @@ Nexent preloads a set of reusable local tools grouped by capability: email, file
3030
- **tavily_search**: Uses the Tavily API to retrieve webpages, particularly strong for news and current events. Returns both text results and related image URLs, with optional image filtering. Request a free API key from [tavily.com](https://www.tavily.com/).
3131
- **linkup_search**: Uses the Linkup API to fetch text and images. In addition to regular webpages, it can return image-only results, making it useful when mixed media references are required. Register at [linkup.so](https://www.linkup.so/) to obtain a free API key.
3232

33+
### 🖼️ Multimodal Tools
34+
35+
- **analyze_text_file**: Given a user query and the S3, HTTP, or HTTPS URL of a text file, parses the file and uses a large language model to interpret it and answer the user's question. Requires an available large language model to be configured on the model management page.
36+
- **analyze_image**: Given a user query and the S3, HTTP, or HTTPS URL of an image, uses a visual language model to analyze the image and answer the user's question. Requires an available visual language model to be configured on the model management page.
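
As a rough illustration of how the two multimodal tools divide the work, the sketch below checks that a file reference uses one of the three URL schemes named above (S3, HTTP, HTTPS) and picks a tool name from the file extension. Only the tool names `analyze_text_file` and `analyze_image` come from the docs. The function `pick_multimodal_tool`, the `s3://bucket/key` URL shape, and the extension set are assumptions made for illustration, not part of Nexent's API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# URL schemes the tool descriptions mention (assumed to map to these prefixes).
ALLOWED_SCHEMES = {"s3", "http", "https"}
# Hypothetical extension set; the docs only say "JPG, PNG, GIF, and other common formats".
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pick_multimodal_tool(file_url: str) -> str:
    """Return the name of the tool that would handle this file URL."""
    parsed = urlparse(file_url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    # Compare extensions case-insensitively so "photo.PNG" counts as an image.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "analyze_image"
    # Everything else is treated as a document or text file in this sketch.
    return "analyze_text_file"
```

In practice the agent chooses which tool to call; a helper like this would only be useful for pre-validating uploads or for tests.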
37+
3338
### 🖥️ Terminal Tool
3439

3540
The **Terminal Tool** is one of Nexent's core local capabilities that provides a persistent SSH session. Agents can execute remote commands, perform system inspections, read logs, or deploy services. Refer to the dedicated [Terminal Tool guide](./terminal-tool) for detailed setup, parameters, and security guidance.

doc/docs/en/user-guide/start-chat.md

Lines changed: 14 additions & 10 deletions
@@ -66,9 +66,15 @@ Nexent supports voice input (make sure you have configured the speech model unde
6666
6767
### Upload Files for Chat
6868

69-
You can upload files during a chat, allowing agents to assist you based on file content:
69+
You can upload files during a chat so the agent can reason over their content:
7070

71-
1. **Choose File Upload Method**
71+
> ⚠️ **Important:**
72+
> 1. Multimodal file conversations require the corresponding parsing tools to be enabled for the agent during agent development:
73+
>    1. For document or text files, select the `analyze_text_file` tool.
74+
>    2. For image files, select the `analyze_image` tool.
75+
> 2. Each uploaded file should ideally be under 10 MB. Split large documents into multiple uploads.
76+
77+
1. **Choose a File Upload Method**
7278
- Click the file upload button in the lower right corner of the input box
7379
- Or drag files directly into the chat area
7480

@@ -78,17 +84,15 @@ You can upload files during a chat, allowing agents to assist you based on file
7884
- **Images:** JPG, PNG, GIF, and other common formats
7985

8086
3. **File Processing Flow**
81-
- The system will automatically process your uploaded files
82-
- Extract file content and add it to the current chat context
83-
- The agent will answer your questions based on the file content
87+
- The platform stores the uploaded file in MinIO and returns an S3 URL
88+
- It builds structured file metadata and injects it into the active conversation
89+
- The agent then answers your questions based on both the prompt and file metadata
8490

8591
4. **File-based Chat**
86-
- After uploading a file, you can ask questions about its content
87-
- The agent can analyze, summarize, or process information from the file
92+
- After uploading a file, ask questions about its contents at any time
93+
- The agent can call the relevant multimodal tools to analyze, summarize, or process the data
8894
- Multiple files can be uploaded and processed simultaneously
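
The file processing flow above (store the file, get back an S3 URL, build structured metadata, inject it into the conversation) could look roughly like the following. This is a minimal sketch, not Nexent's implementation: `UploadedFile`, `build_file_context`, the JSON layout, and the example S3 URL are all hypothetical, since the docs do not specify the metadata schema. The 10 MB figure is the recommended per-file limit from the note above.

```python
import json
from dataclasses import asdict, dataclass

# Recommended per-file limit from the docs (10 MB); treated as advisory here.
RECOMMENDED_MAX_BYTES = 10 * 1024 * 1024


@dataclass(frozen=True)
class UploadedFile:
    """Hypothetical record for a file that has already been stored."""
    name: str
    size_bytes: int
    s3_url: str  # assumed to be the URL returned after storage in MinIO


def build_file_context(files: list[UploadedFile]) -> str:
    """Serialize file metadata into a JSON string for the conversation context."""
    entries = []
    for uploaded in files:
        entry = asdict(uploaded)
        # Flag oversized files rather than rejecting them; the limit is a recommendation.
        entry["over_recommended_size"] = uploaded.size_bytes > RECOMMENDED_MAX_BYTES
        entries.append(entry)
    return json.dumps({"uploaded_files": entries}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(build_file_context([UploadedFile("report.pdf", 2_048, "s3://nexent/report.pdf")]))
```

Note that the agent receives only metadata, not file contents, which is why the `analyze_text_file` or `analyze_image` tool must be enabled for it to read the file behind the URL.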
8995

90-
> ⚠️ **Note:** There is a file size limit for uploads. It is recommended that a single file not exceed 10MB. For large documents, upload in batches.
91-
9296
## 📚 Manage Your Chat History
9397

9498
The left sidebar provides complete chat history management:
@@ -162,7 +166,7 @@ The right sidebar provides two tabs: "Source" and "Images" to help you understan
162166

163167
### Image Processing
164168

165-
Nexent supports image input and processing (requires configuration of a vision model):
169+
Nexent supports image input and processing (make sure a vision model **and** the `analyze_image` tool are configured):
166170

167171
1. **Upload Images**
168172
- Drag image files directly into the chat area

doc/docs/zh/sdk/core/tools.md

Lines changed: 4 additions & 0 deletions
@@ -28,6 +28,10 @@
2828
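- **GetEmailTool**: Email retrieval tool via IMAP
2929
- **SendEmailTool**: Email sending tool via SMTP
3030

31+
### Multimodal Tools
32+
- **AnalyzeTextFileTool**: A document question-answering tool based on data processing and large language models
33+
- **AnalyzeImageTool**: An image question-answering tool based on visual language models
34+
3135
## 🔧 Common Characteristics
3236

3337
### 1. Basic Architecture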

doc/docs/zh/user-guide/agent-development.md

Lines changed: 6 additions & 1 deletion
@@ -32,7 +32,7 @@
3232

3333
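### 🛠️ Select Agent Tools
3434

35-
Agents can use various tools to complete tasks, including local tools for knowledge base search, sending and receiving email, and file management. They can also connect to third-party MCP tools or custom tools.
35+
Agents can use various tools to complete tasks, including local tools for knowledge base search, file parsing, image parsing, sending and receiving email, and file management. They can also connect to third-party MCP tools or custom tools.
3636

3737
1. On the right side of the "Select Agent Tools" tab, click "Refresh Tools" to refresh the list of available tools
3838
2. Select the group that contains the tool you want to add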
@@ -45,6 +45,11 @@
4545
<img src="./assets/agent-development/set-tool.png" style="width: 50%; height: auto;" />
4646
</div>
4747

48+
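> 💡 **Tips**
49+
> 1. Select the `knowledge_base_search` tool to enable knowledge base search.
50+
> 2. Select the `analyze_text_file` tool to enable parsing of document and text files.
51+
> 3. Select the `analyze_image` tool to enable parsing of image files.
52+
>
4853
> 📚 Want to learn about all the local tools built into the system? See [Local Tools Overview](./local-tools/index.md)
4954
5055
### 🔌 Add MCP Tools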

doc/docs/zh/user-guide/local-tools/index.md

Lines changed: 7 additions & 1 deletion
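@@ -4,7 +4,7 @@ The Nexent platform provides a rich set of local tools to help agents complete various system
44

55
## 🛠️ Available Tools
66

7-
Nexent preloads a set of reusable local tools. They are grouped by capability into three categories (email, file, and search), while the Terminal tool is provided separately as a remote shell capability. The name and core features of each tool are listed below so you can quickly locate the capability you need in an Agent.
7+
Nexent preloads a set of reusable local tools. They are grouped by capability into four categories (email, file, search, and multimodal), while the Terminal tool is provided separately as a remote shell capability. The name and core features of each tool are listed below so you can quickly locate the capability you need in an Agent.
88

99
### 📧 Email Tools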
1010

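@@ -30,6 +30,12 @@ Nexent preloads a set of reusable local tools. They are grouped by capability
3030
- **tavily_search**: Web search based on the Tavily API, particularly strong for news and real-time information. Returns both text results and related image URLs, and also supports optional image filtering. Request a free API key at [tavily.com](https://www.tavily.com/).
3131
- **linkup_search**: Uses the Linkup API to fetch text and image results. In addition to regular webpage content, it can return image-only results, making it suitable when mixed text and image references are needed. Register at [linkup.so](https://www.linkup.so/) to obtain a free API key.
3232

33+
34+
### 🖼️ Multimodal Tools
35+
36+
- **analyze_text_file**: Given a user query and the S3, HTTP, or HTTPS URL of a text file, parses the file and uses a large language model to interpret it and answer the user's question. Requires an available large language model to be configured on the model management page.
37+
- **analyze_image**: Given a user query and the S3, HTTP, or HTTPS URL of an image, uses a visual language model to analyze the image and answer the user's question. Requires an available visual language model to be configured on the model management page.
38+
3339
### 🖥️ Terminal Tool
3440

3541
The **Terminal Tool** is one of the Nexent platform's core local tools. It provides a persistent SSH session so that agents can execute remote commands, perform system inspections, read logs, or deploy services. For detailed deployment, parameter, and security guidance, see the dedicated [Terminal Tool guide](./terminal-tool.md)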

doc/docs/zh/user-guide/start-chat.md

Lines changed: 17 additions & 9 deletions
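@@ -64,10 +64,16 @@ Nexent supports voice input, letting you interact with agents by voice.
6464

6565
> 💡 **Tip:** For better speech recognition, use voice input in a quiet environment and speak clearly.
6666
67-
### Upload Files for Chat
67+
### Upload Multimodal Files for Chat
6868

6969
You can upload files during a chat so the agent can help you based on their content:
7070

71+
> ⚠️ **Notes**
72+
> 1. Multimodal file conversations require the corresponding multimodal parsing tools to be selected during agent development
73+
>    1. For document and text files, select the `analyze_text_file` tool
74+
>    2. For image files, select the `analyze_image` tool
75+
> 2. There is a file size limit for uploads. It is recommended that a single file not exceed 10 MB. For large documents, upload in batches
76+
7177
1. **Choose a File Upload Method**
7278
- Click the file upload button in the lower right corner of the input box
7379
- Or drag files directly into the chat area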
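@@ -78,17 +84,15 @@ Nexent supports voice input, letting you interact with agents by voice.
7884
- **Images:** JPG, PNG, GIF, and other common image formats
7985

8086
3. **File Processing Flow**
81-
- The system automatically processes your uploaded files
82-
- Extracts the file content and adds it to the current chat context
83-
- The agent answers your questions based on the file content
87+
- The system stores your uploaded file in MinIO and returns an S3 URL
88+
- It builds file metadata and adds it to the current chat context
89+
- The agent answers your questions based on the file metadata
8490

8591
4. **File-based Chat**
8692
- After uploading a file, you can ask questions about its content
87-
- The agent can analyze, summarize, or process information from the file
93+
- The agent can call the corresponding multimodal tools to analyze, summarize, or process information from the file
8894
- Multiple files can be uploaded and processed simultaneously
8995

90-
> ⚠️ **Note:** There is a file size limit for uploads. It is recommended that a single file not exceed 10 MB. For large documents, upload in batches.
91-
9296
## 📚 Manage Your Chat History
9397

9498
The left sidebar provides complete chat history management: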
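@@ -152,6 +156,10 @@ Nexent supports voice input, letting you interact with agents by voice.
152156
- Click "Expand" to view the full cited content
153157
- Click a webpage title to jump directly to the original page
154158

159+
💡 **Tips**
160+
1. During agent development, select the `knowledge_base_search` tool to enable local knowledge base search.
161+
2. During agent development, select the `exa_search`, `tavily_search`, or `linkup_search` tool to enable web search.
162+
155163
### Images Tab
156164

157165
- Displays related images retrieved from web searches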
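@@ -167,15 +175,15 @@ Nexent supports voice input, letting you interact with agents by voice.
167175

168176
### Image Processing
169177

170-
Nexent supports image input and processing (requires a configured vision model):
178+
Nexent supports image input and processing (requires a configured vision model and the `analyze_image` image parsing tool):
171179

172180
1. **Upload Images**
173181
- Drag image files directly into the chat area
174182
- Or click the upload button to select image files
175183
- Supports common image formats (JPG, PNG, GIF, etc.)
176184

177185
2. **Image Analysis Capabilities**
178-
- The agent automatically analyzes image content
186+
- The agent calls the image parsing tool to automatically analyze image content
179187
- Can recognize objects, text, scenes, and other elements in images
180188
- Answers your questions based on the image content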
181189
