Commit 24c19ad

✨ Unit test: optimize the display effect of the final answer
1 parent f3774df commit 24c19ad

8 files changed, +42 -82 lines changed

backend/prompts/managed_system_prompt_template.yaml

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -61,8 +61,8 @@ system_prompt: |-
6161
- 用简单的Python编写代码
6262
- 遵循python代码规范和python语法
6363
- 根据格式规范正确调用工具
64-
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码
65-
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码。
64+
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码
65+
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码。
6666
6767
3. 观察结果:
6868
- 查看代码执行结果
@@ -113,7 +113,7 @@ system_prompt: |-
113113
{{ constraint }}
114114
115115
### python代码规范
116-
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
116+
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
117117
2. 只使用已定义的变量,变量将在多次调用之间持续保持;
118118
3. 使用“print()”函数让下一次的模型调用看到对应变量信息;
119119
4. 正确使用工具的入参,使用关键字参数,不要用字典形式;

backend/prompts/managed_system_prompt_template_en.yaml

Lines changed: 3 additions & 3 deletions
@@ -61,8 +61,8 @@ system_prompt: |-
6161
- Write code in simple Python
6262
- Follow Python coding standards and Python syntax
6363
- Call tools correctly according to format specifications
64-
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
65-
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
64+
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
65+
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
6666
6767
3. Observe Results:
6868
- View code execution results
@@ -113,7 +113,7 @@ system_prompt: |-
113113
{{ constraint }}
114114
115115
### Python Code Specifications
116-
1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code:\n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc;
116+
1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code:\n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc;
117117
2. Only use defined variables, variables will persist between multiple calls;
118118
3. Use "print()" function to let the next model call see corresponding variable information;
119119
4. Use tool input parameters correctly, use keyword arguments, not dictionary format;
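
Note on the prompt change above: giving display blocks their own closing marker lets a consumer tell the two block kinds apart by their terminators as well as their openers. A minimal sketch of such a split follows. The pattern names and `split_blocks` are hypothetical and not part of this commit; the fence string is built programmatically only so that this example's own code fence stays intact.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime

# Hypothetical patterns modelled on the markers named in the prompt.
# The closing marker is treated as optional in both patterns.
RUN_PATTERN = re.compile(
    FENCE + r"<RUN>\s*\n(.*?)\n" + FENCE + r"(?:<END_CODE>)?",
    re.DOTALL,
)
DISPLAY_PATTERN = re.compile(
    FENCE + r"<DISPLAY:(\w+)>\s*\n(.*?)\n" + FENCE + r"(?:<END_DISPLAY_CODE>)?",
    re.DOTALL,
)


def split_blocks(text):
    """Return (run_blocks, display_blocks) found in a model output.

    run_blocks is a list of code strings; display_blocks is a list of
    (language, code) tuples.
    """
    run_blocks = RUN_PATTERN.findall(text)
    display_blocks = DISPLAY_PATTERN.findall(text)
    return run_blocks, display_blocks
```

With an output containing one block of each kind, `split_blocks` returns the executable code in the first list and the `(language, code)` pair in the second.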

backend/prompts/manager_system_prompt_template.yaml

Lines changed: 3 additions & 3 deletions
@@ -62,8 +62,8 @@ system_prompt: |-
6262
- 用简单的Python编写代码
6363
- 遵循python代码规范和python语法
6464
- 正确调用工具或助手解决问题
65-
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码
66-
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码。
65+
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码
66+
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码。
6767
6868
3. 观察结果:
6969
- 查看代码执行结果
@@ -141,7 +141,7 @@ system_prompt: |-
141141
{{ constraint }}
142142
143143
### python代码规范
144-
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
144+
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
145145
2. 只使用已定义的变量,变量将在多次调用之间持续保持;
146146
3. 使用“print()”函数让下一次的模型调用看到对应变量信息;
147147
4. 正确使用工具/助手的入参,使用关键字参数,不要用字典形式;

backend/prompts/manager_system_prompt_template_en.yaml

Lines changed: 3 additions & 3 deletions
@@ -62,8 +62,8 @@ system_prompt: |-
6262
- Write code in simple Python
6363
- Follow Python coding standards and Python syntax
6464
- Correctly call tools or agents to solve problems
65-
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
66-
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
65+
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
66+
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
6767
6868
3. Observe Results:
6969
- View code execution results
@@ -141,7 +141,7 @@ system_prompt: |-
141141
{{ constraint }}
142142
143143
### Python Code Specifications
144-
1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code: \n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc;
144+
1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code: \n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc;
145145
2. Only use defined variables, variables will persist between multiple calls;
146146
3. Use "print()" function to let the next model call see corresponding variable information;
147147
4. Use tool/agent input parameters correctly, use keyword arguments, not dictionary format;

backend/prompts/utils/prompt_generate.yaml

Lines changed: 6 additions & 5 deletions
@@ -52,16 +52,16 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
5252
- 用简单的Python编写代码
5353
- 遵循python代码规范和python语法
5454
- 根据格式规范正确调用工具/助手
55-
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码
56-
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'表达展示代码。
55+
- 考虑到代码执行与展示用户代码的区别,使用'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'表达运行代码,使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码
56+
- 注意运行的代码不会被用户看到,所以如果用户需要看到代码,你需要使用'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'表达展示代码。
5757
5858
3. 观察结果:
5959
- 查看代码执行结果
6060
6161
在思考结束后,当Agent认为可以回答用户问题,那么可以不生成代码,直接生成最终回答给到用户并停止循环。
6262
6363
### python代码规范
64-
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
64+
1. 如果认为是需要执行的代码,代码内容以'代码:\n```<RUN>\n'开头,并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码,代码内容以'代码:\n```<DISPLAY:语言类型>\n'开头,并以'```<END_DISPLAY_CODE>'标识符结尾,其中语言类型例如python、java、javascript等;
6565
2. 只使用已定义的变量,变量将在多次调用之间持续保持;
6666
3. 使用“print()”函数让下一次的模型调用看到对应变量信息;
6767
4. 正确使用工具/助手的入参,使用关键字参数,不要用字典形式;
@@ -160,11 +160,12 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
160160
middle = [x for x in arr if x == pivot]
161161
right = [x for x in arr if x > pivot]
162162
return quick_sort(left) + middle + quick_sort(right)
163-
```<END_CODE>
163+
```<END_DISPLAY_CODE>
164164
观察结果:快速排序的python代码。
165165
166166
思考:我已经获得了快速排序的python代码,现在我将生成最终回答。
167167
快速排序的python代码如下:
168+
代码:
168169
```<DISPLAY:python>
169170
def quick_sort(arr):
170171
if len(arr) <= 1:
@@ -174,7 +175,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
174175
middle = [x for x in arr if x == pivot]
175176
right = [x for x in arr if x > pivot]
176177
return quick_sort(left) + middle + quick_sort(right)
177-
```<END_CODE>
178+
```<END_DISPLAY_CODE>
178179
179180
---
180181

backend/prompts/utils/prompt_generate_en.yaml

Lines changed: 5 additions & 5 deletions
@@ -53,16 +53,16 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
5353
- Write code in simple Python
5454
- Follow Python coding standards and Python syntax
5555
- Call tools/assistants correctly according to format specifications
56-
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
57-
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
56+
- To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
57+
- Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
5858
5959
3. Observe Results:
6060
- View code execution results
6161
6262
After thinking, when you believe you can answer the user's question, you can generate a final answer directly to the user without generating code and stop the loop.
6363
6464
### Python Code Specifications
65-
1. If it is considered to be code that needs to be executed, the code content begins with 'Code:\n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'Code:\n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc.;
65+
1. If it is considered to be code that needs to be executed, the code content begins with 'Code:\n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'Code:\n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc.;
6666
2. Only use defined variables, variables will persist between multiple calls;
6767
3. Use "print()" function to let the next model call see corresponding variable information;
6868
4. Use tool/assistant input parameters correctly, use keyword arguments, not dictionary format;
@@ -158,7 +158,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
158158
middle = [x for x in arr if x == pivot]
159159
right = [x for x in arr if x > pivot]
160160
return quick_sort(left) + middle + quick_sort(right)
161-
```<END_CODE>
161+
```<END_DISPLAY_CODE>
162162
Observe Results: The Python quick sort code.
163163
164164
Think: I have obtained the Python quick sort code, now I will generate the final answer.
@@ -172,7 +172,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
172172
middle = [x for x in arr if x == pivot]
173173
right = [x for x in arr if x > pivot]
174174
return quick_sort(left) + middle + quick_sort(right)
175-
```<END_CODE>
175+
```<END_DISPLAY_CODE>
176176
177177
---
178178
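
Note on the few-shot change above: the examples now close every display block with `<END_DISPLAY_CODE>`. A small helper for generating blocks in that shape might look like the sketch below. `format_display_block` is a hypothetical name, not something defined in this repository, and the `Code:` prefix follows the wording of the English prompt.

```python
FENCE = "`" * 3  # three backticks, built at runtime


def format_display_block(code, language="python"):
    """Wrap a snippet in the display-only format described by the prompt."""
    return (
        f"Code:\n{FENCE}<DISPLAY:{language}>\n"
        f"{code}\n"
        f"{FENCE}<END_DISPLAY_CODE>"
    )
```

Generating few-shot blocks through one helper like this would keep the opener and terminator consistent across the Chinese and English templates, assuming the prefix is localized separately.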

sdk/nexent/core/agents/core_agent.py

Lines changed: 1 addition & 37 deletions
@@ -2,7 +2,6 @@
22
import ast
33
import time
44
import threading
5-
import logging
65
from textwrap import dedent
76
from typing import Any, Optional, List, Dict
87
from collections.abc import Generator
@@ -24,7 +23,6 @@
2423
if TYPE_CHECKING:
2524
import PIL.Image
2625

27-
logger = logging.getLogger(__name__)
2826

2927
def parse_code_blobs(text: str) -> str:
3028
"""Extract code blocs from the LLM's output for execution.
@@ -40,7 +38,6 @@ def parse_code_blobs(text: str) -> str:
4038
4139
Raises:
4240
ValueError: If no valid code block is found in the text.
43-
DisplayCodeOnlyError: If only DISPLAY code blocks are found (no executable code).
4441
"""
4542
# First try to match the new <RUN> format for execution
4643
# <END_CODE> is optional - match both with and without it
@@ -63,28 +60,6 @@ def parse_code_blobs(text: str) -> str:
6360
except SyntaxError:
6461
pass
6562

66-
# Check if there are DISPLAY code blocks (but no executable code)
67-
# Only raise DisplayCodeOnlyError if:
68-
# 1. There are DISPLAY code blocks present
69-
# 2. AND there are NO executable code blocks (RUN or py/python) anywhere in the text
70-
# This ensures we don't skip execution if executable code appears later in the output
71-
display_pattern = r"```<DISPLAY:\w+>\s*\n(.*?)\n```(?:<END_CODE>)?"
72-
display_matches = re.findall(display_pattern, text, re.DOTALL)
73-
has_display = bool(display_matches)
74-
75-
# Double-check: ensure there are NO executable code blocks anywhere
76-
# (This is a safety check - we should have already checked above, but be explicit)
77-
run_check = bool(re.search(r"```<RUN>\s*\n(.*?)\n```(?:<END_CODE>)?", text, re.DOTALL))
78-
py_check = bool(re.search(r"```(?:py|python)\s*\n(.*?)\n```", text, re.DOTALL))
79-
has_executable = run_check or py_check
80-
81-
logger.info(f"[parse_code_blobs] Code block detection: has_display={has_display} (found {len(display_matches)} DISPLAY blocks), "
82-
f"has_executable={has_executable} (RUN={run_check}, py/python={py_check})")
83-
84-
if has_display and not has_executable:
85-
raise DisplayCodeOnlyError()
86-
87-
logger.info("[parse_code_blobs] No valid executable code block found, raising ValueError")
8863
raise ValueError(
8964
dedent(
9065
f"""
@@ -117,6 +92,7 @@ def convert_code_format(text):
11792

11893
# Restore <END_CODE> if it was affected by the above replacement
11994
text = text.replace("```<END_CODE>", "```")
95+
text = text.replace("```<END_DISPLAY_CODE>", "```")
12096

12197
# Clean up any remaining ```< patterns
12298
text = text.replace("```<", "```")
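
Note on the hunk above: the explicit `<END_DISPLAY_CODE>` replacement matters because the generic cleanup only strips the `<` after the fence, which would leave `END_DISPLAY_CODE>` behind in the rendered text. The sketch below reproduces the three replacements visible in this hunk. The two opening-marker substitutions are assumptions about the rest of `convert_code_format`, which the diff does not show, and `normalize_markers` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime


def normalize_markers(text):
    """Rewrite the custom markers into ordinary Markdown fences."""
    # Assumption: opening markers become plain language-tagged fences.
    text = re.sub(FENCE + r"<DISPLAY:(\w+)>", FENCE + r"\1", text)
    text = text.replace(FENCE + "<RUN>", FENCE + "python")

    # The next three steps mirror the lines visible in the diff.
    text = text.replace(FENCE + "<END_CODE>", FENCE)
    text = text.replace(FENCE + "<END_DISPLAY_CODE>", FENCE)
    text = text.replace(FENCE + "<", FENCE)
    return text
```

Under these assumptions, a display block comes out as a standard fenced block tagged with its language and has no trailing marker text.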
@@ -129,11 +105,6 @@ class FinalAnswerError(Exception):
129105
pass
130106

131107

132-
class DisplayCodeOnlyError(Exception):
133-
"""Raised when only DISPLAY code blocks are found (no executable code)."""
134-
pass
135-
136-
137108
class CoreAgent(CodeAgent):
138109
def __init__(self, observer: MessageObserver, prompt_templates: Dict[str, Any] | None = None, *args, **kwargs):
139110
super().__init__(prompt_templates=prompt_templates, *args, **kwargs)
@@ -179,13 +150,6 @@ def _step_stream(self, memory_step: ActionStep) -> Generator[Any]:
179150
self.observer.add_message(
180151
self.agent_name, ProcessType.PARSE, code_action)
181152

182-
except DisplayCodeOnlyError:
183-
# Only DISPLAY code blocks found - use them as the final answer immediately
184-
self.logger.log_markdown(
185-
content=model_output, title="Display code detected (finalizing answer)", level=LogLevel.INFO)
186-
memory_step.observations = "Display code was provided and returned directly as the final answer."
187-
memory_step.action_output = None
188-
raise FinalAnswerError()
189153
except Exception:
190154
self.logger.log_markdown(
191155
content=model_output, title="AGENT FINAL ANSWER", level=LogLevel.INFO)
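
Note on the control flow after this change: with `DisplayCodeOnlyError` removed, an output that contains only display blocks falls through to the generic `ValueError`, which the broad `except Exception` branch then handles as the final answer. The sketch below illustrates that flow under stated assumptions. `extract_run_code` and `step` are simplified, hypothetical stand-ins for `parse_code_blobs` and `_step_stream`, and the returned tuples are illustrative only, since the diff shows just the logging call inside the handler.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime

RUN_PATTERN = re.compile(
    FENCE + r"<RUN>\s*\n(.*?)\n" + FENCE + r"(?:<END_CODE>)?",
    re.DOTALL,
)


def extract_run_code(text):
    """Return executable code, or raise ValueError if none is present."""
    matches = RUN_PATTERN.findall(text)
    if not matches:
        raise ValueError("no valid executable code block found")
    return "\n\n".join(match.strip() for match in matches)


def step(model_output):
    """Decide whether an output should be executed or treated as final."""
    try:
        return ("execute", extract_run_code(model_output))
    except Exception:
        # Display-only and plain-text outputs both end up here.
        return ("final_answer", model_output)
```

In this sketch a display-only output and a plain-text output take the same path, which matches the single remaining handler in the diff.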
