✨ Unit test: optimize the display effect of the final answer

WMC001 · WMC001 · commit 24c19adb91d9 · 2025-11-20T11:52:15.000+08:00
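
For readers skimming the diff below: displayed code now closes with `<END_DISPLAY_CODE>`, while executed code keeps `<END_CODE>`. The following is a minimal sketch of how both markers might be reduced to plain Markdown fences for rendering. It assumes the `convert_code_format` change in `core_agent.py` behaves as its hunk suggests. `to_markdown_fences` and `FENCE` are hypothetical names used only for illustration, not the project's actual helper.

```python
import re

# Three backticks, built indirectly so this example nests cleanly in Markdown.
FENCE = "`" * 3


def to_markdown_fences(text: str) -> str:
    """Hypothetical helper: rewrite agent code markers as plain Markdown fences."""
    # Opening marker: a <DISPLAY:python> tag becomes an ordinary language tag.
    text = re.sub(FENCE + r"<DISPLAY:(\w+)>", FENCE + r"\1", text)
    # Closing markers: strip the run terminator and the display terminator.
    text = text.replace(FENCE + "<END_CODE>", FENCE)
    text = text.replace(FENCE + "<END_DISPLAY_CODE>", FENCE)
    return text


sample = (
    "Code:\n"
    + FENCE + "<DISPLAY:python>\n"
    + "print('hi')\n"
    + FENCE + "<END_DISPLAY_CODE>"
)
print(to_markdown_fences(sample))
```

Without the explicit `<END_DISPLAY_CODE>` replacement, the generic cleanup shown in the diff (which only strips the `<` that follows a fence) would presumably leave a stray `END_DISPLAY_CODE>` after the closing fence. That would explain why the commit adds the extra `replace` call.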
diff --git a/backend/prompts/managed_system_prompt_template.yaml b/backend/prompts/managed_system_prompt_template.yaml
@@ -61,8 +61,8 @@ system_prompt: |-
      - 用简单的Python编写代码
      - 遵循python代码规范和python语法
      - 根据格式规范正确调用工具
-     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码
-     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码。
+     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码
+     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码。
 
   3. 观察结果：
      - 查看代码执行结果
@@ -113,7 +113,7 @@ system_prompt: |-
   {{ constraint }}
 
   ### python代码规范
-  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
+  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
   2. 只使用已定义的变量，变量将在多次调用之间持续保持；
   3. 使用“print()”函数让下一次的模型调用看到对应变量信息；
   4. 正确使用工具的入参，使用关键字参数，不要用字典形式；
diff --git a/backend/prompts/managed_system_prompt_template_en.yaml b/backend/prompts/managed_system_prompt_template_en.yaml
@@ -61,8 +61,8 @@ system_prompt: |-
      - Write code in simple Python
      - Follow Python coding standards and Python syntax
      - Call tools correctly according to format specifications
-     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
-     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
+     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
+     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
 
   3. Observe Results:
      - View code execution results
@@ -113,7 +113,7 @@ system_prompt: |-
   {{ constraint }}
 
   ### Python Code Specifications
-  1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code:\n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc;
+  1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code:\n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc;
   2. Only use defined variables, variables will persist between multiple calls;
   3. Use "print()" function to let the next model call see corresponding variable information;
   4. Use tool input parameters correctly, use keyword arguments, not dictionary format;
diff --git a/backend/prompts/manager_system_prompt_template.yaml b/backend/prompts/manager_system_prompt_template.yaml
@@ -62,8 +62,8 @@ system_prompt: |-
      - 用简单的Python编写代码
      - 遵循python代码规范和python语法
      - 正确调用工具或助手解决问题
-     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码
-     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码。
+     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码
+     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码。
 
   3. 观察结果：
      - 查看代码执行结果
@@ -141,7 +141,7 @@ system_prompt: |-
   {{ constraint }}
   
   ### python代码规范
-  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
+  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
   2. 只使用已定义的变量，变量将在多次调用之间持续保持；
   3. 使用“print()”函数让下一次的模型调用看到对应变量信息；
   4. 正确使用工具/助手的入参，使用关键字参数，不要用字典形式；
diff --git a/backend/prompts/manager_system_prompt_template_en.yaml b/backend/prompts/manager_system_prompt_template_en.yaml
@@ -62,8 +62,8 @@ system_prompt: |-
      - Write code in simple Python
      - Follow Python coding standards and Python syntax
      - Correctly call tools or agents to solve problems
-     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
-     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
+     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
+     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
 
   3. Observe Results:
      - View code execution results
@@ -141,7 +141,7 @@ system_prompt: |-
   {{ constraint }}
   
   ### Python Code Specifications
-  1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code: \n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc;
+  1. If it is considered to be code that needs to be executed, the code content begins with 'code: \n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'code: \n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc;
   2. Only use defined variables, variables will persist between multiple calls;
   3. Use "print()" function to let the next model call see corresponding variable information;
   4. Use tool/agent input parameters correctly, use keyword arguments, not dictionary format;
diff --git a/backend/prompts/utils/prompt_generate.yaml b/backend/prompts/utils/prompt_generate.yaml
@@ -52,16 +52,16 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
      - 用简单的Python编写代码
      - 遵循python代码规范和python语法
      - 根据格式规范正确调用工具/助手
-     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码
-     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'表达展示代码。
+     - 考虑到代码执行与展示用户代码的区别，使用'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'表达运行代码，使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码
+     - 注意运行的代码不会被用户看到，所以如果用户需要看到代码，你需要使用'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'表达展示代码。
 
   3. 观察结果：
      - 查看代码执行结果
   
   在思考结束后，当Agent认为可以回答用户问题，那么可以不生成代码，直接生成最终回答给到用户并停止循环。
 
   ### python代码规范
-  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
+  1. 如果认为是需要执行的代码，代码内容以'代码：\n```<RUN>\n'开头，并以'```<END_CODE>'标识符结尾。如果是不需要执行仅用于展示的代码，代码内容以'代码：\n```<DISPLAY:语言类型>\n'开头，并以'```<END_DISPLAY_CODE>'标识符结尾，其中语言类型例如python、java、javascript等；
   2. 只使用已定义的变量，变量将在多次调用之间持续保持；
   3. 使用“print()”函数让下一次的模型调用看到对应变量信息；
   4. 正确使用工具/助手的入参，使用关键字参数，不要用字典形式；
@@ -160,11 +160,12 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
     return quick_sort(left) + middle + quick_sort(right)
-  ```<END_CODE>
+  ```<END_DISPLAY_CODE>
   观察结果：快速排序的python代码。
   
   思考：我已经获得了快速排序的python代码，现在我将生成最终回答。
   快速排序的python代码如下：
+  代码：
   ```<DISPLAY:python>
   def quick_sort(arr):
     if len(arr) <= 1:
@@ -174,7 +175,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
     return quick_sort(left) + middle + quick_sort(right)
-  ```<END_CODE>
+  ```<END_DISPLAY_CODE>
 
   ---
 
diff --git a/backend/prompts/utils/prompt_generate_en.yaml b/backend/prompts/utils/prompt_generate_en.yaml
@@ -53,16 +53,16 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
      - Write code in simple Python
      - Follow Python coding standards and Python syntax
      - Call tools/assistants correctly according to format specifications
-     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_CODE>' to indicate its completion.
-     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_CODE>' to denote displayed code.
+     - To distinguish between code execution and displaying user code, use 'Code: \n```<RUN>\n' to start executing code and '```<END_CODE>' to indicate its completion. Use 'Code: \n```<DISPLAY:language_type>\n' to start displaying code and '```<END_DISPLAY_CODE>' to indicate its completion.
+     - Note that executed code is not visible to users. If users need to see the code, use 'Code: \n```<DISPLAY:language_type>\n' as the start and '```<END_DISPLAY_CODE>' to denote displayed code.
 
   3. Observe Results:
      - View code execution results
   
   After thinking, when you believe you can answer the user's question, you can generate a final answer directly to the user without generating code and stop the loop.
   
   ### Python Code Specifications
-  1. If it is considered to be code that needs to be executed, the code content begins with 'Code:\n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'Code:\n```<DISPLAY:language_type>\n', and ends with '```<END_CODE>', where language_type can be python, java, javascript, etc.;
+  1. If it is considered to be code that needs to be executed, the code content begins with 'Code:\n```<RUN>\n' and ends with '```<END_CODE>'. If the code does not need to be executed for display only, the code content begins with 'Code:\n```<DISPLAY:language_type>\n', and ends with '```<END_DISPLAY_CODE>', where language_type can be python, java, javascript, etc.;
   2. Only use defined variables, variables will persist between multiple calls;
   3. Use "print()" function to let the next model call see corresponding variable information;
   4. Use tool/assistant input parameters correctly, use keyword arguments, not dictionary format;
@@ -158,7 +158,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
     return quick_sort(left) + middle + quick_sort(right)
-  ```<END_CODE>
+  ```<END_DISPLAY_CODE>
   Observe Results: The Python quick sort code.
 
   Think: I have obtained the Python quick sort code, now I will generate the final answer.
@@ -172,7 +172,7 @@ FEW_SHOTS_SYSTEM_PROMPT: |-
     middle = [x for x in arr if x == pivot]
     right = [x for x in arr if x > pivot]
     return quick_sort(left) + middle + quick_sort(right)
-  ```<END_CODE>
+  ```<END_DISPLAY_CODE>
 
   ---
 
diff --git a/sdk/nexent/core/agents/core_agent.py b/sdk/nexent/core/agents/core_agent.py
@@ -2,7 +2,6 @@
 import ast
 import time
 import threading
-import logging
 from textwrap import dedent
 from typing import Any, Optional, List, Dict
 from collections.abc import Generator
@@ -24,7 +23,6 @@
 if TYPE_CHECKING:
     import PIL.Image
 
-logger = logging.getLogger(__name__)
 
 def parse_code_blobs(text: str) -> str:
     """Extract code blocs from the LLM's output for execution.
@@ -40,7 +38,6 @@ def parse_code_blobs(text: str) -> str:
 
     Raises:
         ValueError: If no valid code block is found in the text.
-        DisplayCodeOnlyError: If only DISPLAY code blocks are found (no executable code).
     """
     # First try to match the new <RUN> format for execution
     # <END_CODE> is optional - match both with and without it
@@ -63,28 +60,6 @@ def parse_code_blobs(text: str) -> str:
     except SyntaxError:
         pass
 
-    # Check if there are DISPLAY code blocks (but no executable code)
-    # Only raise DisplayCodeOnlyError if:
-    # 1. There are DISPLAY code blocks present
-    # 2. AND there are NO executable code blocks (RUN or py/python) anywhere in the text
-    # This ensures we don't skip execution if executable code appears later in the output
-    display_pattern = r"```<DISPLAY:\w+>\s*\n(.*?)\n```(?:<END_CODE>)?"
-    display_matches = re.findall(display_pattern, text, re.DOTALL)
-    has_display = bool(display_matches)
-    
-    # Double-check: ensure there are NO executable code blocks anywhere
-    # (This is a safety check - we should have already checked above, but be explicit)
-    run_check = bool(re.search(r"```<RUN>\s*\n(.*?)\n```(?:<END_CODE>)?", text, re.DOTALL))
-    py_check = bool(re.search(r"```(?:py|python)\s*\n(.*?)\n```", text, re.DOTALL))
-    has_executable = run_check or py_check
-    
-    logger.info(f"[parse_code_blobs] Code block detection: has_display={has_display} (found {len(display_matches)} DISPLAY blocks), "
-             f"has_executable={has_executable} (RUN={run_check}, py/python={py_check})")
-    
-    if has_display and not has_executable:
-        raise DisplayCodeOnlyError()
-
-    logger.info("[parse_code_blobs] No valid executable code block found, raising ValueError")
     raise ValueError(
         dedent(
             f"""
@@ -117,6 +92,7 @@ def convert_code_format(text):
 
     # Restore <END_CODE> if it was affected by the above replacement
     text = text.replace("```<END_CODE>", "```")
+    text = text.replace("```<END_DISPLAY_CODE>", "```")
 
     # Clean up any remaining ```< patterns
     text = text.replace("```<", "```")
@@ -129,11 +105,6 @@ class FinalAnswerError(Exception):
     pass
 
 
-class DisplayCodeOnlyError(Exception):
-    """Raised when only DISPLAY code blocks are found (no executable code)."""
-    pass
-
-
 class CoreAgent(CodeAgent):
     def __init__(self, observer: MessageObserver, prompt_templates: Dict[str, Any] | None = None, *args, **kwargs):
         super().__init__(prompt_templates=prompt_templates, *args, **kwargs)
@@ -179,13 +150,6 @@ def _step_stream(self, memory_step: ActionStep) -> Generator[Any]:
             self.observer.add_message(
                 self.agent_name, ProcessType.PARSE, code_action)
 
-        except DisplayCodeOnlyError:
-            # Only DISPLAY code blocks found - use them as the final answer immediately
-            self.logger.log_markdown(
-                content=model_output, title="Display code detected (finalizing answer)", level=LogLevel.INFO)
-            memory_step.observations = "Display code was provided and returned directly as the final answer."
-            memory_step.action_output = None
-            raise FinalAnswerError()
         except Exception:
             self.logger.log_markdown(
                 content=model_output, title="AGENT FINAL ANSWER", level=LogLevel.INFO)
diff --git a/test/sdk/core/agents/test_core_agent.py b/test/sdk/core/agents/test_core_agent.py