@caiy0233

When submitting a PR, please confirm the following points and mark each box with [x].

Related issue: #257

Checklist

  • I have read and understood the contributor guidelines.
  • I have checked that no existing feature duplicates this request and have communicated with the project maintainers.
  • I accept the maintainers' suggestions to change or close this PR.
  • I have submitted the test files and can provide screenshots of the test results (required for feature changes and bug fixes, otherwise as needed).
  • I have added or modified the documentation related to this PR (optional; add as needed for the PR content).
  • I have added usage examples and notes (optional; add as needed for the PR content).

Please fill in the specific details of this PR:

  • Iterative Optimization Work Pattern: Iterative Prompt Optimization with Batches, Scoring, and Memory

Summary

  • Adds optimization_work_pattern to enable iterative prompt optimization over batched samples, driven by a scoring and feedback loop, with persistent memory of past successes and failures so that later iterations can learn from the key cases.

Motivation

  • Single-sample evaluation often produces unstable outcomes; batched evaluation provides a more robust signal.
  • Persisting historical insights accelerates convergence and avoids repeated mistakes.

Highlights

  • Batched evaluation for stability
    • The samples are processed in batches across iterations, producing averaged scores and pass rates that smooth out variance.
  • Persistent memory of insights
    • Persists iteration records with before/after prompts to ~/.agentuniverse/optimization_memory.json, enabling reuse of successful patterns and awareness of failure modes.
  • Intelligent history selection and summarization
    • Selects important records by score change, extremes, diversity, and recency; summarizes unselected records to fit context limits.
  • Flexible stop criteria
    • Supports avg_score_threshold and pass_rate_threshold for early stopping once quality meets the targets.
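
The batch statistics and stop criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this PR: the names BatchMetrics, summarize_batch, and should_stop are hypothetical, and the rule that every configured threshold must be met (rather than any one of them) is an assumption, since the description does not say how the two thresholds combine.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BatchMetrics:
    """Aggregate quality signals for one batch (hypothetical structure)."""
    avg_score: float
    pass_rate: float


def summarize_batch(scores: Sequence[float], pass_score: float) -> BatchMetrics:
    """Return the average score and the fraction of items scoring >= pass_score."""
    if not scores:
        raise ValueError("scores must not be empty")
    avg = sum(scores) / len(scores)
    passed = sum(1 for s in scores if s >= pass_score)
    return BatchMetrics(avg_score=avg, pass_rate=passed / len(scores))


def should_stop(
    metrics: BatchMetrics,
    avg_score_threshold: Optional[float] = None,
    pass_rate_threshold: Optional[float] = None,
) -> bool:
    """Assumption: stop only when every threshold that is set has been reached."""
    checks = []
    if avg_score_threshold is not None:
        checks.append(metrics.avg_score >= avg_score_threshold)
    if pass_rate_threshold is not None:
        checks.append(metrics.pass_rate >= pass_rate_threshold)
    # With no thresholds configured there is nothing to stop on.
    return bool(checks) and all(checks)


metrics = summarize_batch([80, 90, 50, 60], pass_score=60)
stop_now = should_stop(metrics, avg_score_threshold=70, pass_rate_threshold=0.8)
```

Averaging over a batch means a single outlier score moves the stop decision less than it would under per-sample evaluation, which is the stability argument made in the Motivation section.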

Main Functions

  • Iterative optimization over batched samples with a configurable iteration count.
  • Continuous quality scoring and early stopping by thresholds.
  • Feedback-driven prompt improvement with contextual history and prior insights.
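
A rough sketch of how such a loop could fit together is shown below. It is an assumption-laden illustration rather than the implementation in this PR: optimize, score_batch, and improve_prompt are hypothetical names, the scoring and improvement agents are replaced by plain callables, the record fields are invented for the example, and the memory file path is passed in explicitly instead of being fixed to ~/.agentuniverse/optimization_memory.json.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical callables standing in for the scoring and optimizing agents.
Scorer = Callable[[str, Sequence[str]], List[float]]  # (prompt, batch) -> scores
Improver = Callable[[str, List[Dict]], str]           # (prompt, history) -> new prompt


def optimize(
    samples: Sequence[str],
    initial_prompt: str,
    score_batch: Scorer,
    improve_prompt: Improver,
    memory_path: Path,
    batch_size: int = 2,
    max_iterations: int = 5,
    avg_score_threshold: Optional[float] = None,
) -> Tuple[str, List[Dict]]:
    """Score a batch, stop early if good enough, otherwise revise the prompt."""
    if not samples or batch_size < 1:
        raise ValueError("need at least one sample and batch_size >= 1")
    prompt = initial_prompt
    history: List[Dict] = []
    for i in range(max_iterations):
        # Walk through the samples batch by batch, wrapping around at the end.
        start = (i * batch_size) % len(samples)
        batch = [samples[(start + k) % len(samples)] for k in range(batch_size)]
        scores = score_batch(prompt, batch)
        avg = sum(scores) / len(scores)
        record = {"iteration": i + 1, "prompt_before": prompt, "avg_score": avg}
        if avg_score_threshold is not None and avg >= avg_score_threshold:
            record["prompt_after"] = prompt  # target reached, prompt unchanged
            history.append(record)
            break
        # Feed the current record plus earlier history back into the improver.
        prompt = improve_prompt(prompt, history + [record])
        record["prompt_after"] = prompt
        history.append(record)
    # Persist before/after prompts so later runs can reuse them.
    memory_path.parent.mkdir(parents=True, exist_ok=True)
    memory_path.write_text(
        json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return prompt, history
```

Passing the scorer and improver as callables keeps the sketch testable without any model calls; in the real work pattern these roles are played by agents.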

Parameters

  • samples:
    • List of strings or dicts. The evaluation/optimization corpus. Required.
  • initial_prompt:
    • Starting prompt text; if omitted, the executing agent's current instruction is used. Provide either this or agent_name_for_optimization.
  • agent_name_for_optimization:
    • Optionally references another agent whose instruction is used as initial_prompt. Provide either this or initial_prompt.
    • Note: if the prompt contains multiple dynamic variables, they must correspond one-to-one with the keys in samples.
  • batch_size:
    • Number of samples per iteration batch. Default 2–3 depending on template/profile.
  • max_iterations:
    • Maximum optimization rounds. Default 5.
  • scoring_standard:
    • Natural language rubric that the scoring agent uses to evaluate answer quality.
  • avg_score_threshold:
    • Stop when the average score ≥ threshold.
  • pass_rate_threshold:
    • Stop when the fraction of items with score ≥ pass_score reaches the threshold. Optional.
  • pass_score:
    • Minimum score counted as a pass. Optional.
  • max_history_records:
    • Upper bound on the number of detailed history records retained in feedback. Optional.
  • max_feedback_chars:
    • Soft limit on the feedback payload size; controls summarization to avoid exceeding the model context. Optional.
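
The requirement that a prompt's dynamic variables match the keys of each dict in samples can be checked before a run. The sketch below assumes the prompt uses Python str.format-style {name} placeholders and reads "one-to-one" as exact set equality; both are assumptions, and prompt_variables and mismatched_samples are hypothetical helper names, not part of this PR.

```python
from string import Formatter
from typing import Dict, List, Set


def prompt_variables(prompt: str) -> Set[str]:
    """Collect the {name} placeholders in a str.format-style prompt (assumed syntax)."""
    return {name for _, name, _, _ in Formatter().parse(prompt) if name}


def mismatched_samples(prompt: str, samples: List[Dict[str, str]]) -> List[int]:
    """Return indices of samples whose keys differ from the prompt's variables."""
    expected = prompt_variables(prompt)
    return [i for i, sample in enumerate(samples) if set(sample) != expected]


prompt = "Answer {question} using {context}."
samples = [
    {"question": "What is batching?", "context": "Notes on evaluation."},
    {"question": "What is a pass rate?"},  # missing the "context" key
]
bad = mismatched_samples(prompt, samples)
```

Running a check like this up front surfaces a missing or misspelled key before any scoring iterations are spent.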

Usage

  • Minimal unittest pattern:
    • examples/sample_apps/optimization_agent_app/intelligence/test/test_optimization_agent.py:1
  • Sample run script with multiple input presets:
    • examples/sample_apps/optimization_agent_app/intelligence/test/optimization_agent.py:16

Validation

  • Unit test added, mirroring the structure of the peer agent tests.
    • examples/sample_apps/optimization_agent_app/intelligence/test/test_optimization_agent.py:1
  • A manual run via the sample script demonstrates multi-dataset capability and threshold stopping.

Checklist

  • Tests included: yes ( test_optimization_agent.py )
  • Sample app runnable: yes ( examples/sample_apps/optimization_agent_app )

Please provide the path of the test files and submit screenshots or files of the test results (fill in as needed):
examples/sample_apps/optimization_agent_app/intelligence/test/test_optimization_agent.py
Screenshot 2025-12-12 22 25 53

@caiy0233
Author

#257
