
Commit d20cce0

Deep Finance Update with New Judge (#7)
* feat(finworld): Add AgentScope learning protocol and OpenJudge evaluation functionality to the FinWorld task.
  - Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.
  - Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.
  - Implemented a mechanism for detecting context overflows and terminating quickly during environment interactions to prevent blocking.
  - Added a finworld.yaml configuration file to define project training and rollout parameters.
  - Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).
  - Implemented task-output conversion, asynchronous calls, and retries to ensure evaluation stability.
  - Weight normalization manages each evaluator's contribution, merging them to compute the final reward and success determination.
* Precommit fix (#4)
* fix end of files
* autoflake import fix
* add mypy check
* fix test bench import
* refactor(finworld): Replace agent protocol and unify configuration updates
  - Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
  - Unified the model tuner parameter name to `tuner` and its related attribute references.
  - Optimized the multi-turn interaction step configuration to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
  - Modified the context-overflow check logic to prevent tool calls from blocking.
  - Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
  - Modified the default environment variable values and log saving paths in finworld_judge.py.
  - Added and improved multi-node and single-node startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
  - Added the finworld_single.yaml template for single-node training configurations.
  - Adjusted the key reference for the multi-turn step configuration in ma_deepresearch.py to use the ajet configuration path.
* feat(finworld): Add FinWorld training environment configuration scripts and templates
  - Added bash startup scripts for multi-node, multi-GPU training, supporting dynamic configuration generation and environment variable import.
  - Implemented training configuration file templates, supporting automatic injection of weight parameters and model paths.
  - Raised the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
  - Added a new finworld example directory and related documentation, improving the example project structure.
* refactor(utils): Remove unused extract and compute functions (`extract_tool_stats_from_cmts`)
* refactor(finworld): Replace the old model with OpenJudge; update evaluation configuration and scripts
  - Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
  - Read Judge model parameters from the configuration file first, using environment variables as a fallback
  - Optimized RM Gallery initialization with configuration-first logic and improved exception stack trace printing
  - Cleaned up and removed the old `_init_model` singleton method and related code
  - Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
  - Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
  - Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
  - Adjusted the `env_service` startup path to improve environment activation compatibility
  - Adjusted script log output format and content to make configuration parameter printing clearer
* feat(task_reader): Support data reading of type jsonl_with_env_service
  - Added the jsonl_with_env_service type, which loads data from jsonl files while calling tools via env_service.
  - Extended ResourceKeeper to handle the creation and release of environment instances for jsonl_with_env_service.
  - Kept the env_service type logic, calling create_instance to register instances and initializing them with init_messages from the jsonl file.
  - Added an example protocol, ExampleDeepResearchProtocol, to coordinate multi-turn interaction and environment calls.
  - Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service training mode.
  - Optimized scripts to support multi-node, multi-GPU training, including environment variables and Ray cluster configuration.
* feat(core): add finworld task reader support to framework
* feat(finworld): implement specialized data reader and openjudge-based grading logic
* refactor(finworld): optimize configuration templates and prompt engineering
* chore(finworld): update launch scripts and add variant experiment scripts
* feat(finworld): add support for multi-node, multi-GPU training scripts and configuration templates
* chore(git): ignore finworld/yaml/*
* fix(metrics): Fix and enhance the compatibility and debug output of the metrics update logic
  - Modified the `update_metrics` function, adding a `prefix` parameter to distinguish training and validation metrics.
  - Changed the data source for extracting `reward_stats` and `tool_stats` from `workflow_metadata` to `log_metrics`.
  - Added debug printing of the `log_metrics` content and metric key names at key steps for easier troubleshooting.
  - Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
  - Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` in the `log_metrics` field.
  - Removed redundant and deprecated code for extracting `reward_stats` and its calculation functions.
  - Added debug output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.
* fix(metrics): Remove debug prints and synchronize reward statistics
  - Removed debug prints before and after the `update_metrics` call in `trainer_verl.py`
  - Removed debug prints related to the `log_metrics` key in `finworld.py`
  - Removed debug prints before updating `metadata_stats` in `finworld_judge.py`
  - Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
  - Cleaned up debug prints inside `update_metrics` in `metric_helper`, improving readability
* chore: Stop tracking existing yaml files in the tutorial directory
* fix(task_runner): Synchronize reward_stats to log_metrics
* feat(tutorial): Add FinWorld multi-node, multi-GPU training startup script
* refactor(script): Refactor the finworld training script, integrating configuration and startup
* refactor(deep_finance): Replace and remove finworld-related implementations
  - Switched the example directory from example_finworld to example_deep_finance
  - Modified startup parameters and logic to support deep_finance, replacing the finworld option
  - Replaced finworld_reader with deep_finance_reader in the task reader
  - Adjusted the environment client configuration in resource management, using deep_finance instead of finworld-related checks
  - Updated the reward metric tool documentation to cover deep_finance
  - Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files
  - Replaced the keyword "finworld" with "deep_finance" in comments and logs
* refactor(deepfinance): Rename and unify DeepFinance module and config references
  - Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
  - Change the command-line argument to `--with-deepfinance` for consistency.
  - Rename the class in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
  - Update the documentation description and file name of the `metric_helper` module to DeepFinance.
  - Change environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
  - Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
  - Rename the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
  - Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
  - Rename the debug log identifier and corresponding environment variable to `DEEPFINANCE_DEBUG`.
  - Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
  - Ensure internal references and comments in all modules use the DeepFinance and deepfinance names.
* refactor(tutorial): Optimize dynamic generation logic for configuration file paths
* fix(deep_finance): fix argparse handling of --with-deepfinance
* fix(tutorial): Fix multi-node training environment variable settings
* fix(env): Correct the assignment of reward and info when returning environment state
  - Corrected the `env_output` return structure in `BaseGymEnv` to ensure the `reward` and `info` fields are assigned correctly.
  - Removed `RefJudge`- and `StructureJudge`-related metric calculations and statistics from `reward_metric_helper`.
  - Cleaned up redundant code in `reward_metric_helper`, removing stale comments and statistics items.
  - Modified `save_trajectory_as_json` to always print trajectory-saving confirmation.
  - Corrected log comments in `example_deep_finance` to avoid meaningless log output.
  - Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving.
* chore(config): Update example_deep_finance configuration and clean up files
  - Added a new ignore rule for config file paths in .gitignore
  - Deleted the auto-generated mcp_finance_tool_generated.json file in example_deep_finance
  - Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
  - Reorganized the Judge configuration, clarifying the openjudge_llm and rm_llm models
  - Optimized model paths and training parameter configuration, adding parallelism and batching settings
  - Adjusted the data reading method and training/validation set path placeholders
  - Reduced the rollout GPU memory usage ratio to 0.8
  - Changed the trainer's default save directory path to a placeholder variable
  - Cleaned up unused and commented-out code to keep the configuration file concise
* refactor(metric): Optimize tool metric calculation and data saving logic
  - Corrected the data source field for timeline data used during trajectory saving.
  - Removed redundant fields from the tool execution time, cache hit rate, and error rate statistics.
  - Updated .gitignore to add ignore rules for the example script directory.
  - Removed unnecessary debug information from logs to reduce noise.
  - Simplified log printing in the multi-turn interaction loop.
  - Streamlined logging for environment observation and termination checks to improve readability.
* fix(metric_helper): fix tool cache metric
* fix a minor bug
* fix(utils): Suppress httpx AsyncClient.aclose() exception warnings
* Translate comments to English
* feat: Support service name prefixes
  - Added --prefix argument support in the launcher
  - Implemented the prefix logic in the pty_launch function
  - Updated the deep_finance.sh script to use the prefix feature
  - Allows running multiple service instances in the same environment
* fix: Improve MultiAgent message content parsing
  - Support tool_result-format message content blocks
  - Improve handling of non-text content, continuing with the remaining items instead of skipping the whole message
  - Add handling for the tool_use type (skipped, since it is already handled via the tool_calls field)
  - Optimize code structure and comments for readability
* fix: Optimize DeepFinance judge logic and configuration
  - Fix the tool_stats extraction logic to read data correctly from log_metrics
  - Add debug output for penalty terms
  - Enable tool calls (force_disable_toolcalls: False)
  - Ensure reward calculation accuracy
* chore(deps): bump agentscope from 1.0.7 to 1.0.8
* fix(metric_helper): correct trajectory save path and add tool call metric
  - Change the trajectory save directory from "ctx_trackers" to "trajectory" for better file organization
  - Record tool call counts alongside error rates in tool metrics
  - Update the experiment suffix in the deep finance example script for a clearer naming convention
* revise message parsing
* fix(metric_helper): update openjudge graders list in reward metric helper
* feat(deep_finance): replace OpenJudge graders with PresentationQualityGrader
  - Remove legacy graders and integrate PresentationQualityGrader and GroundingGrader
  - Update grader weights and disable unused graders in config and code
  - Simplify grader configuration creation with new mappers for report content and traj
  - Refactor DeepFinanceJudgeByOpenJudge to support the new grading scheme
  - Add a PresentationQualityGrader implementation with a strict JSON output format
  - Include utilities for JSON parsing and validation in the presentation quality grader
  - Add prompt templates for presentation quality grading criteria and instructions
  - Provide an example script to run PresentationQualityGrader with OpenAIChatModel
  - Add traj_adapter utilities to normalize and extract the user query and final report
  - Update the YAML template to replace old grader weights with the presentation quality weight
  - Create init files to expose PresentationQualityGrader in the judge package
* feat(grounding): implement grounding grader for citation compliance evaluation
  - add a GroundingGrader class to evaluate citation coverage and truthfulness based on the dialogue traj
  - provide default OpenAIChatModel creation with deterministic options
  - implement prompt construction and JSON parsing utilities for model interaction
  - calculate scores including coverage, grounding, and invalid-citation penalties
  - add a detailed json_utils module for strict JSON extraction and validation
  - introduce prompt templates defining citation auditing rules and user prompts
  - supply reference.py with related grounding evaluation logic and the RefJudgeEvaluator class
  - create __init__.py to expose the GroundingGrader module
  - add the presentation_quality module __init__.py with the PresentationQualityGrader export
* fix(deep_finance_judge): add debug logging for the OpenJudge evaluation process
* feat(deep_finance): enhance reward metadata and zero-score debugging
  - Add populate_reward_metadata_from_stats to copy reward stats into reward metadata
  - Populate reward metadata in GeneralRunner if reward_stats is present in the workflow output
  - Refine compute_reward_metrics with the updated OpenJudge graders: presentation_quality, grounding, planning
  - Add a _save_zero_score_debug method in DeepFinanceJudgeByOpenJudge to save debug info for zero grader scores
  - Remove deprecated RewardStats usage in deep_finance_judge
  - Update the judge __init__ to export GroundingGrader alongside PresentationQualityGrader
  - Clean up debug print statements and logging in deep_finance_judge.py
  - Update .gitignore to exclude the prepare_data and judge/analytical_sufficiency folders in the example_deep_finance tutorial
* feat(presentation_quality): upgrade grading to a 1/3/5 scoring system with markdown cleanup
  - Add a function to strip markdown code block fences in the grounding and presentation_quality modules
  - Change the presentation quality grader to score each of 8 criteria on a 1/3/5 scale instead of pass/fail
  - Normalize the total score by dividing the sum of item scores by the maximum (40), improving granularity
  - Update the reasoning output to list the lowest-scoring items with notes for focused feedback
  - Revise the presentation quality prompt to reflect the new 1/3/5 scoring rubric with detailed instructions
  - Adjust the JSON output schema accordingly, replacing the boolean pass field with numeric score fields
  - Add a get_score utility in the JSON utils to extract and validate scores from graded items
  - Clean the report input by removing markdown fences before grading to avoid markup noise
  - Add the grounding weight configuration in the YAML template for modular judge weighting
* chore(config): update experiment suffix, prefix, and reward weights in deep_finance.sh
* fix(deep_finance): update environment variables and training launch options
* chore(config): parameterize deep finance training configuration
* chore(config): update experiment suffix, prefix, and weight parameters
* fix(example_deep_finance): update dynamic config file generation path
* refactor(judge): remove deprecated presentation quality script

---------

Co-authored-by: binary-husky <[email protected]>
Co-authored-by: Qingxu Fu <[email protected]>
Co-authored-by: qingxu.fu <[email protected]>
1 parent df4a593 commit d20cce0
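
The commit message describes two pieces of scoring arithmetic worth pinning down. First, the judge merges per-evaluator scores under normalized weights ("Weight normalization manages each evaluator's contribution"). A minimal Python sketch with a hypothetical helper name; the actual DeepFinanceJudgeByOpenJudge code may differ:

    def merge_grader_scores(scores: dict, weights: dict) -> float:
        """Combine per-grader scores in [0, 1] into one reward using normalized weights."""
        total_weight = sum(weights.get(name, 0.0) for name in scores)
        if total_weight <= 0:
            return 0.0
        return sum(s * weights.get(name, 0.0) for name, s in scores.items()) / total_weight

    # Using the weights from deep_finance.sh below (RM=0.5, PresentationQuality=0.25, Grounding=0.25):
    reward = merge_grader_scores(
        {"rm": 0.8, "presentation_quality": 0.6, "grounding": 1.0},
        {"rm": 0.5, "presentation_quality": 0.25, "grounding": 0.25},
    )  # 0.8*0.5 + 0.6*0.25 + 1.0*0.25 = 0.80

Second, the 1/3/5 presentation-quality rubric sums eight item scores and divides by the 40-point maximum (8 criteria at up to 5 points each). Again an illustrative sketch, not the shipped implementation:

    def normalize_presentation_score(item_scores: list) -> float:
        """Normalize eight 1/3/5 rubric scores to [0, 1] by the 40-point maximum."""
        assert len(item_scores) == 8 and all(s in (1, 3, 5) for s in item_scores)
        return sum(item_scores) / 40.0

    print(normalize_presentation_score([5, 5, 3, 3, 5, 1, 3, 5]))  # 30 / 40 = 0.75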

File tree

21 files changed: +1686 -210 lines


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -158,6 +158,9 @@ tutorial/example_deep_finance/yaml/*
 tutorial/example_deep_finance/config/*
 tutorial/example_deep_finance/scripts/*
 flash_attn-2.8.*.whl
+tutorial/example_deep_finance/prepare_data/*
+tutorial/example_deep_finance/judge/analytical_sufficiency/*
+
 .dockerignore
 benchmark_datasets
 modelscope_cache

ajet/context_tracker/multiagent_tracking.py

Lines changed: 12 additions & 0 deletions
@@ -82,6 +82,18 @@ def extract_text_content_from_content_dict(self, msg):
         #         },
         #     ],
         # }
+        # or tool_result format?? not observed yet:
+        # msg = {
+        #     "role": "tool",
+        #     "content": [
+        #         {
+        #             "type": "tool_result",
+        #             "id": "call_xxx",
+        #             "output": "tool output content",
+        #             "name": "tool_name"
+        #         },
+        #     ],
+        # }
 
 
         str_content = ""
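
For context, the tool_result branch that the parsing fix in the commit message refers to might look like the following. This is a hedged sketch with an illustrative function name, not the actual multiagent_tracking.py code:

    def extract_text_from_blocks(msg: dict) -> str:
        """Flatten a content-block message (text / tool_result / tool_use) to plain text."""
        parts = []
        for item in msg.get("content", []):
            if not isinstance(item, dict):
                continue  # keep processing remaining items instead of discarding the message
            if item.get("type") == "text":
                parts.append(item.get("text", ""))
            elif item.get("type") == "tool_result":
                # shape documented above: {"type": "tool_result", "id": ..., "output": ..., "name": ...}
                parts.append(str(item.get("output", "")))
            # "tool_use" blocks are skipped here; they surface via the tool_calls field
        return "\n".join(parts)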

ajet/task_runner/general_runner.py

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,7 @@
 from ajet.schema.trajectory import Reward
 from ajet.task_runner.base_runner import BaseAgentRunner
 from ajet.utils.dynamic_import import dynamic_import
+from ajet.utils.metric_helper.reward_metric_helper import populate_reward_metadata_from_stats
 
 
 class GeneralRunner(BaseAgentRunner):
@@ -73,6 +74,10 @@ def execute(self, workflow_task: WorkflowTask) -> BaseContextTracker:
             madness=0,
             description="",
         )
+
+        # Populate reward metadata with deep_finance reward stats if available
+        if "reward_stats" in workflow_output.metadata:
+            populate_reward_metadata_from_stats(reward, workflow_output.metadata["reward_stats"])
         context_tracker.process_reward(reward)
         # generate token before merging
         context_tracker.group_merge()
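
The hook above only fires when the judge has placed a reward_stats dict in the workflow output's metadata. A payload of roughly this shape is what gets copied into Reward.metadata; the keys below are illustrative, drawn from the grader names in this PR, and not a fixed schema:

    workflow_output.metadata["reward_stats"] = {
        "openjudge_enabled": True,
        "presentation_quality": 0.6,  # PresentationQualityGrader score (assumed key)
        "grounding": 1.0,             # GroundingGrader score (assumed key)
        "planning": 0.5,              # planning grader score (assumed key)
        "penalty": 0.0,               # penalty term, if any (assumed key)
    }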

ajet/utils/metric_helper/reward_metric_helper.py

Lines changed: 24 additions & 13 deletions
@@ -11,9 +11,12 @@
 - judge_time/ Judge time consumption statistics
 """
 
-from typing import List, Dict, Any
+from typing import List, Dict, Any, TYPE_CHECKING
 import numpy as np
 
+if TYPE_CHECKING:
+    from ajet.schema.trajectory import Reward
+
 
 def extract_reward_stats_from_trajectories(trajectories: List[Any]) -> List[Dict[str, Any]]:
     """
@@ -72,22 +75,15 @@ def compute_reward_metrics(reward_stats_list: List[Dict[str, Any]], prefix: str
     metrics[f"{prefix}rewards/penalty_count"] = len(non_zero_penalties)
     metrics[f"{prefix}rewards/penalty_rate"] = len(non_zero_penalties) / n * 100 if n > 0 else 0.0
 
-    # ========== Detect OpenJudge Usage ==========
+    # ========== OpenJudge Metrics (PresentationQualityGrader, GroundingGrader) ==========
     openjudge_enabled_count = sum(1 for rs in reward_stats_list if rs.get('openjudge_enabled', False))
 
     if openjudge_enabled_count > 0:
-        # ========== OpenJudge Metrics ==========
-
-        # Dynamically extract OpenJudge grader fields
-        # Currently supported graders: report_resolution, trajectory_faithfulness,
-        # rubrics_performance, trajectory_comprehensive, information_gain, action_loop
+        # OpenJudge graders: presentation_quality, grounding
         openjudge_graders = [
-            "report_resolution",
-            "trajectory_faithfulness",
-            "rubrics_performance",
-            "trajectory_comprehensive",
-            "information_gain",
-            "action_loop",
+            "presentation_quality",
+            "grounding",
+            "planning"
         ]
 
         for grader_name in openjudge_graders:
@@ -151,3 +147,18 @@ def compute_reward_metrics_from_trajectories(trajectories: List[Any], prefix: st
     reward_stats_list = extract_reward_stats_from_trajectories(trajectories)
     return compute_reward_metrics(reward_stats_list, prefix=prefix)
 
+
+def populate_reward_metadata_from_stats(reward: "Reward", reward_stats: Dict[str, Any]) -> None:
+    """
+    Populate Reward.metadata with all reward statistics.
+
+    Args:
+        reward: The Reward object to populate
+        reward_stats: The reward_stats dictionary from judge
+    """
+    if not reward_stats:
+        return
+
+    # Directly copy all reward_stats into metadata
+    reward.metadata.update(reward_stats)
+
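
A minimal usage sketch for the two helpers above, assuming a stats dict like the one the judge emits (the exact metric key formatting is defined inside compute_reward_metrics):

    from ajet.utils.metric_helper.reward_metric_helper import (
        compute_reward_metrics,
        populate_reward_metadata_from_stats,
    )

    stats = [{"openjudge_enabled": True, "presentation_quality": 0.6, "grounding": 1.0, "planning": 0.5}]
    metrics = compute_reward_metrics(stats, prefix="train/")
    # produces keys under "train/rewards/..." for each grader in openjudge_graders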
tutorial/example_deep_finance/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+# tutorial/example_deep_finance package

tutorial/example_deep_finance/deep_finance.sh

Lines changed: 20 additions & 18 deletions
@@ -1,29 +1,29 @@
 #!/bin/bash
-set -e
+set -e
 #===============================================================================
 # 1. Configuration area - the only section users need to modify
 #===============================================================================
-SUFFIX="deep_finance" # Experiment suffix; affects all log and experiment names
-PREFIX="open" # Experiment prefix; affects the folder for logs and experiments
+SUFFIX="newjudge" # Experiment suffix; affects all log and experiment names
+PREFIX="ajet_newjudge" # Experiment prefix; affects the folder for logs and experiments
 
 # OpenJudge model configuration
 OPENJUDGE_LLM='qwen-flash' # OpenJudge scoring model
 RM_LLM='qwen-max' # RM Gallery scoring model
 JUDGE_CONCURRENCY=10
 
 # Reward weight configuration
-RM_WEIGHT=0.4
-CITATION_AUDIT_WEIGHT=0.2
-REPORT_RESOLUTION_WEIGHT=0.2
-TRAJECTORY_FAITHFULNESS_WEIGHT=0.2
+RM_WEIGHT=0.5
+PRESENTATION_QUALITY_WEIGHT=0.25
+GROUNDING_WEIGHT=0.25
 
 # Training parameter configuration
 NUM_REPEAT=4 # group size; each query is rolled out NUM_REPEAT times
 TRAIN_BATCH_SIZE=32 # training batch size
 NUM_STEPS=6 # number of steps per sample
 DEEPFINANCE_TOOL_RESULT_MAX_CHARS=10000
 
-# Main directory
+# Main directory (must be changed)
+export AJET_ROOT="/mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet_new"
 
 NNODES=${WORLD_SIZE}
 
@@ -46,7 +46,7 @@ fi
 # 2. Dynamically generate the config file (yaml generated from a yaml template)
 #===============================================================================
 # Changed: the config file is now generated dynamically under the yaml directory
-CONFIG_TEMPLATE="tutorial/example_deep_finance/yaml_template/deep_finance_template.yaml"
+CONFIG_TEMPLATE="tutorial/example_deep_finance/deep_finance.yaml"
 CONFIG_FILE="${AJET_ROOT}/tutorial/example_deep_finance/yaml/${SUFFIX}.yaml"
 mkdir -p $(dirname ${CONFIG_FILE})
 
@@ -55,12 +55,11 @@ sed -e "s|{{SUFFIX}}|${SUFFIX}|g" \
     -e "s|{{MODEL_PATH}}|${MODEL_PATH}|g" \
     -e "s|{{NNODES}}|${NNODES}|g" \
     -e "s|{{RM_WEIGHT}}|${RM_WEIGHT}|g" \
-    -e "s|{{CITATION_AUDIT_WEIGHT}}|${CITATION_AUDIT_WEIGHT}|g" \
+    -e "s|{{PRESENTATION_QUALITY_WEIGHT}}|${PRESENTATION_QUALITY_WEIGHT}|g" \
+    -e "s|{{GROUNDING_WEIGHT}}|${GROUNDING_WEIGHT}|g" \
     -e "s|{{OPENJUDGE_LLM}}|${OPENJUDGE_LLM}|g" \
    -e "s|{{RM_LLM}}|${RM_LLM}|g" \
     -e "s|{{JUDGE_CONCURRENCY}}|${JUDGE_CONCURRENCY}|g" \
-    -e "s|{{REPORT_RESOLUTION_WEIGHT}}|${REPORT_RESOLUTION_WEIGHT}|g" \
-    -e "s|{{TRAJECTORY_FAITHFULNESS_WEIGHT}}|${TRAJECTORY_FAITHFULNESS_WEIGHT}|g" \
     -e "s|{{NUM_REPEAT}}|${NUM_REPEAT}|g" \
     -e "s|{{NUM_STEPS}}|${NUM_STEPS}|g" \
     -e "s|{{TRAIN_BATCH_SIZE}}|${TRAIN_BATCH_SIZE}|g" \
@@ -72,7 +71,7 @@ sed -e "s|{{SUFFIX}}|${SUFFIX}|g" \
     ${AJET_ROOT}/${CONFIG_TEMPLATE} > ${CONFIG_FILE}
 
 echo "Config file generated: ${CONFIG_FILE}"
-echo "Parameter check: RM=${RM_WEIGHT}, Citation=${CITATION_AUDIT_WEIGHT}, OpenJudge=${OPENJUDGE_LLM}, RM_LLM=${RM_LLM}"
+echo "Parameter check: RM=${RM_WEIGHT}, PresentationQuality=${PRESENTATION_QUALITY_WEIGHT}, Grounding=${GROUNDING_WEIGHT}, OpenJudge=${OPENJUDGE_LLM}, RM_LLM=${RM_LLM}"
 
 #===============================================================================
 # 3. Environment configuration
@@ -106,15 +105,15 @@ export DEEPFINANCE_MCP_CONFIG DEEPFINANCE_TOOL_RESULT_MAX_CHARS
 # Other service configuration
 HF_ENDPOINT="https://hf-mirror.com"
 ES_HOSTS="http://11.160.132.46:8200"
-export HF_ENDPOINT ES_HOSTS
+export HF_ENDPOINT ES_HOSTS
 
 # Log file locations
 CURRENT_TIME=$(date "+%Y%m%d_%H%M%S")
 LOG_DIR="${AJET_ROOT}/logs/${PREFIX}"
 MASTER_IP_FILE="${LOG_DIR}/master-ip_${SUFFIX}.log"
 ENV_SERVICE_LOG="${LOG_DIR}/env_service_${SUFFIX}_${CURRENT_TIME}.log"
 TRAIN_LOG="${LOG_DIR}/train_${SUFFIX}_${CURRENT_TIME}.log"
-
+env_log_prefix="${SUFFIX}__${CURRENT_TIME}"
 # Multi-node training parameter configuration
 GPUS_PER_NODE=8
 EXPECTED_WORKERS=$WORLD_SIZE
@@ -156,6 +155,8 @@ export NCCL_ASYNC_ERROR_HANDLING=1
 
 export PYTHONPATH="${AJET_ROOT}:${PYTHONPATH}"
 export RAY_CLUSTER_MODE="multi_node"
+export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}" # AgentJet may use this path internally
+export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"
 
 
 #===============================================================================
@@ -202,11 +203,12 @@ if [[ $HOSTNAME == *"-master-"* ]]; then
 
     # Launch the training job (the core step)
     python ajet/launcher.py \
+        --with-deepfinance \
        --conf ${CONFIG_FILE} \
         --backbone="verl" \
-        --prefix=${SUFFIX} \
+        --prefix=${env_log_prefix} \
         2>&1 | tee ${TRAIN_LOG}
-
+
 
 #===============================================================================
 # 6.2 Worker node startup flow
@@ -218,4 +220,4 @@ else
     ray stop || true
     ray start --address $MASTER_ADDR:6379 --num-gpus 8
     while true; do sleep 60; done
-fi
+fi
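
The sed pipeline above is plain string templating: every {{PLACEHOLDER}} token in the template is replaced by the corresponding shell variable. The same transformation, reproduced as a small Python sketch for clarity (illustrative only, not part of the repository):

    # Reproduce the script's {{PLACEHOLDER}} -> value substitution in Python.
    template = 'experiment_name: "{{SUFFIX}}"\ntrain_batch_size: {{TRAIN_BATCH_SIZE}}'
    values = {"SUFFIX": "newjudge", "TRAIN_BATCH_SIZE": "32"}
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{{" + key + "}}", value)
    print(rendered)
    # experiment_name: "newjudge"
    # train_batch_size: 32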

tutorial/example_deep_finance/deep_finance.yaml

Lines changed: 16 additions & 19 deletions
@@ -1,27 +1,26 @@
 # ------------------ Main configuration ------------------
 ajet:
-  project_name: ajet_deep_finance
-  experiment_name: "ajet_deep_finance"
+  project_name: "{{PREFIX}}"
+  experiment_name: "{{SUFFIX}}"
   # Judge configuration (nested structure, maps to self.config.ajet.judge.*)
   judge:
-    openjudge_llm: qwen-flash # OpenJudge model
-    rm_llm: qwen-max # RM Gallery model
-    concurrency: 10 # Judge concurrency
+    openjudge_llm: {{OPENJUDGE_LLM}} # OpenJudge model
+    rm_llm: {{RM_LLM}} # RM Gallery model
+    concurrency: {{JUDGE_CONCURRENCY}} # Judge concurrency
     train_ref_ans_path: {{TRAIN_REF_ANS_PATH}} # Training-set reference answer path
     val_ref_ans_path: {{VAL_REF_ANS_PATH}} # Validation-set reference answer path
     # OpenJudge weight configuration
-    report_resolution_weight: 0.2 # Report quality evaluation
-    trajectory_faithfulness_weight: 0.2 # Factual accuracy evaluation
-    citation_audit_weight: 0.2 # Citation audit evaluation (coverage + truthfulness)
-    rm_weight: 0.4 # RM Gallery weight
+    presentation_quality_weight: {{PRESENTATION_QUALITY_WEIGHT}} # Report presentation quality evaluation
+    grounding_weight: {{GROUNDING_WEIGHT}} # Citation compliance evaluation
+    rm_weight: {{RM_WEIGHT}} # RM Gallery weight
   task_judge:
     # Evaluate with the local DeepFinanceJudge (decoupled from the remote env_service)
     judge_protocol: tutorial.example_deep_finance.deep_finance_judge->DeepFinanceJudgeByOpenJudge
   model:
     # ✨✨✨✨ Set the model to be trained
     path: {{MODEL_PATH}}
   trainer_common:
-    nnodes: 8
+    nnodes: {{NNODES}}
     n_gpus_per_node: 8
     val_before_train: True
     val_pass_n: 8
@@ -32,44 +31,42 @@ ajet:
   rollout:
     # ✨✨✨✨ Write and select the Agent
     user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
-    force_disable_toolcalls: True
+    force_disable_toolcalls: False
     enable_oversample: False
     tensor_model_parallel_size: 8
-    num_repeat: 4
+    num_repeat: {{NUM_REPEAT}}
     max_env_worker: 64 # Increase environment parallelism
     max_num_seqs: 64 # Increase vLLM concurrent sequence count
     max_response_length_in_one_turn: 8000
     max_model_len: 50000
     agent_madness_reward: 0.0
     compute_madness_checklist: None
     multi_turn:
-      max_steps: 6
+      max_steps: {{NUM_STEPS}}
     interchange_server:
       interchange_method: 'tcp' # options: 'tcp' (multi-nodes) or 'ipc' (1 node)
     debug:
       debug_max_parallel: 1 # Increase the number of parallel tasks to fully utilize the GPUs
       debug_first_n_tasks: 100 # Increase the number of tasks processed
   data:
-    train_batch_size: 32
+    train_batch_size: {{TRAIN_BATCH_SIZE}}
     max_prompt_length: 8000
     max_response_length: 41000
 
   task_reader:
     type: deep_finance # Data is loaded from JSON to build init_messages; tool calls go through env_service
     deep_finance:
      training:
-        file_path: {{TRAIN_PATH}}
+        file_path: {{TRAIN_DATA_PATH}}
       validation:
-        file_path: {{VAL_PATH}}
+        file_path: {{VAL_DATA_PATH}}
     # env_service still needs to be configured (for tool calls)
     env_service:
       env_type: "finworld"
       env_url: {{ENV_SERVICE_URL}}
       env_action_preference: code
-
-
 trainer:
-  default_local_dir: {{CKPT_SAVE_PATH}}
+  default_local_dir: "{{CKPT_SAVE_PATH}}/{{PREFIX}}/{{SUFFIX}}"
   # resume_mode: disable # Disable auto-resume; train from scratch
 actor_rollout_ref:
   rollout:
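
After templating, the nested keys above are read with attribute-style access; the commit notes cite tuner.config.ajet.rollout.multi_turn.max_steps. A self-contained sketch using SimpleNamespace as a stand-in for the real config object:

    from types import SimpleNamespace

    # Stand-in for the parsed YAML config; the real object is built by the framework.
    config = SimpleNamespace(
        ajet=SimpleNamespace(
            judge=SimpleNamespace(openjudge_llm="qwen-flash", rm_llm="qwen-max", concurrency=10),
            rollout=SimpleNamespace(multi_turn=SimpleNamespace(max_steps=6)),
        )
    )
    assert config.ajet.rollout.multi_turn.max_steps == 6  # {{NUM_STEPS}} after templating
    assert config.ajet.judge.openjudge_llm == "qwen-flash"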
