44https://github.com/microsoft/agent-lightning
55源码改动:
66注释掉agentlightning/runner.py 115行
7+ ```
78if trace_spans:
89 triplets = self.triplet_exporter.export(trace_spans)
10+ ```
911agentlightning/verl/daemon.py 338行
12+ ```
1013trace_list = [
1114 {"prompt_ids": t.prompt.get("token_ids", []), "response_ids": t.response.get("token_ids", []), "reward": t.reward}
1215 for t in rollout.triplets
1316 ]
17+ ```
1418agentlightning/verl/daemon.py 418行
1519注释掉
20+ ```
1621reward_list.append(sample_info["reward"])
22+ ```
1723改为
24+ ```
1825reward_list.append(trace["reward"])
19-
26+ ```
2027添加examples/werewolf 实现
2128
2229和agentscope(458e8eedc94bba89bc3e4c6756e35fb4defbc0ac,Sep 15, 2025)实现的一个中文狼人杀agent-rl训练的案例
@@ -27,6 +34,7 @@ https://github.com/af-74413592/agentscope
2734需做如下改动:
2835src/agentscope/model/_ openai_model.py 371行
2936改为
37+ ```
3038if choice.message.content:
3139try:
3240 thinking_part = choice.message.content.split("<think>")[1].split("</think>")[0]
@@ -50,8 +58,9 @@ except:
5058 text=response.choices[0].message.content,
5159 ),
5260 )
53-
61+ ```
5462处理过长的prompt:src/agentscope/model/_ openai_model.py OpenAIChatModel 的__ call__ 函数
63+ ```
5564conversations = [{"role":msg["role"], "content":msg["content"][0]['text'] if type(msg["content"]) == list else msg["content"]} for msg in messages]
5665input_ids = self.tokenizer.apply_chat_template(
5766 conversations,
@@ -67,17 +76,21 @@ while len(input_ids) > 10000: (比maxlen稍微小一点)
6776 add_generation_prompt=True,
6877 tokenize=True,
6978 )
70-
79+ ```
7180verlv0.5.0 改动
7281
7382注释掉 verl trainer/ppo/ray_trainer.py 415-418行
83+ ```
7484real_train_batch_size = config.data.train_batch_size * config.actor_rollout_ref.rollout.n
7585 assert real_train_batch_size % minimal_bsz == 0, (
7686 f"real_train_batch_size ({real_train_batch_size}) must be divisible by minimal possible batch size "
7787 f"({minimal_bsz})"
7888 )
79- 注释掉 verl trainer/ppo/ray_trainer.py 500 行 # assert config.data.train_batch_size >= config.actor_rollout_ref.actor.ppo_mini_batch_size
80-
89+ ```
90+ 注释掉 verl trainer/ppo/ray_trainer.py 500 行
91+ ```
92+ assert config.data.train_batch_size >= config.actor_rollout_ref.actor.ppo_mini_batch_size
93+ ```
8194
8295####################################################################
8396
0 commit comments