Commit 6f0c420

Merge remote-tracking branch 'origin/main' into dev/shuchang_newjudge

2 parents 4538f5a + c984b91

30 files changed: +902 −152 lines
.gitignore — 5 additions & 1 deletion

```diff
@@ -159,4 +159,8 @@ tutorial/example_deep_finance/config/*
 tutorial/example_deep_finance/scripts/*
 flash_attn-2.8.*.whl
 tutorial/example_deep_finance/prepare_data/*
-tutorial/example_deep_finance/judge/analytical_sufficiency/*
+tutorial/example_deep_finance/judge/analytical_sufficiency/*
+
+.dockerignore
+benchmark_datasets
+modelscope_cache
```

README.md — 13 additions & 0 deletions

````diff
@@ -146,10 +146,23 @@ If you use AgentJet in your research, please cite:
 }
 ```
 
+
+
 <br/>
 
 ---
 <div align="center">
 
 [⭐ Star Us](https://github.com/modelscope/AgentJet) · [Report Bug](https://github.com/modelscope/AgentJet/issues) · [Request Feature](https://github.com/modelscope/AgentJet/issues)
 </div>
+
+
+
+<div align="center">
+<img width="180" alt="image" src="https://img.alicdn.com/imgextra/i4/O1CN01DJuOtZ1Kgu1UvjaNl_!!6000000001194-2-tps-922-882.png"/>
+<br/>
+<span>Join AgentJet DingTalk Group to share your idea</span>
+</div>
+
+
+
````
ajet/backbone/__init__.py — 1 addition & 1 deletion

```diff
@@ -13,4 +13,4 @@
         "AjetTaskReader",
     ]
 except ImportError:
-    logger.warning("trinity is not available.")
+    logger.info("trinity is not available.")
```

ajet/context_tracker/multiagent_tracking.py — 1 addition & 0 deletions

```diff
@@ -95,6 +95,7 @@ def extract_text_content_from_content_dict(self, msg):
         # ],
         # }
 
+
         str_content = ""
         for item in msg["content"]:
             # item = {
```
New file (path not shown) — 151 additions & 0 deletions
# AgentJet Timeline

In complex multi-agent LLM interactions, we define a **Timeline** as the token trajectory produced by repeatedly invoking an LLM during the execution of a task.

A Timeline contains the following elements:

- Text message list
  - Note: in most Qwen models, messages start with `<|im_start|>` and end with `<|im_end|>`, depending on the model's tokenizer and chat template
- Token sequence message list
  - Note: in most Qwen models, messages start with the token ID corresponding to `<|im_start|>` and end with the token ID corresponding to `<|im_end|>`, depending on the model's tokenizer
- Author list
  - Note: identifies the producer of each message; there are generally two types, `"llm"` and `"env"`
- Token LogProb message list
  - Note: records the log probability of each generated token; if a token's producer is not `"llm"`, the value is `INVALID_LOG_PROB_VALUE` (set to 0 or `np.inf` as needed)
- Loss Mask message list
  - Note: each bit of the loss mask corresponds one-to-one with a token
    - `loss_mask=1` means the token participates in the loss calculation, which usually also indicates that the token was generated by the LLM
    - `loss_mask=0` means the token does not participate in the loss calculation; in most cases such tokens come from user input, tokenizer/chat-template additions, environment feedback, etc.
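The elements above can be pictured as one record per message. The sketch below is a minimal illustration with assumed field names (`token_arr`, `author`, `token_logprob_arr`), not AgentJet's actual message class:

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder logprob for tokens not produced by the LLM (assumed value).
INVALID_LOG_PROB_VALUE = 0.0


@dataclass
class TimelineMessage:
    """One message on a timeline (hypothetical field names)."""
    text: str                  # decoded text of the message
    token_arr: List[int]       # token IDs of the message
    author: str                # "llm" or "env"
    token_logprob_arr: List[float] = field(default_factory=list)

    def __post_init__(self):
        if not self.token_logprob_arr:
            # Tokens not generated by the LLM get the placeholder logprob.
            self.token_logprob_arr = [INVALID_LOG_PROB_VALUE] * len(self.token_arr)

    @property
    def loss_mask(self) -> List[int]:
        # loss_mask=1 only for LLM-generated tokens.
        bit = 1 if self.author == "llm" else 0
        return [bit] * len(self.token_arr)


llm_msg = TimelineMessage(
    text="<|im_start|>assistant\nhi<|im_end|>",
    token_arr=[11, 22, 33],                  # example IDs, not real Qwen tokens
    author="llm",
    token_logprob_arr=[-0.1, -0.5, -0.02],
)
env_msg = TimelineMessage(text="tool output", token_arr=[44, 55], author="env")
```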
## Intertwining Timelines in Multi-Turn Conversations and Multi-Agent Scenarios

In multi-turn conversations and multi-agent scenarios, extracting clean, well-separated timelines is not easy:

- For ease of use and compatibility with most agentic frameworks and parallel LLM calls, AgentJet only observes the standard OpenAI-format LLM requests sent by users (or agent frameworks); it does not require users to declare causal relationships between LLM requests.

- Some agentic frameworks (such as LangChain) automatically retry to improve task success rates. For example, when LLM tool-call arguments are invalid, the framework appends the error information to the request as temporary context, then removes that temporary context once the expected result is achieved. Without proper handling, the samples generated during this process can significantly reduce RL algorithm efficiency.

- Dynamic memory mechanisms: projects like [ReMe](https://github.com/agentscope-ai/ReMe) can give agents short-term and long-term memory, significantly improving performance on personal-assistant tasks. When an agent decides to update knowledge that already exists in the historical context, it creates a fork point in the timeline.

- When multiple agents act in a partially observable environment (for example, individual agent contexts hold secrets that the other agents cannot observe, or each agent deliberately hides some information via context-offload techniques to better focus on its current task), multiple timelines naturally arise, each belonging to one agent.

- When a token sequence is decoded into text and then re-encoded by the tokenizer, the result sometimes differs from the original token sequence (this drift occurs with varying probability across models). This token drift requires fine-grained handling in order to (1) improve training efficiency and (2) stabilize training.

In the AgentJet system, we adopt the approach of "timeline merging":

**At the end of an episode, AgentJet autonomously identifies the differences between timelines and, according to user presets, finds timelines that can be merged and merges them automatically. This reduces the number of overlapping, redundant samples and improves training efficiency.**
## Timeline Merging Algorithm

When an episode starts, AgentJet initializes a context tracker object that captures all LLM requests. Each LLM request starts at `<|im_start|>` and ends at `<|im_end|>` or when the token budget overflows. Before merging, each LLM request is treated as an independent initial timeline. In an episode, $n$ initial timelines from $m$ agents are collected:

$\text{Timelines} = \lbrace T_1\left(M_1, m_1, a_1\right), T_2\left(M_2, m_2, a_2\right), \dots, T_n\left(M_n, m_n, a_n\right) \rbrace$

Where:
- $T_i$ denotes the $i$-th (unmerged) timeline, $T_i = [T_{i}^{[1]}, T_{i}^{[2]}, \dots, T_{i}^{[|T_{i}|]}]$.
  - The last item $T_{i}^{[|T_{i}|]} = m_i$ is always the output of the LLM request.
  - The first $|T_{i}|-1$ items are always the input $M_i$ of the LLM request.
- $a_i \in \lbrace A_1, \dots, A_m \rbrace$ is the agent's name ID. Note that when the user's workflow does not provide an agent name, $\lbrace T_1, T_2, \dots, T_n\rbrace$ are all considered to come from the same (default) agent.
- $M_i$ and $m_i$ denote the input message list and the output message, respectively. Each message is a (Text, Token, Loss Mask) triplet.
At the end of an episode, all timelines are compared. Two timelines are merged if they satisfy the following conditions:

- Condition 1: $|T_{i}| \le |T_{j}|$
- Condition 2: the token sequences of the first $|T_{i}|$ messages of $T_{i}$ and $T_{j}$ are identical, i.e. $\text{Token}(T_{i}^{[k]}) = \text{Token}(T_{j}^{[k]}), \forall k \in \left[1, |T_{i}| \right]$.

The merge then proceeds as follows:

- $T_{i}$ is the **absorbed** short timeline; $T_{j}$ is the long timeline that gets updated.
- For every position $k$ where $\text{Author}(T_{i}^{[k]}) = \text{llm}$ but $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$, update:
  - $\text{Author}(T_{j}^{[k]}) \leftarrow \text{llm}$
  - $\text{Token}(T_{j}^{[k]}) \leftarrow \text{Token}(T_{i}^{[k]})$
  - $\text{TokenLogProb}(T_{j}^{[k]}) \leftarrow \text{TokenLogProb}(T_{i}^{[k]})$
```python
from typing import List


def toggle_author_and_mask(
    source_timeline: List[ExtendedMessage],  # the longer timeline (T_j)
    target_timeline: List[ExtendedMessage],  # the shorter, absorbed timeline (T_i)
) -> List[ExtendedMessage]:
    # Walk the shared prefix; positions beyond the short timeline stay untouched.
    for k in range(len(target_timeline)):
        if target_timeline[k].author == "llm" and source_timeline[k].author != "llm":
            # The short timeline knows this message was LLM-generated: copy its
            # authorship, token sequence, and logprobs into the long timeline.
            source_timeline[k].author = target_timeline[k].author
            source_timeline[k].token_arr = target_timeline[k].token_arr
            source_timeline[k].token_logprob_arr = target_timeline[k].token_logprob_arr
            assert source_timeline[k].need_training
    return source_timeline  # merged timeline
```
Note: the Loss Mask is computed in detail during post-processing, based on the $\text{Author}(\cdot)$ list, so it does not need to be handled when merging timelines.
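Putting Conditions 1 and 2 together with the author toggle, a self-contained toy run might look like this (the `Msg` class and `try_merge` helper are illustrative sketches, not AgentJet's actual API):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    tokens: tuple  # token sequence of the message
    author: str    # "llm" or "env"


def try_merge(t_i: List[Msg], t_j: List[Msg]) -> Optional[List[Msg]]:
    """Merge the shorter timeline t_i into the longer t_j, or return None."""
    # Condition 1: |T_i| <= |T_j|
    if len(t_i) > len(t_j):
        return None
    # Condition 2: identical token sequences over the shared prefix
    if any(t_i[k].tokens != t_j[k].tokens for k in range(len(t_i))):
        return None
    # Toggle authorship where the short timeline knows the message was LLM output.
    for k in range(len(t_i)):
        if t_i[k].author == "llm" and t_j[k].author != "llm":
            t_j[k].author = "llm"
    return t_j


# T_i: a two-message timeline whose last message was produced by the LLM.
t_i = [Msg((1, 2), "env"), Msg((3, 4), "llm")]
# T_j: a later, longer request that replays T_i's output as plain context ("env").
t_j = [Msg((1, 2), "env"), Msg((3, 4), "env"), Msg((5,), "llm")]

merged = try_merge(t_i, t_j)
# After merging, position 2 of the long timeline is re-attributed to the LLM,
# so the short timeline can be dropped without losing any training signal.
```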
## More Relaxed Merging Conditions for Faster Training

### Relaxed Token Matching

In practice, we found that when a token sequence is decoded into text and then re-encoded by the tokenizer, the result sometimes differs from the original token sequence.

As a result, the following situation often occurs:
- $\text{Author}(T_{i}^{[k]}) = \text{llm}$
- $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$
- $\text{Text}(T_{j}^{[k]}) = \text{Text}(T_{i}^{[k]})$
- $\text{Token}(T_{j}^{[k]}) \neq \text{Token}(T_{i}^{[k]})$

That is, the text sequences are exactly equal, but vLLM's internal tokenizer conversion produced two variants of the token sequence. In this case, you can control AgentJet's behavior by adjusting:

```yaml
ajet.context_tracker.timeline_merging_policy.timeline_compare_level = "text" / "token" # (default "text")
```
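The difference between the two compare levels can be sketched as a single predicate (a hypothetical helper, not AgentJet's internal code; the token IDs below are made up to mimic drift):

```python
from typing import Tuple

Message = Tuple[str, Tuple[int, ...]]  # (decoded text, token IDs)


def messages_match(a: Message, b: Message, compare_level: str = "text") -> bool:
    """Illustrative message-comparison predicate mirroring timeline_compare_level."""
    if compare_level == "token":
        return a[1] == b[1]  # strict: token IDs must be identical
    return a[0] == b[0]      # relaxed: identical decoded text is enough


# Same text, two "drifted" tokenizations of it:
m1 = ("hello world", (15339, 1917))
m2 = ("hello world", (15339, 220, 1917))

text_level_match = messages_match(m1, m2, "text")    # tolerates the drift
token_level_match = messages_match(m1, m2, "token")  # rejects the drift
```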
| Merge Strategy | Merge Condition | Use Case | Advantages | Disadvantages |
|---------------|-----------------|----------|------------|---------------|
| **token** | Requires $\text{Token}(T_{i}^{[k]}) = \text{Token}(T_{j}^{[k]})$: token sequences must be completely identical | Strict training-inference consistency is required | Strict matching; high training-data precision | Tokenizer encode/decode drift may prevent merging timelines that should be merged, reducing training efficiency |
| **text** | Only requires $\text{Text}(T_{i}^{[k]}) = \text{Text}(T_{j}^{[k]})$: tolerates token-sequence differences | Default choice; merges whenever the text content matches | More relaxed merging conditions; higher merge rate, fewer redundant samples, better training efficiency | May merge samples with slightly different token representations, though the impact is minimal in practice |

**Recommended configuration:**
- Use the `"text"` strategy by default; it effectively handles token drift introduced by tokenizer encoding/decoding.
- Use the `"token"` strategy when strict training-inference consistency is required.
### Relaxed Tool Matching

Most models' chat templates place the list of available tools at the very beginning of the context (in the system prompt). When the agent's tool list is adjusted while the rest of the context stays unchanged, you can control AgentJet's behavior by adjusting:

```yaml
ajet.context_tracker.timeline_merging_policy.ignore_tools = True / False # (default True)
```

| Merge Strategy | Merge Condition | Use Case | Advantages | Disadvantages |
|---------------|-----------------|----------|------------|---------------|
| **True** | Ignores differences in the tool list; timelines merge as long as the rest of the context matches | The agent's tool list changes dynamically, but the core dialogue logic stays the same | Significantly higher merge rate; fewer redundant samples caused by tool-list changes; better training efficiency | May merge samples with slightly different tool environments, though the impact is limited in most scenarios |
| **False** | Strictly compares tool lists; they must be completely identical to merge | Tool invocation is critical for training and requires precise tool-configuration matching | Guarantees fully consistent tool environments across merged timelines; training data is strictly aligned | Minor tool-list changes prevent merging timelines with otherwise identical context, reducing training efficiency |

**Recommended configuration:**
- Use `True` to effectively reduce redundant samples.
- Use `False` when strict training-inference consistency is required. It is also recommended when the agent's tools change significantly but infrequently (e.g. dynamic tool loading, tool version updates).
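One way `ignore_tools = True` could be realized is to drop the tool list from each request before comparing contexts. The sketch below is an assumption about the mechanism, not AgentJet's actual implementation:

```python
import copy
from typing import Dict


def strip_tools(request: Dict) -> Dict:
    """Return a copy of an OpenAI-format request with the tool list removed,
    so timelines that differ only in their tools can still match (illustrative)."""
    cleaned = copy.deepcopy(request)  # never mutate the caller's request
    cleaned.pop("tools", None)
    return cleaned


# Two requests with identical messages but different tool lists:
req_a = {"messages": [{"role": "user", "content": "hi"}],
         "tools": [{"type": "function", "function": {"name": "search"}}]}
req_b = {"messages": [{"role": "user", "content": "hi"}],
         "tools": [{"type": "function", "function": {"name": "search_v2"}}]}

# With ignore_tools=True semantics, the two requests compare equal:
tools_ignored_equal = strip_tools(req_a) == strip_tools(req_b)
```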
## Other Timeline Management Options

### Automatic Re-tokenization Drift Fixing

By default, AgentJet automatically fixes re-tokenization drift based on the token IDs returned by the vLLM engine. This consumes a small amount of extra CPU time.

```yaml
ajet.context_tracker.fix_retokenization_drift = True # (default True)
```

For details on the re-tokenization drift phenomenon, see https://github.com/vllm-project/vllm/pull/22587.
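The gist of drift fixing can be illustrated with a minimal policy sketch (a hypothetical function; it only assumes the engine reports the token IDs it actually generated, which re-encoding the decoded text may fail to reproduce):

```python
from typing import List


def fix_retokenization_drift(
    engine_token_ids: List[int],     # IDs returned by the inference engine
    reencoded_token_ids: List[int],  # IDs from re-encoding the decoded text
) -> List[int]:
    """Illustrative policy: when decode->encode does not round-trip, keep the
    token sequence the model actually generated rather than the re-encoding."""
    if reencoded_token_ids != engine_token_ids:
        return engine_token_ids  # drift detected: trust the engine's IDs
    return reencoded_token_ids   # no drift: both sequences are identical
```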
### Detecting Timeline Divergence Points

In single-agent multi-turn conversation scenarios, if you care deeply about training efficiency and want to diagnose exactly when and why your agentic framework caused a timeline fork, you can enable real-time detection of timeline divergence points:

```yaml
ajet.context_tracker.detect_timeline_snap = True # (default False)
```

This consumes CPU time and slows down training, so it is only recommended in debug mode (`--backbone=debug`).
