# AgentJet Timeline

In complex multi-agent LLM interactions, we define a Timeline as the token trajectory generated by repeatedly invoking an LLM over the course of a task.

A Timeline contains the following elements:
- Text message list
  - Note: In most Qwen models, messages start with `<|im_start|>` and end with `<|im_end|>`, depending on the model's tokenizer and chat_template.
- Token sequence message list
  - Note: In most Qwen models, messages start with the token ID corresponding to `<|im_start|>` and end with the token ID corresponding to `<|im_end|>`, depending on the model's tokenizer.
- Author list
  - Note: Identifies the producer of each message. Generally there are two types: "llm" and "env".
- Token LogProb message list
  - Note: Records the log probability of each generated token. If a token's producer is not "llm", the value is `INVALID_LOG_PROB_VALUE` (set to 0 or `np.inf` as needed).
- Loss Mask message list
  - Note: Each bit of the loss_mask corresponds one-to-one with a token.
  - loss_mask=1 means the token participates in the loss calculation, which usually also indicates that the token was generated by the LLM.
  - loss_mask=0 means the token does not participate in the loss calculation; in most cases it comes from user input, tokenizer/chat_template additions, environment feedback, etc.
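
For concreteness, here is a minimal sketch of one way these parallel lists could be represented per message. The name `ExtendedMessage` appears in the merging code later in this document; the exact field layout shown here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

INVALID_LOG_PROB_VALUE = 0.0  # placeholder logprob for tokens not produced by the LLM

@dataclass
class ExtendedMessage:
    text: str                      # decoded text of the message
    token_arr: List[int]           # token IDs of the message
    author: str                    # producer of the message: "llm" or "env"
    token_logprob_arr: List[float] = field(default_factory=list)  # one logprob per token
    need_training: bool = False    # whether this message's tokens enter the loss

# A Timeline is then simply an ordered list of such messages.
Timeline = List[ExtendedMessage]
```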

## Intertwining Timelines in Multi-Turn Conversations and Multi-Agent Scenarios

In multi-turn conversations and multi-agent scenarios, extracting clean and tidy timelines is not easy:

- For ease of use and compatibility with most Agentic frameworks and with parallel LLM calls, AgentJet only observes the standard OpenAI-format LLM requests sent by users (or Agent frameworks); it does not require users to declare causal relationships between LLM requests.

- Some Agentic frameworks (such as Langchain) automatically retry to improve task success rates. For example, when LLM tool-call parameters are invalid, the Agent framework appends the error information to the request as temporary context, and removes that temporary context once the expected result is achieved. Without proper handling, the samples generated during this process can significantly reduce RL algorithm efficiency.

- Dynamic memory mechanisms: you can use projects like [ReMe](https://github.com/agentscope-ai/ReMe) to provide agents with short-term and long-term memory, significantly improving agent performance on personal-assistant tasks. When an agent decides to update knowledge that already exists in the historical context, it creates a fork point in the timeline.

- When multiple agents exist in the environment and the task takes place in a partially observable environment (for example, individual agent contexts store secrets that cannot be mutually observed, or each agent actively hides some information through context-offload techniques to better focus on its current task), multiple timelines naturally arise, each belonging to one agent.

- When a token sequence is decoded into text and then re-encoded by the tokenizer, the result sometimes differs from the original token sequence (this drift occurs with varying probability across models). This token drift requires fine-grained handling to (1) improve training efficiency and (2) stabilize training.

In the AgentJet system, we adopt a "timeline merging" approach:

**At the end of an episode, AgentJet autonomously identifies the differences between timelines and, according to user presets, finds timelines that can be merged and merges them automatically. This reduces the number of overlapping, redundant samples and improves training efficiency.**

## Timeline Merging Algorithm

When an episode starts, AgentJet initializes a context tracker object that captures all LLM requests. Each LLM request starts from `<|im_start|>` and ends at `<|im_end|>` or when the token count overflows. Before merging, each LLM request is considered an independent initial timeline. In one episode, $n$ initial timelines from $m$ agents can be collected:

$\text{Timelines} = \lbrace T_1\left(M_1, m_1, a_1\right), T_2\left(M_2, m_2, a_2\right), \dots, T_n\left(M_n, m_n, a_n\right) \rbrace$

Where:
- $T_i$ represents the $i$-th (unmerged) timeline, $T_i = [T_{i}^{[1]}, T_{i}^{[2]}, \dots, T_{i}^{[|T_{i}|]}]$.
  - The last item $T_{i}^{[|T_{i}|]} = m_i$ is always the output of this LLM request.
  - The first $|T_{i}|-1$ items are always the input $M_i$ of this LLM request.
- $a_i \in \lbrace A_1, \dots, A_m \rbrace$ is the agent's name ID. Note that when the user's workflow does not provide an agent name, $\lbrace T_1, T_2, \dots, T_n\rbrace$ is considered to come from a single default agent.
- $M_i$ and $m_i$ are the input message list and the output message, respectively. Each message is a (Text, Token, Loss Mask) triplet.
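
As a sketch of how an initial timeline is formed (illustrative only, not AgentJet's actual API; `ExtendedMessage` is the sketch from earlier): each captured LLM request contributes its input messages $M_i$ followed by its output message $m_i$, tagged with the agent name $a_i$.

```python
from typing import List, Tuple

# Illustrative only: one captured LLM request -> one initial timeline.
def make_initial_timeline(
    input_messages: List[ExtendedMessage],  # M_i, the request's input context
    output_message: ExtendedMessage,        # m_i, the LLM's response
    agent_name: str = "default",            # a_i; one default agent if none given
) -> Tuple[List[ExtendedMessage], str]:
    return list(input_messages) + [output_message], agent_name
```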

At the end of an episode, all timelines are compared pairwise. Two timelines are merged if they satisfy the following conditions (see the sketch below):

- Condition 1: $|T_{i}| \leq |T_{j}|$
- Condition 2: The token sequences of the first $|T_{i}|$ messages of $T_{i}$ and $T_{j}$ are identical, i.e. $\text{Token}(T_{i}^{[k]}) = \text{Token}(T_{j}^{[k]}), \forall k \in \left[1, |T_{i}| \right]$.
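
In code, the two conditions could be checked with a predicate like the following (an illustrative sketch, not AgentJet's actual API; it reuses the `ExtendedMessage` sketch from earlier):

```python
from typing import List

def can_merge(t_i: List[ExtendedMessage], t_j: List[ExtendedMessage]) -> bool:
    # Condition 1: T_i must be no longer than T_j.
    if len(t_i) > len(t_j):
        return False
    # Condition 2: the first |T_i| messages must carry identical token sequences.
    return all(
        t_i[k].token_arr == t_j[k].token_arr
        for k in range(len(t_i))
    )
```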

When both conditions hold, the two timelines are merged:

- $T_{i}$ is the **absorbed** short timeline; $T_{j}$ is the long timeline to be updated.
- For every index $k$ where $\text{Author}(T_{i}^{[k]}) = \text{llm}$ but $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$, the message in $T_{j}$ is overwritten:
  - $\text{Author}(T_{j}^{[k]}) = \text{llm}$
  - $\text{Token}(T_{j}^{[k]}) = \text{Token}(T_{i}^{[k]})$
  - $\text{TokenLogProb}(T_{j}^{[k]}) = \text{TokenLogProb}(T_{i}^{[k]})$

  ```python
  from typing import List

  def toggle_author_and_mask(
      source_timeline: List[ExtendedMessage],  # T_j, the longer timeline (updated in place)
      target_timeline: List[ExtendedMessage],  # T_i, the shorter timeline being absorbed
  ) -> List[ExtendedMessage]:
      # Walk the shared prefix; wherever the short timeline knows a message was
      # LLM-generated but the long timeline does not, copy the author, token IDs
      # and logprobs over so those tokens are trained on exactly once.
      for k in range(len(target_timeline)):
          if target_timeline[k].author == "llm" and source_timeline[k].author != "llm":
              source_timeline[k].author = target_timeline[k].author
              source_timeline[k].token_arr = target_timeline[k].token_arr
              source_timeline[k].token_logprob_arr = target_timeline[k].token_logprob_arr
              assert source_timeline[k].need_training  # message must now enter the loss
      return source_timeline  # the merged timeline
  ```

Note: the Loss Mask is computed in detail during post-processing from the $\text{Author}(\cdot)$ list, so it does not need special attention while merging timelines.
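
As a sketch of that post-processing step (an illustrative helper, not AgentJet's actual API), the per-token loss mask can be derived directly from the Author list:

```python
from typing import List

def build_loss_mask(timeline: List[ExtendedMessage]) -> List[int]:
    loss_mask: List[int] = []
    for msg in timeline:
        bit = 1 if msg.author == "llm" else 0  # only LLM-generated tokens enter the loss
        loss_mask.extend([bit] * len(msg.token_arr))
    return loss_mask
```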

## More Relaxed Merging Conditions for Faster Training

### Relaxed Token Matching

In practice, we found that when a token sequence is decoded into text and then re-encoded by the tokenizer, it sometimes cannot be converted back to exactly the original token sequence.

Therefore, the following situation often occurs in practice:
- $\text{Author}(T_{i}^{[k]}) = \text{llm}$
- $\text{Author}(T_{j}^{[k]}) \neq \text{llm}$
- $\text{Text}(T_{j}^{[k]}) = \text{Text}(T_{i}^{[k]})$
- $\text{Token}(T_{j}^{[k]}) \neq \text{Token}(T_{i}^{[k]})$

That is, the text sequences are exactly equal, but vLLM's internal tokenizer conversion produced two variants of the token sequence.
In this case, you can control AgentJet's behavior by adjusting (see the sketch below):

```yaml
ajet.context_tracker.timeline_merging_policy.timeline_compare_level = "text" / "token" # (default text)
```
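
As a sketch of what the two comparison levels mean (an illustrative helper, not AgentJet's actual API): under `"token"`, two messages must carry identical token IDs; under `"text"`, matching decoded text is enough.

```python
def messages_equal(a: ExtendedMessage, b: ExtendedMessage, compare_level: str = "text") -> bool:
    if compare_level == "token":
        return a.token_arr == b.token_arr  # strict: exact token IDs required
    return a.text == b.text                # relaxed: tolerates re-tokenization drift
```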

| Merge Strategy | Merge Condition | Use Case | Advantages | Disadvantages |
|---------------|----------------|----------|------------|---------------|
| **token** | Requires $\text{Token}(T_{i}^{[k]}) = \text{Token}(T_{j}^{[k]})$ | Scenarios where token sequences must be completely identical to merge | Strict matching, high training-data precision | Tokenizer encode/decode drift may prevent merging timelines that should be merged, reducing training efficiency |
| **text** | Only requires $\text{Text}(T_{i}^{[k]}) = \text{Text}(T_{j}^{[k]})$ | Scenarios where merging on identical text is acceptable, tolerating token-sequence differences | More relaxed merging condition; improves the merge rate and training efficiency, reduces redundant samples | May merge samples with slightly different token representations, but the impact is minimal in practice |

**Recommended Configuration:**
- Use the `"text"` strategy by default; it effectively handles token drift during tokenizer encoding/decoding.
- Use the `"token"` strategy when strict training-inference consistency is required.

### Relaxed Tool Matching

Most models' tokenizer chat templates place the list of available tools at the very beginning of the context (in the system prompt).
When the agent's tool list is adjusted slightly but the rest of the context remains unchanged, you can control AgentJet's behavior by adjusting (see the sketch below):

```yaml
ajet.context_tracker.timeline_merging_policy.ignore_tools = True / False # (default True)
```
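
As a sketch of the intended behavior (illustrative, not AgentJet's actual API): with `ignore_tools` enabled, the tool definitions are dropped from each OpenAI-format request before timelines are compared, so two requests that differ only in their tool list can still merge.

```python
from typing import Any, Dict

def comparison_view(request: Dict[str, Any], ignore_tools: bool = True) -> Dict[str, Any]:
    view = dict(request)
    if ignore_tools:
        view.pop("tools", None)  # OpenAI-format requests carry the tool list here
    return view
```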

| Merge Strategy | Merge Condition | Use Case | Advantages | Disadvantages |
|---------------|----------------|----------|------------|---------------|
| **True** | Ignores tool-list differences; merges as long as the rest of the context is the same | The agent's tool list changes dynamically, but the core dialogue logic remains unchanged | Significantly improves the merge rate, reduces redundant samples caused by tool-list changes, and enhances training efficiency | May merge samples with slightly different tool environments, but the impact is limited in most scenarios |
| **False** | Strictly compares tool lists; they must be completely identical to merge | Tool invocation is critical for training and requires precise tool-configuration matching | Ensures timeline tool environments are completely consistent and training data is strictly aligned | Minor tool-list changes prevent merging timelines with otherwise identical context, reducing training efficiency |

**Recommended Configuration:**
- Use the `True` strategy to effectively reduce redundant samples.
- Use the `False` strategy when strict training-inference consistency is required. It is also recommended when agent tools change significantly and infrequently (such as dynamic tool loading, tool version updates, etc.).

## Other Timeline Management Options

### Automatic Re-tokenization Drift Fixing

By default, AgentJet automatically fixes re-tokenization drift based on the token IDs returned by the vLLM engine. This costs a small amount of extra CPU time.

```yaml
ajet.context_tracker.fix_retokenization_drift = True # (default True)
```

For details on the re-tokenization drift phenomenon, see https://github.com/vllm-project/vllm/pull/22587.
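
A small demonstration of the phenomenon (the model name is only an example): decoding token IDs to text and re-encoding does not always reproduce the original IDs.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # any HF tokenizer
ids = tok.encode("Hello world", add_special_tokens=False)
roundtrip = tok.encode(tok.decode(ids), add_special_tokens=False)
# Usually equal, but for some generated sequences the round trip drifts:
print(roundtrip == ids)
```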

### Detecting Timeline Divergence Points

In single-agent multi-turn conversation scenarios, if you care deeply about training efficiency and want to diagnose exactly when and why your Agentic framework caused timeline forks, you can enable:

```yaml
ajet.context_tracker.detect_timeline_snap = True # (default False)
```

This enables real-time detection of timeline divergence points. It consumes CPU power and slows down training, so it is only recommended in debug mode (`--backbone=debug`).