Conversation memory determines how much of the conversation history is included in the context window when your Agent processes a new message. The Inkeep Agent Framework automatically manages conversation history to balance context retention with token efficiency, with specialized handling for delegated agents and tool results.

The conversation history now includes:

- **Tool results**: Results from tool executions, providing context about what actions were performed
- **Agent communications**: Messages exchanged between agents during transfers and delegations

## Memory Management
The system uses two approaches for managing conversation history:
### Intelligent Compression (Primary Method)
When agents have a summarizer model configured (the standard setup):

- **Up to 10,000 messages**: Retrieves extensive conversation history to find compression summaries and make informed decisions
- **No token limits**: Model-aware compression manages context based on each model's actual capabilities
- **Dynamic optimization**: Automatically compresses when approaching model-specific thresholds (50% for conversation-level compression, 75-91% for sub-agent operations); see the sketch below
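
The practical difference between the two modes can be pictured as a single branch. This is a minimal TypeScript sketch under assumed names: the `AgentConfig` shape and `historyLimits` helper are illustrative, not the framework's actual API.

```typescript
// Hypothetical sketch: how history retrieval limits might branch on
// whether a summarizer model is configured. All names are assumptions.
interface AgentConfig {
  summarizerModel?: string; // assumed field: a model id, when configured
}

function historyLimits(config: AgentConfig) {
  if (config.summarizerModel) {
    // Intelligent compression: wide retrieval, no fixed token cap;
    // model-aware compression manages the budget instead.
    return { maxMessages: 10_000, maxTokens: Number.POSITIVE_INFINITY };
  }
  // Fallback: fixed message and token limits apply.
  return { maxMessages: 50, maxTokens: 8_000 };
}
```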
### Fixed Limits (Fallback Method)
For agents without a summarizer model:

- **50 messages**: Up to the 50 most recent messages from the conversation
- **8,000 tokens**: Maximum of 8,000 tokens from previous conversation messages; the sketch below shows one way the two limits combine
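
As a rough illustration of how the fallback could apply both limits, assuming a `countTokens` helper (the real token accounting lives inside the framework):

```typescript
// Hypothetical sketch: keep at most the 50 most recent messages, then
// drop the oldest of those until the rest fits an 8,000-token budget.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

function applyFixedLimits(
  messages: Message[],
  countTokens: (m: Message) => number, // assumed helper
  maxMessages = 50,
  maxTokens = 8_000,
): Message[] {
  const recent = messages.slice(-maxMessages);
  const kept: Message[] = [];
  let total = 0;
  // Walk newest-to-oldest so the most recent context always survives.
  for (let i = recent.length - 1; i >= 0; i--) {
    const tokens = countTokens(recent[i]);
    if (total + tokens > maxTokens) break;
    total += tokens;
    kept.unshift(recent[i]);
  }
  return kept;
}
```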
<Note>
Most agents use intelligent compression, which provides superior context management tailored to each model's capabilities. The fixed limits serve as safety nets when a summarizer model is not available; the default token limit can be adjusted via the `AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT` environment variable if needed.
</Note>
## Intelligent Compression System
The framework's intelligent compression system is the primary method for managing conversation memory. It automatically analyzes model capabilities and compresses context when needed to optimize performance.
### How Compression Works
The compression system operates continuously, making intelligent decisions about context management:
<Steps>
<Step>
**Context Monitoring**: The system continuously monitors conversation size against model limits
</Step>
<Step>
**Automatic Triggering**: Compression triggers at 50% of the context window for conversation-level compression, or at model-aware thresholds (~75-91% depending on model size) for sub-agent generation; see the sketch after these steps
</Step>
<Step>
**Tool Result Archiving**: Large tool results are stored as artifacts and replaced with summary references
</Step>
<Step>
**AI Summarization**: Older conversation parts are summarized by AI while preserving key context
</Step>
<Step>
**Fallback Protection**: If compression is unavailable, the system falls back to fixed message and token limits
</Step>
</Steps>
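
The triggering thresholds reduce to simple ratio checks against a model's context window. The 50% and ~75-91% figures come from this page; the exact mapping from model size to a ratio in that band is internal to the framework, so the cutoff below is an assumption for illustration only.

```typescript
// Hypothetical sketch of the two trigger checks. The size-dependent
// sub-agent band (~75-91%) is approximated with an assumed cutoff.
interface ModelInfo {
  contextWindowTokens: number;
}

function shouldCompressConversation(usedTokens: number, model: ModelInfo): boolean {
  // Conversation-level compression triggers at 50% of the context window.
  return usedTokens >= 0.5 * model.contextWindowTokens;
}

function shouldCompressSubAgent(usedTokens: number, model: ModelInfo): boolean {
  // Assumption: larger context windows tolerate a higher fill ratio.
  const ratio = model.contextWindowTokens >= 200_000 ? 0.91 : 0.75;
  return usedTokens >= ratio * model.contextWindowTokens;
}
```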
### Model-Specific Behavior
Different models have different context windows, and compression adapts accordingly:
<CompressionModelsTable />
### Compression Types
#### Conversation-Level Compression

- **Trigger**: When conversation reaches 50% of model's context window
- **Action**: Compresses entire conversation history into summary + artifacts
- **Use Case**: Long conversations with extensive history

**Example**: You have a 20-message conversation about planning a software project. The conversation includes requirements gathering, architecture discussions, and code reviews. When it hits the 50% threshold, the system creates a summary like "User discussed project requirements for e-commerce platform, decided on microservices architecture, reviewed authentication flow..." and stores detailed tool outputs as artifacts.
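
To make that outcome concrete, here is a minimal sketch of the collapse, with an assumed `summarize` helper standing in for the AI summarizer (all names hypothetical):

```typescript
// Hypothetical sketch: the prior history collapses into one summary
// message, while the originals are retained as archived material.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

function compressConversation(
  history: Message[],
  summarize: (history: Message[]) => string, // assumed AI summarizer
): { compressed: Message[]; archived: Message[] } {
  const summary: Message = {
    role: "system",
    content: `Conversation summary: ${summarize(history)}`,
  };
  // Detailed tool outputs are not discarded; they become artifacts.
  return { compressed: [summary], archived: history };
}
```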
#### Sub-Agent Generation Compression

- **Trigger**: During sub-agent execution when tool results exceed model-aware limits (75-91% depending on model size)
- **Action**: Compresses generated tool results while preserving original context
- **Use Case**: Sub-agents performing many tool operations during generation

**Example**: A sub-agent is tasked with "analyze this codebase for security issues." During execution, it uses tools to:

1. Read 15 different files (large outputs)
2. Run security scans (detailed reports)
3. Check dependencies (long lists)
4. Analyze configurations (verbose JSON)

When these tool results fill up the context window, the system compresses them into: "Analyzed 15 files, found 3 SQL injection risks in auth.py, 2 XSS vulnerabilities in templates..." while keeping the original conversation and task intact.
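
One way to picture the archiving step, with a hypothetical in-memory artifact store and an assumed `summarize` helper in place of the AI summarization (`crypto.randomUUID` is available in modern Node and browsers):

```typescript
// Hypothetical sketch: oversized tool results are stored as artifacts
// and replaced with a summary plus a reference to the full output.
interface ToolResult {
  toolName: string;
  output: string;
}

const artifactStore = new Map<string, string>(); // assumed in-memory store

function archiveIfLarge(
  result: ToolResult,
  summarize: (text: string) => string, // assumed summarization helper
  maxChars = 4_000, // assumed size threshold
): ToolResult {
  if (result.output.length <= maxChars) return result;
  const id = crypto.randomUUID();
  artifactStore.set(id, result.output);
  return {
    toolName: result.toolName,
    output: `${summarize(result.output)} [full result stored as artifact ${id}]`,
  };
}
```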
<Note>
Compression happens automatically and transparently. Your agents will continue to work normally even with compressed conversations, as the system preserves all essential context and provides artifact references for detailed information.
</Note>
## How It Works
<Steps>
<Step>
**Message Retrieval**: The system retrieves conversation history (up to 10,000 messages with intelligent compression, or 50 messages with fixed limits)
</Step>
<Step>
**Delegation Filtering**: Messages are filtered based on delegation context: delegated agents see their own tool results plus top-level conversation context (see the sketch after these steps)
</Step>
<Step>
**Context Management**: With intelligent compression, the system analyzes model capabilities and compresses when needed. With fixed limits, messages are truncated at token thresholds.
</Step>
<Step>
**Optimization**: Intelligent compression creates summaries and artifacts to preserve essential context while staying within model limits
</Step>
</Steps>
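
The delegation filtering in step 2 amounts to a simple visibility rule: a delegated agent sees its own tool results plus the top-level conversation. The entry shape below is hypothetical and only illustrates that rule.

```typescript
// Hypothetical sketch of delegation filtering.
interface HistoryEntry {
  agentId: string;     // which agent produced the entry
  isToolResult: boolean;
  isTopLevel: boolean; // part of the top-level conversation
}

function visibleHistory(agentId: string, history: HistoryEntry[]): HistoryEntry[] {
  return history.filter(
    (entry) =>
      entry.isTopLevel || (entry.isToolResult && entry.agentId === agentId),
  );
}
```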