
Commit 153d4e5

Added Conversation Compression System (#1343)
Added Conversation Compression System (#1343)

* added compression and merged
* removed debug logging
* added changeset
* formatted
* formatted
* updated
* formatted
* updated test
* updated
* fixed tests
* updated tests
* removed distillation logging
* removed distillation logging
* updated to use full cleanup
* updated task handler
* updated task handler
* updated test
* updated snapshot
* updated compressors
* updated docs
* added compressions table
* updated
* updated
* updated table
* updated test
* biome check
* updated
* updated docs
* formatted badly
1 parent ff51b72 commit 153d4e5

40 files changed, +4323 −1643 lines

.changeset/true-insects-kick.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+"@inkeep/agents-run-api": patch
+"@inkeep/agents-cli": patch
+"@inkeep/agents-manage-api": patch
+"@inkeep/agents-manage-ui": patch
+"@inkeep/agents-core": patch
+"@inkeep/agents-manage-mcp": patch
+"@inkeep/agents-sdk": patch
+"@inkeep/ai-sdk-provider": patch
+"@inkeep/create-agents": patch
+---
+
+Added Conversation Compression

agents-docs/content/typescript-sdk/memory.mdx

Lines changed: 74 additions & 6 deletions
@@ -6,6 +6,8 @@ icon: LuBrain
 keywords: memory, context window, conversation history, message limits, token limits, delegation, tool results
 ---
 
+import { CompressionModelsTable } from '../../src/components/compression-models-table';
+
 ## Overview
 
 Conversation memory determines how much of the conversation history is included in the context window when your Agent processes a new message. The Inkeep Agent Framework automatically manages conversation history to balance context retention with token efficiency, with specialized handling for delegated agents and tool results.
@@ -18,31 +20,97 @@ The conversation history now includes:
 - **Tool results**: Results from tool executions, providing context about what actions were performed
 - **Agent communications**: Messages exchanged between agents during transfers and delegations
 
-## Default Limits
+## Memory Management
+
+The system uses two approaches for managing conversation history:
 
-By default, the system includes conversation history using these limits:
+### Intelligent Compression (Primary Method)
+When agents have a summarizer model configured (standard setup):
+- **Up to 10,000 messages**: Retrieves extensive conversation history to find compression summaries and make intelligent decisions
+- **No token limits**: Model-aware compression manages context based on each model's actual capabilities
+- **Dynamic optimization**: Automatically compresses when approaching model-specific thresholds (50% for conversation-level, 75-91% for sub-agent operations)
 
+### Fixed Limits (Fallback Method)
+For agents without a summarizer model:
 - **50 messages**: Up to the 50 most recent messages from the conversation
 - **8,000 tokens**: Maximum of 8,000 tokens from previous conversation messages
 
 <Note>
-The 50-message and 8,000-token limits are the default values. The token limit can be adjusted via the `AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT` environment variable if needed.
+Most agents use intelligent compression, which provides superior context management tailored to each model's capabilities. The fixed limits serve as safety nets when a summarizer model is not available.
+</Note>
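The fixed-limit fallback described in this hunk can be pictured with a short sketch. The names and the 4-characters-per-token estimate are illustrative stand-ins, not the framework's actual implementation:

```typescript
// Illustrative sketch of the fixed-limit fallback: keep at most the 50 most
// recent messages, then walk newest-to-oldest and drop older messages once
// the running token estimate would exceed 8,000 tokens.
interface Message {
  role: string;
  content: string;
}

const MAX_MESSAGES = 50;
const MAX_TOKENS = 8000;

// Rough estimate; a real implementation would use a proper tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function applyFixedLimits(history: Message[]): Message[] {
  const recent = history.slice(-MAX_MESSAGES);
  const kept: Message[] = [];
  let total = 0;
  // Scan newest-first so the most recent messages survive truncation.
  for (let i = recent.length - 1; i >= 0; i--) {
    const cost = estimateTokens(recent[i].content);
    if (total + cost > MAX_TOKENS) break; // older messages are excluded
    total += cost;
    kept.unshift(recent[i]);
  }
  return kept;
}
```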
+
+## Intelligent Compression System
+
+The framework's intelligent compression system is the primary method for managing conversation memory. It automatically analyzes model capabilities and compresses context when needed to optimize performance.
+
+### How Compression Works
+
+The compression system operates continuously, making intelligent decisions about context management:
+
+<Steps>
+<Step>
+**Context Monitoring**: System continuously monitors conversation size against model limits
+</Step>
+<Step>
+**Automatic Triggering**: Compression triggers at 50% of context window for conversation-level, or at model-aware thresholds (~75-91% depending on model size) for sub-agent generation
+</Step>
+<Step>
+**Tool Result Archiving**: Large tool results are stored as artifacts and replaced with summary references
+</Step>
+<Step>
+**AI Summarization**: Older conversation parts are summarized by AI while preserving key context
+</Step>
+<Step>
+**Fallback Protection**: If compression is unavailable, system falls back to fixed message and token limits
+</Step>
+</Steps>
+
+### Model-Specific Behavior
+
+Different models have different context windows, and compression adapts accordingly:
+
+<CompressionModelsTable />
+
+### Compression Types
+
+#### Conversation-Level Compression
+- **Trigger**: When conversation reaches 50% of model's context window
+- **Action**: Compresses entire conversation history into summary + artifacts
+- **Use Case**: Long conversations with extensive history
+
+**Example**: You have a 20-message conversation about planning a software project. The conversation includes requirements gathering, architecture discussions, and code reviews. When it hits the 50% threshold, the system creates a summary like "User discussed project requirements for e-commerce platform, decided on microservices architecture, reviewed authentication flow..." and stores detailed tool outputs as artifacts.
+
+#### Sub-Agent Generation Compression
+- **Trigger**: During sub-agent execution when tool results exceed model-aware limits (75-91% depending on model size)
+- **Action**: Compresses generated tool results while preserving original context
+- **Use Case**: Sub-agents performing many tool operations during generation
+
+**Example**: A sub-agent is tasked with "analyze this codebase for security issues." During execution, it uses tools to:
+1. Read 15 different files (large outputs)
+2. Run security scans (detailed reports)
+3. Check dependencies (long lists)
+4. Analyze configurations (verbose JSON)
+
+When these tool results fill up the context window, the system compresses them into: "Analyzed 15 files, found 3 SQL injection risks in auth.py, 2 XSS vulnerabilities in templates..." while keeping the original conversation and task intact.
+
+<Note>
+Compression happens automatically and transparently. Your agents will continue to work normally even with compressed conversations, as the system preserves all essential context and provides artifact references for detailed information.
 </Note>
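The tool-result archiving step described above can be sketched roughly as follows. The `saveArtifact` callback, the record shapes, and the inline-size cutoff are all hypothetical stand-ins for whatever the runtime actually uses:

```typescript
// Illustrative sketch: oversized tool results are stored as artifacts and
// replaced in-context with a short summary plus an artifact reference.
interface ToolResult {
  toolCallId: string;
  output: string;
}

interface ArchivedResult {
  toolCallId: string;
  artifactId: string;
  summary: string;
}

const MAX_INLINE_CHARS = 2000; // hypothetical cutoff for "large" results

function archiveLargeResults(
  results: ToolResult[],
  saveArtifact: (output: string) => string, // persists output, returns an artifact id
): (ToolResult | ArchivedResult)[] {
  return results.map((r) =>
    r.output.length <= MAX_INLINE_CHARS
      ? r // small results stay inline
      : {
          toolCallId: r.toolCallId,
          artifactId: saveArtifact(r.output),
          summary: `${r.output.slice(0, 200)}... [truncated; see artifact]`,
        },
  );
}
```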
 
 ## How It Works
 
 <Steps>
 <Step>
-**Message Retrieval**: The system retrieves up to 50 most recent messages from the conversation history
+**Message Retrieval**: The system retrieves conversation history (up to 10,000 messages with intelligent compression, or 50 messages with fixed limits)
 </Step>
 <Step>
 **Delegation Filtering**: Messages are filtered based on delegation context - delegated agents see their own tool results plus top-level conversation context
 </Step>
 <Step>
-**Token Calculation**: Remaining messages are processed, calculating token count for each message
+**Context Management**: With intelligent compression, the system analyzes model capabilities and compresses when needed. With fixed limits, messages are truncated at token thresholds.
 </Step>
 <Step>
-**Exclusion**: If the total token count exceeds 4,000 tokens, older messages are excluded from the context window
+**Optimization**: Intelligent compression creates summaries and artifacts to preserve essential context while staying within model limits
 </Step>
 </Steps>
 
agents-docs/package.json

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@
   },
   "dependencies": {
     "@inkeep/agents-cli": "workspace:*",
+    "@inkeep/agents-core": "workspace:*",
     "@inkeep/agents-ui": "^0.15.5",
     "@inkeep/cxkit-react": "^0.5.98",
     "@inkeep/docskit": "^0.0.8",
@@ -38,6 +39,7 @@
     "fumadocs-typescript": "^4.0.13",
     "fumadocs-ui": "^16.1.0",
     "hast-util-to-jsx-runtime": "^2.3.6",
+    "llm-info": "^1.0.69",
     "lucide-react": "^0.503.0",
     "next": "16.1.0",
     "posthog-js": "^1.308.0",
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
import {
2+
ANTHROPIC_MODELS,
3+
GOOGLE_MODELS,
4+
OPENAI_MODELS,
5+
} from '@inkeep/agents-core/constants/models';
6+
import { ModelInfoMap } from 'llm-info';
7+
import { extractModelIdForLlmInfo } from '../../../agents-run-api/src/utils/model-context-utils';
8+
9+
// Select representative models from our supported set
10+
const FEATURED_MODELS = [
11+
OPENAI_MODELS.GPT_5_2,
12+
ANTHROPIC_MODELS.CLAUDE_SONNET_4_5,
13+
GOOGLE_MODELS.GEMINI_3_PRO_PREVIEW,
14+
] as const;
15+
16+
// Same compression logic as the runtime
17+
function getCompressionParams(contextWindow: number) {
18+
if (contextWindow < 100000) {
19+
return { threshold: 0.85, bufferPct: 0.1 }; // 75% trigger point
20+
}
21+
if (contextWindow < 500000) {
22+
return { threshold: 0.9, bufferPct: 0.07 }; // 83% trigger point
23+
}
24+
return { threshold: 0.91, bufferPct: 0.05 }; // 86% trigger point
25+
}
26+
27+
function formatTokens(tokens: number): string {
28+
if (tokens >= 1000000) {
29+
return `${(tokens / 1000000).toFixed(1).replace('.0', '')}M`;
30+
}
31+
if (tokens >= 1000) {
32+
return `${(tokens / 1000).toFixed(0)}K`;
33+
}
34+
return tokens.toString();
35+
}
36+
37+
export function CompressionModelsTable() {
38+
const rows = FEATURED_MODELS.map((modelString) => {
39+
const modelId = extractModelIdForLlmInfo(modelString);
40+
const modelDetails = ModelInfoMap[modelId as keyof typeof ModelInfoMap];
41+
42+
// Only use models that exist in llm-info
43+
if (!modelDetails?.contextWindowTokenLimit) {
44+
return null;
45+
}
46+
47+
const contextWindow = modelDetails.contextWindowTokenLimit;
48+
const conversationThreshold = Math.floor(contextWindow * 0.5);
49+
const params = getCompressionParams(contextWindow);
50+
const contextCompactingThreshold = Math.floor(contextWindow * params.threshold);
51+
const contextCompactingPct = Math.round(params.threshold * 100);
52+
53+
return {
54+
model: modelString,
55+
contextWindow,
56+
conversationThreshold,
57+
contextCompactingThreshold,
58+
contextCompactingPct,
59+
};
60+
}).filter((row): row is NonNullable<typeof row> => row !== null);
61+
62+
return (
63+
<div className="overflow-x-auto">
64+
<table className="min-w-full table-auto">
65+
<thead>
66+
<tr>
67+
<th>Model</th>
68+
<th>Context Window</th>
69+
<th>Conversation Threshold</th>
70+
<th>Context Compacting Threshold</th>
71+
</tr>
72+
</thead>
73+
<tbody>
74+
{rows.map((row) => (
75+
<tr key={row.model}>
76+
<td>{row.model}</td>
77+
<td>{formatTokens(row.contextWindow)} tokens</td>
78+
<td>{formatTokens(row.conversationThreshold)} (50%)</td>
79+
<td>
80+
~{formatTokens(row.contextCompactingThreshold)} ({row.contextCompactingPct}%)
81+
</td>
82+
</tr>
83+
))}
84+
</tbody>
85+
</table>
86+
</div>
87+
);
88+
}
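To make the tiering in `getCompressionParams` concrete, here is a worked computation for a hypothetical model with a 200K-token context window. It falls in the middle tier (threshold 0.9, buffer 0.07), so compaction is configured at 90% of the window, and the comments in the component reflect the effective trigger once the buffer is subtracted (90% − 7% = 83%):

```typescript
// Worked example (illustrative) for a 200K-token context window.
const contextWindow = 200_000;

// Conversation-level compression: 50% of the window.
const conversationThreshold = Math.floor(contextWindow * 0.5); // 100,000 tokens

// Middle tier: compaction configured at 90% of the window.
const compactingThreshold = Math.floor(contextWindow * 0.9); // 180,000 tokens

// Effective trigger after subtracting the 7% buffer (the "83%" in the comment).
const effectiveTrigger = Math.floor(contextWindow * (0.9 - 0.07)); // 166,000 tokens

console.log({ conversationThreshold, compactingThreshold, effectiveTrigger });
```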

agents-manage-api/__snapshots__/openapi.json

Lines changed: 3 additions & 3 deletions
@@ -5905,8 +5905,8 @@
     },
     "/oauth/callback": {
       "get": {
-        "description": "Handles OAuth authorization codes and completes the authentication flow",
-        "operationId": "oauth-callback",
+        "description": "Handles OAuth authorization codes for MCP tools and completes the authentication flow",
+        "operationId": "mcp-oauth-callback",
         "parameters": [
           {
             "in": "query",
@@ -5968,7 +5968,7 @@
           "description": "Internal server error"
         }
       },
-      "summary": "OAuth authorization callback",
+      "summary": "MCP OAuth authorization callback",
       "tags": [
        "OAuth"
      ]

agents-manage-api/src/__tests__/data/ledgerArtifacts.test.ts

Lines changed: 6 additions & 1 deletion
@@ -117,6 +117,7 @@ describe('Ledger Artifacts – Data Layer', () => {
       ],
       taskId,
       metadata: { foo: 'bar' },
+      createdAt: '2024-01-16T01:30:00.000Z',
     },
     {
       artifactId: generateId(),
@@ -130,6 +131,7 @@ describe('Ledger Artifacts – Data Layer', () => {
       ],
       taskId,
       metadata: { baz: 'qux' },
+      createdAt: '2024-01-16T02:30:00.000Z',
     },
   ];
 
@@ -173,6 +175,7 @@ describe('Ledger Artifacts – Data Layer', () => {
       },
     ],
     taskId,
+    createdAt: '2024-01-16T03:30:00.000Z',
   };
 
   await addLedgerArtifacts(dbClient)({
@@ -222,7 +225,7 @@ describe('Ledger Artifacts – Data Layer', () => {
     // Intentionally passing an invalid param to trigger validation error
     // eslint-disable-next-line @typescript-eslint/no-unsafe-argument
     await expect(getLedgerArtifacts(dbClient)({} as any)).rejects.toThrow(
-      'At least one of taskId, toolCallId, or artifactId must be provided'
+      'At least one of taskId, toolCallId, toolCallIds, or artifactId must be provided'
     );
   });
 
@@ -269,6 +272,7 @@ describe('Ledger Artifacts – Data Layer', () => {
             data: { secret: 'tenant1-secret' },
           },
         ],
+        createdAt: '2024-01-16T04:30:00.000Z',
       },
     ],
   });
@@ -289,6 +293,7 @@ describe('Ledger Artifacts – Data Layer', () => {
             data: { secret: 'tenant2-secret' },
          },
        ],
+        createdAt: '2024-01-16T05:30:00.000Z',
      },
    ],
  });
