Conversation memory determines how much of the conversation history is included in the context window when your Agent processes a new message. The Inkeep Agent Framework automatically manages conversation history to balance context retention with token efficiency, with specialized handling for delegated agents and tool results.

The conversation history now includes:

- **Tool results**: Results from tool executions, providing context about what actions were performed
- **Agent communications**: Messages exchanged between agents during transfers and delegations

## Memory Management
The system uses two approaches for managing conversation history:
### Intelligent Compression (Primary Method)
When agents have a summarizer model configured (the standard setup):

- **Up to 10,000 messages**: Retrieves extensive conversation history to find compression summaries and make informed decisions
- **No token limits**: Model-aware compression manages context based on each model's actual capabilities
- **Dynamic optimization**: Automatically compresses when approaching model-specific thresholds (50% for conversation-level compression, 75-91% for sub-agent operations); see the sketch below
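
The practical difference between the two modes can be pictured as a single branch. This is a minimal TypeScript sketch under assumed names: the `AgentConfig` shape and `historyLimits` helper are illustrative, not the framework's actual API.

```typescript
// Hypothetical sketch: how history retrieval limits might branch on
// whether a summarizer model is configured. All names are assumptions.
interface AgentConfig {
  summarizerModel?: string; // assumed field: a model id, when configured
}

function historyLimits(config: AgentConfig) {
  if (config.summarizerModel) {
    // Intelligent compression: wide retrieval, no fixed token cap;
    // model-aware compression manages the budget instead.
    return { maxMessages: 10_000, maxTokens: Number.POSITIVE_INFINITY };
  }
  // Fallback: fixed message and token limits apply.
  return { maxMessages: 50, maxTokens: 8_000 };
}
```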
### Fixed Limits (Fallback Method)
For agents without a summarizer model:

- **50 messages**: Up to the 50 most recent messages from the conversation
- **8,000 tokens**: Maximum of 8,000 tokens from previous conversation messages; the sketch below shows one way the two limits combine
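
As a rough illustration of how the fallback could apply both limits, assuming a `countTokens` helper (the real token accounting lives inside the framework):

```typescript
// Hypothetical sketch: keep at most the 50 most recent messages, then
// drop the oldest of those until the rest fits an 8,000-token budget.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

function applyFixedLimits(
  messages: Message[],
  countTokens: (m: Message) => number, // assumed helper
  maxMessages = 50,
  maxTokens = 8_000,
): Message[] {
  const recent = messages.slice(-maxMessages);
  const kept: Message[] = [];
  let total = 0;
  // Walk newest-to-oldest so the most recent context always survives.
  for (let i = recent.length - 1; i >= 0; i--) {
    const tokens = countTokens(recent[i]);
    if (total + tokens > maxTokens) break;
    total += tokens;
    kept.unshift(recent[i]);
  }
  return kept;
}
```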
<Note>
Most agents use intelligent compression, which provides superior context management tailored to each model's capabilities. The fixed limits serve as safety nets when a summarizer model is not available; the default token limit can be adjusted via the `AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT` environment variable if needed.
</Note>
## Intelligent Compression System
The framework's intelligent compression system is the primary method for managing conversation memory. It automatically analyzes model capabilities and compresses context when needed to optimize performance.
### How Compression Works
The compression system operates continuously, making intelligent decisions about context management:
<Steps>
<Step>
**Context Monitoring**: The system continuously monitors conversation size against model limits
</Step>
<Step>
**Automatic Triggering**: Compression triggers at 50% of the context window for conversation-level compression, or at model-aware thresholds (~75-91% depending on model size) for sub-agent generation; see the sketch after these steps
</Step>
<Step>
**Tool Result Archiving**: Large tool results are stored as artifacts and replaced with summary references
</Step>
<Step>
**AI Summarization**: Older conversation parts are summarized by AI while preserving key context
</Step>
<Step>
**Fallback Protection**: If compression is unavailable, the system falls back to fixed message and token limits
</Step>
</Steps>
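
The triggering thresholds reduce to simple ratio checks against a model's context window. The 50% and ~75-91% figures come from this page; the exact mapping from model size to a ratio in that band is internal to the framework, so the cutoff below is an assumption for illustration only.

```typescript
// Hypothetical sketch of the two trigger checks. The size-dependent
// sub-agent band (~75-91%) is approximated with an assumed cutoff.
interface ModelInfo {
  contextWindowTokens: number;
}

function shouldCompressConversation(usedTokens: number, model: ModelInfo): boolean {
  // Conversation-level compression triggers at 50% of the context window.
  return usedTokens >= 0.5 * model.contextWindowTokens;
}

function shouldCompressSubAgent(usedTokens: number, model: ModelInfo): boolean {
  // Assumption: larger context windows tolerate a higher fill ratio.
  const ratio = model.contextWindowTokens >= 200_000 ? 0.91 : 0.75;
  return usedTokens >= ratio * model.contextWindowTokens;
}
```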
### Model-Specific Behavior
Different models have different context windows, and compression adapts accordingly:
<CompressionModelsTable />
### Compression Types
#### Conversation-Level Compression

- **Trigger**: When conversation reaches 50% of model's context window
- **Action**: Compresses entire conversation history into summary + artifacts
- **Use Case**: Long conversations with extensive history

**Example**: You have a 20-message conversation about planning a software project. The conversation includes requirements gathering, architecture discussions, and code reviews. When it hits the 50% threshold, the system creates a summary like "User discussed project requirements for e-commerce platform, decided on microservices architecture, reviewed authentication flow..." and stores detailed tool outputs as artifacts.
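
To make that outcome concrete, here is a minimal sketch of the collapse, with an assumed `summarize` helper standing in for the AI summarizer (all names hypothetical):

```typescript
// Hypothetical sketch: the prior history collapses into one summary
// message, while the originals are retained as archived material.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

function compressConversation(
  history: Message[],
  summarize: (history: Message[]) => string, // assumed AI summarizer
): { compressed: Message[]; archived: Message[] } {
  const summary: Message = {
    role: "system",
    content: `Conversation summary: ${summarize(history)}`,
  };
  // Detailed tool outputs are not discarded; they become artifacts.
  return { compressed: [summary], archived: history };
}
```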
#### Sub-Agent Generation Compression

- **Trigger**: During sub-agent execution when tool results exceed model-aware limits (75-91% depending on model size)
- **Action**: Compresses generated tool results while preserving original context
- **Use Case**: Sub-agents performing many tool operations during generation

**Example**: A sub-agent is tasked with "analyze this codebase for security issues." During execution, it uses tools to:

1. Read 15 different files (large outputs)
2. Run security scans (detailed reports)
3. Check dependencies (long lists)
4. Analyze configurations (verbose JSON)

When these tool results fill up the context window, the system compresses them into: "Analyzed 15 files, found 3 SQL injection risks in auth.py, 2 XSS vulnerabilities in templates..." while keeping the original conversation and task intact.
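
One way to picture the archiving step, with a hypothetical in-memory artifact store and an assumed `summarize` helper in place of the AI summarization (`crypto.randomUUID` is available in modern Node and browsers):

```typescript
// Hypothetical sketch: oversized tool results are stored as artifacts
// and replaced with a summary plus a reference to the full output.
interface ToolResult {
  toolName: string;
  output: string;
}

const artifactStore = new Map<string, string>(); // assumed in-memory store

function archiveIfLarge(
  result: ToolResult,
  summarize: (text: string) => string, // assumed summarization helper
  maxChars = 4_000, // assumed size threshold
): ToolResult {
  if (result.output.length <= maxChars) return result;
  const id = crypto.randomUUID();
  artifactStore.set(id, result.output);
  return {
    toolName: result.toolName,
    output: `${summarize(result.output)} [full result stored as artifact ${id}]`,
  };
}
```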
<Note>
Compression happens automatically and transparently. Your agents will continue to work normally even with compressed conversations, as the system preserves all essential context and provides artifact references for detailed information.
</Note>
## How It Works
<Steps>
<Step>
**Message Retrieval**: The system retrieves conversation history (up to 10,000 messages with intelligent compression, or 50 messages with fixed limits)
</Step>
<Step>
**Delegation Filtering**: Messages are filtered based on delegation context: delegated agents see their own tool results plus top-level conversation context (see the sketch after these steps)
</Step>
<Step>
**Context Management**: With intelligent compression, the system analyzes model capabilities and compresses when needed. With fixed limits, messages are truncated at token thresholds.
</Step>
<Step>
**Optimization**: Intelligent compression creates summaries and artifacts to preserve essential context while staying within model limits
</Step>
</Steps>
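
The delegation filtering in step 2 amounts to a simple visibility rule: a delegated agent sees its own tool results plus the top-level conversation. The entry shape below is hypothetical and only illustrates that rule.

```typescript
// Hypothetical sketch of delegation filtering.
interface HistoryEntry {
  agentId: string;     // which agent produced the entry
  isToolResult: boolean;
  isTopLevel: boolean; // part of the top-level conversation
}

function visibleHistory(agentId: string, history: HistoryEntry[]): HistoryEntry[] {
  return history.filter(
    (entry) =>
      entry.isTopLevel || (entry.isToolResult && entry.agentId === agentId),
  );
}
```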