5 changes: 5 additions & 0 deletions .changeset/advertise-honest-context-window.md
@@ -0,0 +1,5 @@
---
"manifest": minor
---

Advertise an honest context window for `manifest/auto` via a new OpenAI-compatible `GET /v1/models` endpoint. The `context_length` returned is the minimum across every model the agent could be routed to (tier primaries, fallbacks, specificity overrides) — any routed model is guaranteed to accept at least that many tokens, so clients that compact against this value stop overflowing the routed model. Adds a per-agent **Context window** card in Settings to override the computed floor, and a helper endpoint `GET /api/v1/routing/:agentName/context-window` for the dashboard. Addresses #1617, #1612, and #1450.
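The floor described above can be sketched as a minimum over every model the agent might reach. This is an illustrative sketch only: `RoutableModel`, `AgentRouting`, and `advertisedContextLength` are hypothetical names that do not come from the PR, and the sketch assumes a user override simply replaces the computed floor.

```typescript
// Hypothetical shapes for illustration; the PR's real entities may differ.
interface RoutableModel {
  id: string;
  contextWindow: number; // in tokens
}

interface AgentRouting {
  // Every tier primary, fallback, and specificity override, flattened.
  models: RoutableModel[];
  // User-entered override from the Settings card, or null for "Auto".
  contextFloorOverride: number | null;
}

// Returns the context_length to advertise for manifest/auto, or null when
// no model is configured and no override is set.
function advertisedContextLength(routing: AgentRouting): number | null {
  // Assumption: an explicit override always wins over the computed floor.
  if (routing.contextFloorOverride !== null) {
    return routing.contextFloorOverride;
  }
  if (routing.models.length === 0) {
    return null;
  }
  // The smallest window is the only size every routed model is sure to accept.
  return Math.min(...routing.models.map((m) => m.contextWindow));
}
```

Taking the minimum is deliberately pessimistic: a client that compacts against this value may compact earlier than strictly needed, but it cannot overflow whichever model the request ends up on.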
5 changes: 5 additions & 0 deletions .changeset/context-aware-routing.md
@@ -0,0 +1,5 @@
---
"manifest": minor
---

Context-aware routing: the proxy now estimates token counts up front (js-tiktoken cl100k_base + 10% safety margin) and filters tier candidates by whether their context window can fit `estimatedTokens + max_tokens`. When no model in the scored tier fits, the router escalates upward (simple → standard → complex → reasoning) instead of silently routing to a too-small model. When no model in any tier can fit, the user gets an actionable error that breaks out input vs reserved-output rather than an opaque 400. Emits `X-Manifest-Context-Estimated`, `X-Manifest-Context-Used`, and `X-Manifest-Context-Escalated: <fromTier>-><toTier>` response headers so agents can adapt per-response. Addresses the #1617 RFC (context window awareness for agentic integrations) end-to-end.
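As a rough sketch of the size budget described above, the estimate is padded by 10% and the reserved output is added on top. The names `SizeBudget`, `sizeBudget`, and `overflowMessage` are hypothetical, the raw token count is assumed to come from a tokenizer such as js-tiktoken's `cl100k_base` encoding, and the 4096 default for a missing `max_tokens` is taken from the CLAUDE.md description in this PR. The message wording is invented for illustration.

```typescript
// Hypothetical helper types; not the PR's actual implementation.
interface SizeBudget {
  estimatedTokens: number; // raw count plus the 10% safety margin
  reservedOutput: number;  // tokens set aside for the completion
  required: number;        // what a model's context window must hold
}

// rawTokenCount is assumed to be produced by a tokenizer upstream.
function sizeBudget(rawTokenCount: number, maxTokens?: number): SizeBudget {
  // Integer arithmetic avoids floating-point drift (1000 * 1.1 is not exactly 1100).
  const estimatedTokens = Math.ceil((rawTokenCount * 11) / 10);
  const reservedOutput = maxTokens ?? 4096;
  return {
    estimatedTokens,
    reservedOutput,
    required: estimatedTokens + reservedOutput,
  };
}

// Builds an error message that separates input from reserved output.
function overflowMessage(budget: SizeBudget, largestWindow: number): string {
  return (
    `Request needs about ${budget.required} tokens ` +
    `(${budget.estimatedTokens} input + ${budget.reservedOutput} reserved output), ` +
    `but the largest configured context window is ${largestWindow} tokens.`
  );
}
```

Splitting the total into input and reserved output tells the user which lever to pull: trim the conversation, or lower `max_tokens`.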
8 changes: 6 additions & 2 deletions CLAUDE.md
@@ -333,6 +333,8 @@ All analytics queries filter by user via `addTenantFilter(qb, userId)` from `que
| POST | `/api/v1/routing/:agentName/ollama/sync` | Session/API Key | Sync Ollama models |
| POST | `/api/v1/routing/resolve` | Bearer (mnfst_*) | Model resolution |
| POST | `/v1/chat/completions` | Bearer (mnfst_*) | LLM proxy (OpenAI-compatible) |
| GET | `/v1/models` | Bearer (mnfst_*) | OpenAI-compatible model listing; advertises `manifest/auto` with the honest `context_length` floor for the agent (Phase 1, #1617) |
| GET | `/api/v1/routing/:agentName/context-window` | Session/API Key | Effective advertised context window + whether the user set an override (drives the Settings "Context window" card) |
| GET | `/api/v1/events` | Session | SSE real-time events |
| GET | `/api/v1/github/stars` | Public | GitHub star count |

@@ -363,7 +365,7 @@ See `packages/backend/.env.example` for all variables. Key ones:

## Domain Terminology

- **Message**: The primary entity in the system. Every row in `agent_messages` is a Message. The UI labels them "Messages" everywhere. Key routing columns: `routing_tier` (complexity tier used), `routing_reason` (why — `scored`, `specificity`, `heartbeat`, etc.), `specificity_category` (which task-type category, null if complexity-routed).
- **Message**: The primary entity in the system. Every row in `agent_messages` is a Message. The UI labels them "Messages" everywhere. Key routing columns: `routing_tier` (complexity tier used), `routing_reason` (why — `scored`, `specificity`, `heartbeat`, `size_escalated`, `context_window_exceeded`, …), `specificity_category` (which task-type category, null if complexity-routed).
- **Tenant**: A user's data boundary. Created from `user.id` on first agent creation.
- **Agent**: An AI agent owned by a tenant. Has a unique OTLP ingest key.

@@ -410,7 +412,9 @@ To add a new font or icon library:
- **LLM Routing**: Two-layer routing system with provider key management (AES-256-GCM encrypted) and OpenAI-compatible proxy at `/v1/chat/completions`:
- **Complexity tiers** (always active): 4 tiers (simple/standard/complex/reasoning) based on request content scoring with 23 weighted keyword dimensions.
- **Specificity routing** (opt-in): 9 task-type categories (coding, web_browsing, data_analysis, image_generation, video_generation, social_media, email_management, calendar_management, trading). When enabled, overrides complexity tiers. Detection uses keyword analysis on the last user message + tool name heuristics. Categories defined in `shared/src/specificity.ts`, keywords in `scoring/keywords.ts`, detection in `scoring/specificity-detector.ts`.
- **Resolution order**: specificity check (if any category active) → complexity scoring → tier assignment → provider/model resolution → proxy forward.
- **Context-aware size check** (Phase 2 — always active): before forwarding, the proxy estimates tokens for the full request via `routing/proxy/token-estimate.ts` (js-tiktoken `cl100k_base` + 1.1× safety multiplier). `ResolveService.resolveWithSizeCheck` then walks the scored tier's `(primary, ...fallbacks)` in order and picks the first candidate whose `contextWindow >= estimated + reservedOutput` (reservedOutput = `body.max_tokens ?? 4096`). If no model in the scored tier fits, escalate upward (`simple → standard → complex → reasoning`). If *nothing* fits in any tier, ResolveService returns `reason: 'context_window_exceeded'` and ProxyService surfaces a friendly chat-completion response with the estimated token count, largest available window, and a link to the Routing page — no silent truncation. Size-aware candidate filtering is the pure function `findFittingCandidate` in `routing/resolve/context-fit.ts`.
- **Resolution order**: specificity check (if any category active) → complexity scoring → tier assignment → **size-aware candidate pick within tier** → **tier escalation if no candidate fits** → provider/model resolution → proxy forward.
- **Size signals on the wire**: successful proxy responses emit `X-Manifest-Context-Estimated` and `X-Manifest-Context-Used`; when the size check bumped the tier, an additional `X-Manifest-Context-Escalated: <fromTier>-><toTier>` is emitted. Agents use these to adapt per-response without a second round trip (#1617). Header values are ASCII-only — Node's http layer silently drops headers containing Latin-1-incompatible characters.
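The candidate pick and upward escalation described in the bullets above can be sketched roughly as follows. This is not the PR's `findFittingCandidate` or `resolveWithSizeCheck`: the names `pickFittingCandidate`, `resolveBySize`, and `escalationHeaderValue`, along with the `Candidate` and `Resolution` shapes, are hypothetical stand-ins chosen for illustration.

```typescript
type Tier = "simple" | "standard" | "complex" | "reasoning";
const TIER_ORDER: readonly Tier[] = ["simple", "standard", "complex", "reasoning"];

// Hypothetical shapes; each tier lists its primary first, then fallbacks.
interface Candidate {
  model: string;
  contextWindow: number;
}
type TierCandidates = Record<Tier, Candidate[]>;

type Resolution =
  | { ok: true; tier: Tier; model: string; escalatedFrom?: Tier }
  | { ok: false; reason: "context_window_exceeded"; largestWindow: number };

// First candidate, in configured order, whose window holds the request.
function pickFittingCandidate(candidates: Candidate[], required: number): Candidate | undefined {
  return candidates.find((c) => c.contextWindow >= required);
}

// Tries the scored tier first, then only the tiers above it.
function resolveBySize(scored: Tier, tiers: TierCandidates, required: number): Resolution {
  for (const tier of TIER_ORDER.slice(TIER_ORDER.indexOf(scored))) {
    const fit = pickFittingCandidate(tiers[tier], required);
    if (fit) {
      return tier === scored
        ? { ok: true, tier, model: fit.model }
        : { ok: true, tier, model: fit.model, escalatedFrom: scored };
    }
  }
  // Nothing fits: report the largest configured window for the error message.
  let largestWindow = 0;
  for (const tier of TIER_ORDER) {
    for (const c of tiers[tier]) {
      largestWindow = Math.max(largestWindow, c.contextWindow);
    }
  }
  return { ok: false, reason: "context_window_exceeded", largestWindow };
}

// ASCII-only value for X-Manifest-Context-Escalated, or undefined if not escalated.
function escalationHeaderValue(r: Resolution): string | undefined {
  if (!r.ok || r.escalatedFrom === undefined) return undefined;
  return `${r.escalatedFrom}->${r.tier}`;
}
```

Keeping the candidate pick as a pure function over plain data, as the PR says it does in `routing/resolve/context-fit.ts`, makes the fit rule easy to unit test without mocking the proxy.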

## Providers & Models

10 changes: 10 additions & 0 deletions package-lock.json
3 changes: 2 additions & 1 deletion packages/backend/package.json
@@ -18,7 +18,6 @@
"migration:create": "typeorm-ts-node-commonjs migration:create"
},
"dependencies": {
"manifest-shared": "*",
"@nestjs/cache-manager": "^3.1.0",
"@nestjs/common": "^11.0.0",
"@nestjs/config": "^4.0.0",
@@ -38,6 +37,8 @@
"compression": "^1.8.1",
"express-rate-limit": "^8.3.1",
"helmet": "^8.1.0",
"js-tiktoken": "^1.0.21",
"manifest-shared": "*",
"pg": "^8.13.0",
"react": "^19.2.4",
"react-dom": "^19.2.4",
@@ -19,6 +19,8 @@ describe('AgentsController', () => {
let mockConfigGet: jest.Mock;
let mockDeleteAgent: jest.Mock;
let mockRenameAgent: jest.Mock;
let mockUpdateAgentType: jest.Mock;
let mockUpdateContextFloorOverride: jest.Mock;
let mockTenantResolve: jest.Mock;

beforeEach(async () => {
@@ -31,6 +33,8 @@ describe('AgentsController', () => {
mockConfigGet = jest.fn().mockReturnValue('');
mockDeleteAgent = jest.fn().mockResolvedValue(undefined);
mockRenameAgent = jest.fn().mockResolvedValue(undefined);
mockUpdateAgentType = jest.fn().mockResolvedValue(undefined);
mockUpdateContextFloorOverride = jest.fn().mockResolvedValue(undefined);
mockTenantResolve = jest.fn().mockResolvedValue('tenant-123');

const module: TestingModule = await Test.createTestingModule({
@@ -46,7 +50,8 @@ describe('AgentsController', () => {
useValue: {
deleteAgent: mockDeleteAgent,
renameAgent: mockRenameAgent,
updateAgentType: jest.fn(),
updateAgentType: mockUpdateAgentType,
updateContextFloorOverride: mockUpdateContextFloorOverride,
},
},
{
@@ -255,7 +260,6 @@ describe('AgentsController', () => {
});

it('passes category/platform to updateAgentType on PATCH', async () => {
const mockUpdateType = jest.fn().mockResolvedValue(undefined);
const user = { id: 'u1' };
const result = await controller.updateAgent(user as never, 'bot-1', {
agent_category: 'app',
@@ -268,6 +272,57 @@ describe('AgentsController', () => {
});
});

/**
* context_floor_override PATCH cases — the UI "Custom context window" card
* writes through this path. Defends issues #1617 / #1612 by making sure a
* user-entered override is actually persisted (not swallowed by the
* controller ignoring the field because the DTO treats it as optional).
*/
it('calls updateContextFloorOverride with a numeric value on PATCH', async () => {
const user = { id: 'u1' };
const result = await controller.updateAgent(user as never, 'bot-1', {
context_floor_override: 50_000,
} as never);

expect(mockUpdateContextFloorOverride).toHaveBeenCalledWith('u1', 'bot-1', 50_000);
expect(result).toEqual({ context_floor_override: 50_000 });
});

it('calls updateContextFloorOverride with null to clear the override', async () => {
const user = { id: 'u1' };
const result = await controller.updateAgent(user as never, 'bot-1', {
context_floor_override: null,
} as never);

expect(mockUpdateContextFloorOverride).toHaveBeenCalledWith('u1', 'bot-1', null);
expect(result).toEqual({ context_floor_override: null });
});

it('does not call updateContextFloorOverride when the field is omitted', async () => {
// The field is optional — PATCH { agent_category: 'app' } must not wipe
// any existing override.
const user = { id: 'u1' };
await controller.updateAgent(user as never, 'bot-1', {
agent_category: 'app',
} as never);

expect(mockUpdateContextFloorOverride).not.toHaveBeenCalled();
});

it('routes context_floor_override to the NEW slug when renaming in the same PATCH', async () => {
// Rename-and-set-override in one PATCH is a real path in the Settings
// page. After renaming to `new-slug`, the lifecycle call has to target
// the new name, not the original.
const user = { id: 'u1' };
await controller.updateAgent(user as never, 'bot-1', {
name: 'New Slug',
context_floor_override: 64_000,
} as never);

expect(mockRenameAgent).toHaveBeenCalledWith('u1', 'bot-1', 'new-slug', 'New Slug');
expect(mockUpdateContextFloorOverride).toHaveBeenCalledWith('u1', 'new-slug', 64_000);
});

it('invalidates agent list cache after successful createAgent', async () => {
const mockOnboard = jest.fn().mockResolvedValue({
tenantId: 't1',
@@ -128,6 +128,15 @@ export class AgentsController {
      if (body.agent_platform !== undefined) result['agent_platform'] = body.agent_platform;
    }

    if (body.context_floor_override !== undefined) {
      await this.lifecycle.updateContextFloorOverride(
        user.id,
        body.name ? slugify(body.name)! : agentName,
        body.context_floor_override,
      );
      result['context_floor_override'] = body.context_floor_override;
    }

    await this.cacheManager.del(this.agentListCacheKey(user.id));
    return result;
  }
@@ -154,6 +154,75 @@ describe('AgentLifecycleService', () => {
});
});

describe('updateContextFloorOverride', () => {
it('writes the override value on the agents row when the agent exists', async () => {
mockAgentGetOne.mockResolvedValueOnce({ id: 'agent-id-1', name: 'my-agent' });

const mockExecute = jest.fn().mockResolvedValue({});
const mockUpdateQb = {
update: jest.fn().mockReturnThis(),
set: jest.fn().mockReturnThis(),
where: jest.fn().mockReturnThis(),
execute: mockExecute,
};

mockAgentCreateQueryBuilder
.mockReturnValueOnce({
select: jest.fn().mockReturnThis(),
leftJoin: jest.fn().mockReturnThis(),
where: jest.fn().mockReturnThis(),
andWhere: jest.fn().mockReturnThis(),
orderBy: jest.fn().mockReturnThis(),
getOne: mockAgentGetOne,
getMany: jest.fn().mockResolvedValue([]),
})
.mockReturnValueOnce(mockUpdateQb);

await service.updateContextFloorOverride('test-user', 'my-agent', 50_000);

expect(mockUpdateQb.update).toHaveBeenCalledWith('agents');
expect(mockUpdateQb.set).toHaveBeenCalledWith({ context_floor_override: 50_000 });
expect(mockUpdateQb.where).toHaveBeenCalledWith('id = :id', { id: 'agent-id-1' });
expect(mockExecute).toHaveBeenCalledTimes(1);
});

it('writes null when clearing the override', async () => {
mockAgentGetOne.mockResolvedValueOnce({ id: 'agent-id-1', name: 'my-agent' });

const mockExecute = jest.fn().mockResolvedValue({});
const mockUpdateQb = {
update: jest.fn().mockReturnThis(),
set: jest.fn().mockReturnThis(),
where: jest.fn().mockReturnThis(),
execute: mockExecute,
};

mockAgentCreateQueryBuilder
.mockReturnValueOnce({
select: jest.fn().mockReturnThis(),
leftJoin: jest.fn().mockReturnThis(),
where: jest.fn().mockReturnThis(),
andWhere: jest.fn().mockReturnThis(),
orderBy: jest.fn().mockReturnThis(),
getOne: mockAgentGetOne,
getMany: jest.fn().mockResolvedValue([]),
})
.mockReturnValueOnce(mockUpdateQb);

await service.updateContextFloorOverride('test-user', 'my-agent', null);

expect(mockUpdateQb.set).toHaveBeenCalledWith({ context_floor_override: null });
});

it('throws NotFoundException when the agent does not belong to the user', async () => {
mockAgentGetOne.mockResolvedValueOnce(null);

await expect(
service.updateContextFloorOverride('test-user', 'nonexistent', 50_000),
).rejects.toThrow(NotFoundException);
});
});

describe('renameAgent', () => {
it('should throw NotFoundException when agent not found', async () => {
mockAgentGetOne.mockResolvedValueOnce(null);
16 changes: 16 additions & 0 deletions packages/backend/src/analytics/services/agent-lifecycle.service.ts
@@ -55,6 +55,22 @@ export class AgentLifecycleService {
      .execute();
  }

  async updateContextFloorOverride(
    userId: string,
    agentName: string,
    value: number | null,
  ): Promise<void> {
    const agent = await this.findAgentByUser(userId, agentName);
    if (!agent) throw new NotFoundException(`Agent "${agentName}" not found`);

    await this.agentRepo
      .createQueryBuilder()
      .update('agents')
      .set({ context_floor_override: value })
      .where('id = :id', { id: agent.id })
      .execute();
  }

  async renameAgent(
    userId: string,
    currentName: string,
@@ -356,5 +356,35 @@ describe('TimeseriesQueriesService', () => {
expect(result[0].total_cost).toBe(0);
expect(result[0].sparkline).toEqual([]);
});

/**
* The Settings page loads the current override from GET /api/v1/agents
* (to prefill the Auto/Custom radio). If this projection ever drops the
* field, the Settings page silently reverts to "Auto" every reload.
*/
it('surfaces a configured context_floor_override on each agent row', async () => {
mockGetMany.mockResolvedValueOnce([
{
name: 'custom-bot',
display_name: null,
created_at: '2026-02-16',
context_floor_override: 50_000,
},
]);
mockGetRawMany.mockResolvedValueOnce([]).mockResolvedValueOnce([]);

const result = await service.getAgentList('u1');
expect(result[0].context_floor_override).toBe(50_000);
});

it('defaults context_floor_override to null when the agent row has no override', async () => {
mockGetMany.mockResolvedValueOnce([
{ name: 'auto-bot', display_name: null, created_at: '2026-02-16' },
]);
mockGetRawMany.mockResolvedValueOnce([]).mockResolvedValueOnce([]);

const result = await service.getAgentList('u1');
expect(result[0].context_floor_override).toBeNull();
});
});
});
@@ -230,6 +230,7 @@ export class TimeseriesQueriesService {
display_name: a.display_name ?? name,
agent_category: a.agent_category ?? null,
agent_platform: a.agent_platform ?? null,
context_floor_override: a.context_floor_override ?? null,
message_count: Number(stats?.['message_count'] ?? 0),
last_active: String(stats?.['last_active'] ?? a.created_at ?? ''),
total_cost: Number(stats?.['total_cost'] ?? 0),