Skip to content

feat(e001-s003): trace store with TTL/capacity and tracing metrics#8

Merged
Dumidu1212 merged 2 commits intomainfrom
feat/e001-s003-tracing
Oct 14, 2025
Merged

feat(e001-s003): trace store with TTL/capacity and tracing metrics#8
Dumidu1212 merged 2 commits intomainfrom
feat/e001-s003-tracing

Conversation

@Dumidu1212
Copy link
Owner

@Dumidu1212 Dumidu1212 commented Oct 14, 2025

Summary by CodeRabbit

  • New Features

    • Plan responses now include a trace ID.
    • Added a new endpoint to retrieve execution traces by ID.
    • Enhanced planning flow with candidate scoring, better error handling, timeouts, and clearer failure outcomes.
  • Documentation

    • OpenAPI specification version updated to 0.3.0.

@coderabbitai
Copy link

coderabbitai bot commented Oct 14, 2025

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly describes the main enhancement—introducing a TTL/capacity‐bounded trace store along with tracing metrics—and uses a conventional commit style. It is concise, avoids vague terms or file lists, and accurately reflects the core feature added across multiple modules. This phrasing enables teammates to quickly understand the primary change at a glance.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/e001-s003-tracing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/tracing/traceStore.ts (1)

91-92: Consider removing the module anchor.

The __traceStoreModule export is described as an "anchor to keep the file as a module in all bundlers," but this is typically unnecessary in modern TypeScript/JavaScript projects. The presence of export statements (interfaces and class) already makes this a module.

If you're not experiencing specific bundler issues, consider removing this export:

-// Anchor to keep the file as a module in all bundlers.
-export const __traceStoreModule = true;

If you do need it for specific bundler compatibility, please document which bundler requires this and why.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 082b669 and 045258c.

📒 Files selected for processing (3)
  • src/metrics/metrics.ts (1 hunks)
  • src/tracing/traceStore.ts (2 hunks)
  • tests/unit/trace.store.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/trace.store.test.ts (1)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
🔇 Additional comments (9)
src/metrics/metrics.ts (1)

58-59: LGTM!

The metrics are correctly registered with the Prometheus registry.

tests/unit/trace.store.test.ts (2)

13-21: LGTM!

The capacity eviction test correctly verifies that when maxTraces=2 and three traces are created, the oldest trace (a) is evicted while b and c remain.


23-29: LGTM!

The TTL expiration test correctly verifies that traces expire after the configured TTL period. The use of async/await with setTimeout is appropriate for this time-based test.

src/tracing/traceStore.ts (6)

1-4: LGTM!

The file header comment clearly describes the purpose of the trace store with TTL and max-size bounds, and references the explainability requirement (S003).


5-22: LGTM!

The interfaces are well-structured with clear comments. The addition of createdAt to the Trace interface and the TraceStoreOptions interface with maxTraces and ttlMs fields properly support TTL and capacity management.


30-34: LGTM!

The constructor properly initializes bounds with sensible defaults (1000 traces, 15 minutes TTL) and guards against invalid values using Math.max(1, ...).


36-43: Verify ID uniqueness under high concurrency.

The ID generation uses Math.random() and Date.now(), which provides reasonable uniqueness for typical use cases. However, under high concurrency (multiple traces created in rapid succession), there's a small risk of collisions since Date.now() has millisecond precision.

Consider whether your trace creation rate could exceed 1000 traces/millisecond. If so, consider adding a sequence counter or using a more robust ID generation library like uuid.

// Alternative with uuid (if needed):
import { v4 as uuidv4 } from 'uuid';

create(): string {
  const id = `tr_${uuidv4()}`;
  // ... rest of implementation
}

51-59: LGTM!

The get() method correctly checks for expiration and deletes expired traces before returning undefined. This lazy expiration approach is efficient and ensures stale traces are cleaned up on access.


61-63: LGTM!

The helper methods (size(), delete(), isExpired()) are simple, correct, and well-implemented.

Also applies to: 80-88

Comment on lines +42 to +50
/** Tracing metrics */
export const traceCreatedTotal = new client.Counter({
name: 'trace_created_total',
help: 'Number of traces created'
});
export const traceEventsTotal = new client.Counter({
name: 'trace_events_total',
help: 'Number of trace events recorded'
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Metrics are defined but never incremented.

The traceCreatedTotal and traceEventsTotal counters are properly defined but are not being incremented anywhere in the codebase. Specifically:

  • traceCreatedTotal should be incremented in TraceStore.create() (line 36 of traceStore.ts)
  • traceEventsTotal should be incremented in TraceStore.record() (line 45 of traceStore.ts)

Without these increments, the metrics will always report zero, defeating their purpose.

Apply these changes to increment the metrics:

In src/tracing/traceStore.ts, import the metrics:

+import { traceCreatedTotal, traceEventsTotal } from '../metrics/metrics';

Then increment in create():

 create(): string {
   const id = `tr_${Math.random().toString(36).slice(2)}${Date.now().toString(36)}`;
   const t: Trace = { id, createdAt: Date.now(), events: [] };
   this.traces.set(id, t);
   this.order.push(id);
+  traceCreatedTotal.inc();
   this.prune();
   return id;
 }

And increment in record():

 record(id: string, type: string, data: unknown): void {
   const t = this.traces.get(id);
   if (!t) return;
   t.events.push({ ts: Date.now(), type, data });
+  traceEventsTotal.inc();
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/tracing/traceStore.ts around line 36 and line 45, import the counters
from src/metrics/metrics.ts at the top of the file, then call
traceCreatedTotal.inc() inside TraceStore.create() at ~line 36 to increment the
created-trace metric, and call traceEventsTotal.inc() inside TraceStore.record()
at ~line 45 to increment the trace-events metric (use .inc(1) or .inc() as
appropriate for your metrics client).

Comment on lines +65 to +78
/** Remove expired and overflow traces. */
prune(): void {
// TTL prune
for (const id of [...this.order]) {
const t = this.traces.get(id);
if (!t) continue;
if (this.isExpired(t)) this.delete(id);
}
// Capacity prune (remove oldest)
while (this.order.length > this.maxTraces) {
const oldest = this.order.shift();
if (oldest) this.traces.delete(oldest);
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix inconsistent deletion logic in capacity pruning.

The prune() method has an inconsistency between TTL pruning and capacity pruning:

  • TTL pruning (line 71): calls this.delete(id), which removes the trace from both traces map AND order array
  • Capacity pruning (lines 75-76): only shifts from order and deletes from traces directly, without using the delete() helper

While this currently works, it's inconsistent and error-prone. If delete() logic changes in the future (e.g., adding metrics, logging, or cleanup), capacity pruning would miss those changes.

Apply this diff to use the consistent delete() helper:

   // Capacity prune (remove oldest)
   while (this.order.length > this.maxTraces) {
     const oldest = this.order.shift();
-    if (oldest) this.traces.delete(oldest);
+    if (oldest) this.delete(oldest);
   }

Note: You'll also need to adjust delete() to not call splice when the id is no longer in the order array (since shift() already removed it):

 private delete(id: string): void {
   this.traces.delete(id);
   const idx = this.order.indexOf(id);
   if (idx >= 0) this.order.splice(idx, 1);
 }

Or better yet, restructure capacity pruning to not shift first:

   // Capacity prune (remove oldest)
   while (this.order.length > this.maxTraces) {
-    const oldest = this.order.shift();
-    if (oldest) this.delete(oldest);
+    const oldest = this.order[0];
+    if (oldest) this.delete(oldest);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/** Remove expired and overflow traces. */
prune(): void {
// TTL prune
for (const id of [...this.order]) {
const t = this.traces.get(id);
if (!t) continue;
if (this.isExpired(t)) this.delete(id);
}
// Capacity prune (remove oldest)
while (this.order.length > this.maxTraces) {
const oldest = this.order.shift();
if (oldest) this.traces.delete(oldest);
}
}
/** Remove expired and overflow traces. */
prune(): void {
// TTL prune
for (const id of [...this.order]) {
const t = this.traces.get(id);
if (!t) continue;
if (this.isExpired(t)) this.delete(id);
}
// Capacity prune (remove oldest)
while (this.order.length > this.maxTraces) {
const oldest = this.order.shift();
if (oldest) this.delete(oldest);
}
}
🤖 Prompt for AI Agents
In src/tracing/traceStore.ts around lines 65 to 78, capacity pruning is
inconsistent with TTL pruning because it shifts the order array and directly
deletes from the traces map instead of calling the delete() helper; change the
capacity loop to repeatedly read the oldest id without mutating first (e.g.,
const oldest = this.order[0]; if (oldest) this.delete(oldest); ) so delete()
runs the centralized cleanup, and update delete() to tolerate missing ids in the
order array (check for index >= 0 before splicing) so calling delete() is safe
when the id might already have been removed.

Comment on lines +4 to +11
it('creates, records, and fetches traces', () => {
const store = new TraceStore({ maxTraces: 2, ttlMs: 60_000 });
const id = store.create();
store.record(id, 'request', { a: 1 });
const t = store.get(id);
expect(t?.id).toBe(id);
expect(t?.events.length).toBe(1);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add verification that metrics are incremented.

Since this PR introduces tracing metrics (traceCreatedTotal and traceEventsTotal), the tests should verify that these metrics are incremented correctly when traces are created and events are recorded.

Add this test to verify metrics:

import { traceCreatedTotal, traceEventsTotal } from '../../src/metrics/metrics';

it('increments metrics on create and record', () => {
  const store = new TraceStore({ maxTraces: 10, ttlMs: 60_000 });
  
  const beforeCreate = traceCreatedTotal['hashMap'][''].value;
  const beforeEvents = traceEventsTotal['hashMap'][''].value;
  
  const id = store.create();
  store.record(id, 'request', { a: 1 });
  store.record(id, 'response', { b: 2 });
  
  const afterCreate = traceCreatedTotal['hashMap'][''].value;
  const afterEvents = traceEventsTotal['hashMap'][''].value;
  
  expect(afterCreate).toBe(beforeCreate + 1);
  expect(afterEvents).toBe(beforeEvents + 2);
});
🤖 Prompt for AI Agents
In tests/unit/trace.store.test.ts around lines 4 to 11, the test exercises
TraceStore but does not assert the new tracing metrics; add a new test that
imports traceCreatedTotal and traceEventsTotal from src/metrics/metrics,
captures the current metric values, performs create() and two record() calls,
then re-reads the metric values and asserts traceCreatedTotal increased by 1 and
traceEventsTotal increased by 2 (use the same internal value access pattern used
elsewhere, e.g. reading the metric hashMap value key).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (13)
src/metrics/metrics.ts (1)

42-50: Add an eviction counter to observe TTL vs capacity removals

To make the new store observable, add a counter with a reason label and register it.

 /** Tracing metrics */
 export const traceCreatedTotal = new client.Counter({
   name: 'trace_created_total',
   help: 'Number of traces created'
 });
 export const traceEventsTotal = new client.Counter({
   name: 'trace_events_total',
   help: 'Number of trace events recorded'
 });
+export const traceEvictionsTotal = new client.Counter({
+  name: 'trace_evictions_total',
+  help: 'Number of traces evicted',
+  labelNames: ['reason'] as const // 'ttl' | 'capacity'
+});

 register.registerMetric(toolsLoaded);
 ...
 register.registerMetric(plannerExecutionLatencyMs);
 register.registerMetric(traceCreatedTotal);
 register.registerMetric(traceEventsTotal);
+register.registerMetric(traceEvictionsTotal);

Also applies to: 58-59

src/tracing/traceStore.ts (5)

30-34: Config semantics: consider allowing “unbounded/disabled” values

Clamping to 1 forbids “no TTL” or “unlimited capacity”. Consider:

  • maxTraces: 0 → unlimited (Infinity)
  • ttlMs: 0 or negative → no expiry

This gives you dev/test flexibility and clearer intent.


36-43: Use crypto.randomUUID for IDs

Stronger uniqueness and unpredictability than Math.random + Date.now.

-  create(): string {
-    const id = `tr_${Math.random().toString(36).slice(2)}${Date.now().toString(36)}`;
+  create(): string {
+    const id = `tr_${randomUUID()}`;
     const t: Trace = { id, createdAt: Date.now(), events: [] };
     this.traces.set(id, t);
     this.order.push(id);
     this.prune();
     return id;
   }

Add at top of file:

import { randomUUID } from 'node:crypto';

45-49: Don’t record to expired traces; cheap cleanup

Guard against appending to expired traces to keep invariants and memory bounded between create() calls.

  record(id: string, type: string, data: unknown): void {
-    const t = this.traces.get(id);
-    if (!t) return;
-    t.events.push({ ts: Date.now(), type, data });
+    const t = this.traces.get(id);
+    if (!t) return;
+    if (this.isExpired(t)) {
+      this.delete(id);
+      return;
+    }
+    t.events.push({ ts: Date.now(), type, data });
  }

51-59: Return an immutable copy to avoid external mutation

get() returns internal state by reference; callers can mutate the store. Consider returning a copy.

-    return t;
+    return {
+      id: t.id,
+      createdAt: t.createdAt,
+      events: t.events.map((e) => ({ ...e })),
+    };

65-78: Instrument evictions and prefer delete() for consistency

  • Increment eviction metrics for TTL vs capacity.
  • Use delete() in capacity prune to keep invariants consistent.
   prune(): void {
     // TTL prune
     for (const id of [...this.order]) {
       const t = this.traces.get(id);
       if (!t) continue;
-      if (this.isExpired(t)) this.delete(id);
+      if (this.isExpired(t)) {
+        this.delete(id);
+        // traceEvictionsTotal.labels({ reason: 'ttl' }).inc();
+      }
     }
     // Capacity prune (remove oldest)
     while (this.order.length > this.maxTraces) {
       const oldest = this.order.shift();
-      if (oldest) this.traces.delete(oldest);
+      if (oldest) {
+        this.delete(oldest);
+        // traceEvictionsTotal.labels({ reason: 'capacity' }).inc();
+      }
     }
   }

Add (outside these lines) an import of the metric if you adopt instrumentation:

import { traceEvictionsTotal } from '../metrics/metrics';
tests/unit/trace.store.test.ts (1)

23-29: Reduce timing flake in TTL test

Sleep margin is tight (15ms for 10ms TTL). CI jitter can cause flakes. Increase the wait or use fake timers.

-    await new Promise((r) => setTimeout(r, 15));
+    await new Promise((r) => setTimeout(r, 30));
src/server.ts (1)

27-30: Make trace TTL and capacity configurable

Hard‑coding makes ops tuning harder. Read from config/env with sane defaults.

-  const traces = new TraceStore({ maxTraces: 2000, ttlMs: 15 * 60_000 }); // 15 minutes TTL
+  const traces = new TraceStore({
+    maxTraces: Number(process.env.TRACE_MAX_TRACES ?? 2000),
+    ttlMs: Number(process.env.TRACE_TTL_MS ?? 15 * 60_000),
+  }); // 15 minutes default
src/routes/trace.ts (1)

30-33: Broaden event.data schema to accept any JSON value

record() accepts unknown, but response schema only allows objects. Broaden to avoid schema/serialization mismatches.

-                    data: { type: 'object' }
+                    data: {
+                      anyOf: [
+                        { type: 'object' },
+                        { type: 'array' },
+                        { type: 'string' },
+                        { type: 'number' },
+                        { type: 'boolean' },
+                        { type: 'null' }
+                      ]
+                    }
tests/e2e/trace.e2e.test.ts (1)

18-22: Align StubExec signature with interface to avoid config-specific TS errors.

Some tsconfigs enforce strict method compatibility. Add the unused overallAbort parameter.

-class StubExec implements IToolExecutor {
-  async execute(_tool: Tool, _input: JsonRecord): Promise<ExecutionResult> {
+class StubExec implements IToolExecutor {
+  async execute(_tool: Tool, _input: JsonRecord, _overallAbort: AbortSignal): Promise<ExecutionResult> {
     return { status: 'success', latency_ms: 12, output: { ok: true } };
   }
 }
src/planner/planner.ts (3)

34-34: Import TraceStore as a type-only import.

Avoids unnecessary runtime dependency.

-import { TraceStore } from '../tracing/traceStore';
+import type { TraceStore } from '../tracing/traceStore';

72-76: Consider sanitizing the recorded request to reduce PII risk.

Recording the entire ctx may capture sensitive fields in traces. Record only whitelisted keys (capability, execute, timeout_ms) or scrub context/input per policy.


210-214: Tighten rec data type to align with route schema (data is an object).

Route schema declares event.data as an object. Restricting data to JsonRecord helps prevent accidental non-object payloads.

-  /** Record a trace event and increment the trace event counter. */
-  private rec(traceId: string, type: string, data: unknown): void {
+  /** Record a trace event and increment the trace event counter. */
+  private rec(traceId: string, type: string, data: JsonRecord): void {
     this.traces.record(traceId, type, data);
     traceEventsTotal.inc();
   }

If desired, also update TraceStore.record(id, type, data) to accept JsonRecord for full type safety.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 082b669 and c1922e4.

📒 Files selected for processing (8)
  • src/app.ts (1 hunks)
  • src/metrics/metrics.ts (1 hunks)
  • src/planner/planner.ts (2 hunks)
  • src/routes/trace.ts (1 hunks)
  • src/server.ts (1 hunks)
  • src/tracing/traceStore.ts (2 hunks)
  • tests/e2e/trace.e2e.test.ts (1 hunks)
  • tests/unit/trace.store.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/planner/planner.ts (3)
src/planner/contracts.ts (4)
  • PlanContext (10-23)
  • PlanResult (40-46)
  • ExecutionResult (32-37)
  • JsonRecord (7-7)
src/executors/httpExecutor.ts (1)
  • execute (9-59)
src/metrics/metrics.ts (6)
  • traceCreatedTotal (43-46)
  • plannerBidsTotal (14-18)
  • plannerSelectionTotal (21-25)
  • plannerExecutionLatencyMs (35-40)
  • plannerFallbacksTotal (28-32)
  • traceEventsTotal (47-50)
tests/unit/trace.store.test.ts (1)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
tests/e2e/trace.e2e.test.ts (4)
src/planner/contracts.ts (3)
  • IToolExecutor (54-60)
  • JsonRecord (7-7)
  • ExecutionResult (32-37)
src/app.ts (1)
  • buildApp (14-31)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
src/planner/planner.ts (1)
  • Planner (53-215)
src/routes/trace.ts (1)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
src/server.ts (3)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
src/planner/planner.ts (1)
  • Planner (53-215)
src/app.ts (1)
  • buildApp (14-31)
src/app.ts (2)
src/tracing/traceStore.ts (1)
  • TraceStore (24-89)
src/routes/trace.ts (1)
  • traceRoutes (6-55)
🪛 GitHub Actions: CI
src/app.ts

[error] 14-14: buildApp signature mismatch: 'traces: TraceStore' parameter is required but not provided in the test invocation.

🔇 Additional comments (4)
src/metrics/metrics.ts (1)

42-50: Tracing counters look good

Counters are well‑named and registered; integrates cleanly with existing registry.

src/app.ts (2)

26-26: Trace route wiring looks correct.

traceRoutes(instance, { traces: deps.traces }) ensures /trace/:id uses the same store as the planner.


8-8: Imports and swagger config

  • Type-only import for TraceStore is correct; avoids runtime import.
  • traceRoutes import and swagger info version bump are fine.

Also applies to: 11-11, 20-20

tests/e2e/trace.e2e.test.ts (1)

43-58: E2E flow validates tracing end-to-end.

Good assertions: 200s, matching traceId, and presence of key event types.

Comment on lines +13 to 15
/** Build the Fastify application with schemas, routes, and Swagger UI. */
export function buildApp(deps: { registry: IRegistryService; planner: IPlanner; traces: TraceStore }): FastifyInstance {
const app = Fastify({ logger: true });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Breaking change: buildApp now requires traces; update all call sites (pipeline failing).

Tests/consumers still call buildApp({ registry, planner }), causing CI failure. Pass the same TraceStore instance used by Planner into buildApp to keep routes and planner in sync.

Run to find outdated usages:


🏁 Script executed:

#!/bin/bash
# Find buildApp calls and show nearby object literal without 'traces'
rg -nP -C3 '\bbuildApp\s*\(\s*\{' \
  | awk '
    BEGIN { file=""; }
    /^[-]{2}/ { print; next }
    /^[^:]+:\d+:/ {
      split($0, a, ":"); curfile=a[1]; print $0; next
    }
    { print }
  ' \
  | rg -nP -C0 '(?s)^.*$'  # Display as-is; visually inspect blocks for missing "traces:"

Length of output: 1943


Include traces in buildApp call sites
tests/e2e/tools.e2e.test.ts:29 and tests/e2e/plan.e2e.test.ts:45 still call buildApp({ registry, planner }). Update to pass the same traces instance used when constructing the planner, e.g.:

const traces = new TraceStore(/*…*/);
const planner = new Planner(registry, scorer, executor, traces);
app = buildApp({ registry, planner, traces });
🧰 Tools
🪛 GitHub Actions: CI

[error] 14-14: buildApp signature mismatch: 'traces: TraceStore' parameter is required but not provided in the test invocation.

🤖 Prompt for AI Agents
In src/app.ts around lines 13 to 15, buildApp now requires a traces parameter
but existing E2E test call sites still pass only { registry, planner }; update
tests/e2e/tools.e2e.test.ts (line ~29) and tests/e2e/plan.e2e.test.ts (line ~45)
to create or reuse the same TraceStore instance used to construct the Planner
and pass it into buildApp, e.g. instantiate const traces = new TraceStore(...)
before constructing Planner, use that traces when new Planner(..., traces), and
call buildApp({ registry, planner, traces }) so the traces instance is
consistent across planner and app.

Comment on lines +6 to +55
export async function traceRoutes(app: FastifyInstance, deps: { traces: TraceStore }): Promise<void> {
app.get(
'/trace/:id',
{
schema: {
description: 'Fetch a full decision/execution trace by ID',
params: {
type: 'object',
required: ['id'],
properties: { id: { type: 'string' } }
},
response: {
200: {
type: 'object',
required: ['id', 'createdAt', 'events'],
properties: {
id: { type: 'string' },
createdAt: { type: 'number' },
events: {
type: 'array',
items: {
type: 'object',
required: ['ts', 'type'],
properties: {
ts: { type: 'number' },
type: { type: 'string' },
data: { type: 'object' }
},
additionalProperties: false
}
}
},
additionalProperties: false
},
404: {
type: 'object',
required: ['message'],
properties: { message: { type: 'string' } },
additionalProperties: false
}
}
}
},
async (req: FastifyRequest<{ Params: Params }>, reply) => {
const t = deps.traces.get(req.params.id);
if (!t) return reply.code(404).send({ message: 'Trace not found or expired' });
return reply.code(200).send(t);
}
);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Secure the trace endpoint (auth/gating + no-store + redaction)

Trace payloads often include request inputs and errors; exposing them unauthenticated risks PII/secret leakage. At minimum:

  • Gate behind a feature flag or require auth (API key/JWT).
  • Add Cache-Control: no-store on responses.
  • Consider redacting sensitive fields before returning.
-    async (req: FastifyRequest<{ Params: Params }>, reply) => {
+    async (req: FastifyRequest<{ Params: Params }>, reply) => {
+      // Optional: enforce a simple guard; replace with your auth as needed.
+      // if (process.env.TRACE_ROUTE_ENABLED !== '1') return reply.code(404).send({ message: 'Not found' });
       const t = deps.traces.get(req.params.id);
-      if (!t) return reply.code(404).send({ message: 'Trace not found or expired' });
-      return reply.code(200).send(t);
+      if (!t) return reply.code(404).send({ message: 'Trace not found or expired' });
+      reply.header('Cache-Control', 'no-store');
+      return reply.code(200).send(t);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function traceRoutes(app: FastifyInstance, deps: { traces: TraceStore }): Promise<void> {
app.get(
'/trace/:id',
{
schema: {
description: 'Fetch a full decision/execution trace by ID',
params: {
type: 'object',
required: ['id'],
properties: { id: { type: 'string' } }
},
response: {
200: {
type: 'object',
required: ['id', 'createdAt', 'events'],
properties: {
id: { type: 'string' },
createdAt: { type: 'number' },
events: {
type: 'array',
items: {
type: 'object',
required: ['ts', 'type'],
properties: {
ts: { type: 'number' },
type: { type: 'string' },
data: { type: 'object' }
},
additionalProperties: false
}
}
},
additionalProperties: false
},
404: {
type: 'object',
required: ['message'],
properties: { message: { type: 'string' } },
additionalProperties: false
}
}
}
},
async (req: FastifyRequest<{ Params: Params }>, reply) => {
const t = deps.traces.get(req.params.id);
if (!t) return reply.code(404).send({ message: 'Trace not found or expired' });
return reply.code(200).send(t);
}
);
}
export async function traceRoutes(app: FastifyInstance, deps: { traces: TraceStore }): Promise<void> {
app.get(
'/trace/:id',
{
schema: {
description: 'Fetch a full decision/execution trace by ID',
params: {
type: 'object',
required: ['id'],
properties: { id: { type: 'string' } }
},
response: {
200: {
type: 'object',
required: ['id', 'createdAt', 'events'],
properties: {
id: { type: 'string' },
createdAt: { type: 'number' },
events: {
type: 'array',
items: {
type: 'object',
required: ['ts', 'type'],
properties: {
ts: { type: 'number' },
type: { type: 'string' },
data: { type: 'object' }
},
additionalProperties: false
}
}
},
additionalProperties: false
},
404: {
type: 'object',
required: ['message'],
properties: { message: { type: 'string' } },
additionalProperties: false
}
}
}
},
async (req: FastifyRequest<{ Params: Params }>, reply) => {
// Optional: enforce a simple guard; replace with your auth as needed.
// if (process.env.TRACE_ROUTE_ENABLED !== '1') return reply.code(404).send({ message: 'Not found' });
const t = deps.traces.get(req.params.id);
if (!t) return reply.code(404).send({ message: 'Trace not found or expired' });
reply.header('Cache-Control', 'no-store');
return reply.code(200).send(t);
}
);
}

@Dumidu1212 Dumidu1212 merged commit a729cc3 into main Oct 14, 2025
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant