Skip to content

V4 to V5 signoz migration#2774

Open
shagun-singh-inkeep wants to merge 10 commits intomainfrom
v5
Open

V4 to V5 signoz migration#2774
shagun-singh-inkeep wants to merge 10 commits intomainfrom
v5

Conversation

@shagun-singh-inkeep
Copy link
Collaborator

No description provided.

@vercel
Copy link

vercel bot commented Mar 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Mar 19, 2026 10:43pm
agents-docs Ready Ready Preview, Comment Mar 19, 2026 10:43pm
agents-manage-ui Ready Ready Preview, Comment Mar 19, 2026 10:43pm

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Mar 19, 2026

🦋 Changeset detected

Latest commit: 74b0622

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-core Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-api Patch
@inkeep/agents-cli Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel vercel bot temporarily deployed to Preview – agents-docs March 19, 2026 15:10 Inactive
@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 19, 2026

TL;DR — Migrates all SigNoz integration from the v4 API (/api/v4/query_range with builderQueries object maps) to the v5 API (/api/v5/query_range with expression-based filters and queries arrays). This is a net-negative ~1,600 lines change that replaces verbose filter-object trees with concise string expressions, merges related queries to reduce round-trips, and adds parallel batch execution for conversation detail loads.

Key changes

  • signoz-queries.ts — Replaces v4 constants (DATA_TYPES, FIELD_TYPES, QUERY_FIELD_CONFIGS, PANEL_TYPES, DATA_SOURCES, AGGREGATE_OPERATORS) with v5 equivalents (REQUEST_TYPES, QUERY_TYPES, FIELD_CONTEXTS, FIELD_DATA_TYPES, SIGNALS) and adds a buildFilterExpression() helper for converting operator/value pairs to v5 filter strings.
  • signoz.ts (API proxy) — Extracts shared helpers (getSignozConfig, authorizeProject, handleSignozError, extractResults), switches the endpoint to /api/v5/query_range, and rewrites query-batch to parse v5 columnar responses and narrow step-2 time ranges from step-1 timestamps.
  • signozHelpers.ts — Rewrites enforceSecurityFilters from v4 filter-object injection to v5 expression string concatenation, reducing the function from 70 lines to 24.
  • conversations/[conversationId]/route.ts — Merges 15+ individual builderQueries into 3 parallel batches (core, context, events), consolidates separate query names (e.g. contextResolution/contextHandlecontextResolutionAndHandle, aiGenerations/aiStreamingTextaiLlmCalls, three toolApproval*toolApprovals), adds memoized ancestor lookups, and includes per-batch timing logs.
  • traces/route.ts — Updates the validation schema from {queryType, panelType, builderQueries} to {queries[], requestType} to match the v5 payload format.
  • spans/[spanId]/route.ts — Converts the ClickHouse SQL query envelope and response parsing from v4 series[0].labels to v5 columnar columns + data format.
  • conversation-list-item.tsx — Adds a startTime > 0 guard to prevent rendering invalid dates from zero-valued timestamps.

Summary | 9 files | 5 commits | base: mainv5


signoz-queries.ts — v4-to-v5 constant vocabulary

Before: v4 vocabulary (DATA_TYPES, FIELD_TYPES, QUERY_FIELD_CONFIGS, PANEL_TYPES, DATA_SOURCES, AGGREGATE_OPERATORS) with isColumn/isJSON metadata objects used to describe every field.
After: v5 vocabulary (REQUEST_TYPES, QUERY_TYPES, FIELD_CONTEXTS, FIELD_DATA_TYPES, SIGNALS) plus a buildFilterExpression() utility that converts filter items to SQL-like expression strings.

The v4 API required structured objects with key, dataType, type, isColumn, and isJSON for every filter item. v5 uses simple string expressions (key = 'value'), eliminating 60+ lines of config objects. Several query expression names are consolidated:

v4 v5
contextResolution + contextHandle contextResolutionAndHandle
aiGenerations + aiStreamingText aiLlmCalls
toolApprovalRequested/Approved/Denied toolApprovals

signoz-queries.ts


signoz.ts — Proxy route refactor and v5 endpoint

Before: Inline auth checks, config reads, and error handling duplicated across /query, /query-batch, and /health handlers; endpoint pointed at /api/v4/query_range.
After: Shared getSignozConfig(), authorizeProject(), handleSignozError(), and extractResults() helpers; endpoint is /api/v5/query_range; query-batch parses columnar v5 responses and narrows step-2 time windows using min/max timestamps from step-1 results.

The injectConversationIdFilter function (which built structured filter objects) is replaced by inline expression string concatenation. The query-batch step-2 optimization reads columns + data arrays instead of series[].labels to extract conversation IDs, and computes a tighter time range from aggregation column values.

How does the step-2 time range narrowing work? After step 1 returns paginated conversation IDs with timestamps, the code scans the aggregation column to find the min and max timestamps, then sets `detailPayloadTemplate.start = minTs - 1h` and `end = maxTs + 1h`. This avoids querying the full original time range for the detail step.

signoz.ts · signozHelpers.ts


conversations/[conversationId]/route.ts — Parallel batched queries and v5 payloads

Before: A single monolithic SigNoz payload with 15+ builderQueries entries, each using verbose filter objects; sequential execution; no timing instrumentation.
After: 3 parallel payloads (core, context, events) using buildQueryEnvelope() with expression-based filters; parseListByName() to split merged query results by name field; memoized findAncestorActivity / findAncestorAgentGeneration with depth guards; per-batch and total timing logs.

This is the largest change in the PR. The buildConversationListPayload function (returning a single payload) is replaced by buildConversationPayloads (returning 3 payloads executed via Promise.all). Field selection uses a compact sf(name, type, context) helper instead of 5-line objects. Post-query filtering via parseListByName handles the merged queries (e.g., splitting contextResolutionAndHandle rows by span name). The ancestor-lookup functions now use Map-based caches and depth limits to prevent stack overflows on deep span trees.

[conversationId]/route.ts · traces/route.ts


spans/[spanId]/route.ts — v5 ClickHouse SQL response format

Before: Response parsed as json.data.result[0].series[0].labels (v4 series-based format); payload used step, chQueries map, queryType: 'clickhouse_sql', panelType: 'table'.
After: Response parsed as json.data.data.results[0].columns + data (v5 columnar format); payload uses requestType: 'scalar', queries array with {type, spec} envelopes, typed variables with {type: 'custom', value}.

spans/[spanId]/route.ts

Pullfrog  | View workflow run | Triggered by Pullfrogpullfrog.com𝕏

@pullfrog
Copy link
Contributor

pullfrog bot commented Mar 19, 2026

TL;DR — Migrates the entire SigNoz observability integration from v4 to v5 API. The v5 format replaces structured JSON filter objects with string-based filter expressions, swaps builderQueries maps for queries[] arrays, and returns columnar response data instead of series/labels. Also splits the monolithic conversation-detail query into 3 parallel batches, adds resilient retry logic for missing SigNoz attributes, and includes performance optimizations (memoized ancestor lookups, time-range narrowing).

Key changes

  • /api/v5/query_range — All SigNoz proxy and client-side query endpoints target the v5 API instead of v4.
  • String-based filter expressions — Filters switch from structured { key, dataType, type, op, value } JSON objects to SQL-like string expressions (e.g., serviceName IN ('...') AND tenant.id = '...').
  • compositeQuery.queries[] array — Replaces the named builderQueries map with [{ type, spec }] envelopes carrying new fields (signal, selectFields, fieldContext).
  • Columnar response parsing — Response handling switches from data.result[].list[].data to data.results[].rows[] / columns[] + data[][].
  • Parallel query batching — The conversation-detail endpoint splits queries into 3 payloads (core, context, events) fired via Promise.all.
  • Query merging — Similar span types (contextResolution/contextHandle, tool approvals, AI generations/streaming) are merged into single queries with client-side parseListByName() splitting by span name.
  • queryWithRetry() for missing attributes — New retry logic detects SigNoz 400 errors caused by key not found attributes, strips those queries, and retries — gracefully handles attributes not yet ingested.
  • enforceSecurityFilters() rewritten — Appends AND serviceName IN (...) AND tenant.id = '...' strings instead of mutating structured filter item arrays. Now also rejects non-builder query types to prevent tenant isolation bypass via raw SQL.
  • /span-lookup endpoint — New server-side route replaces client-side ClickHouse SQL construction for span detail — the query is built with buildSpanLookupPayload() which bakes tenant filtering into the SQL WHERE clause.
  • Performance optimizationsMap-based memoization caches with depth limits for findAncestorActivity()/findAncestorAgentGeneration(), findLast() replaces [...arr].reverse().find(), multiple .filter().length calls consolidated into a single loop.
  • totalErrors bug fix — Was hardcoded to 0; now reflects the actual count of error-status activities.
  • Shared constants overhauled — Removes v4 constructs (QUERY_FIELD_CONFIGS, DATA_TYPES, FIELD_TYPES, PANEL_TYPES, etc.) and adds v5 equivalents (REQUEST_TYPES, QUERY_TYPES, FIELD_CONTEXTS, SIGNALS, buildFilterExpression()). OPERATORS expanded with full coverage (CONTAINS, NOT_CONTAINS, REGEX, NOT_REGEX, NOT_IN, LESS_THAN, GREATER_THAN).
  • Proxy code refactoredgetSignozConfig(), handleSignozError(), authorizeProject(), extractResults() extracted as shared helpers to eliminate duplication across /query, /query-batch, /span-lookup, and /health.

Summary | 9 files | 10 commits | base: mainv5


SigNoz v4 → v5 query format migration

Before: Queries used compositeQuery.builderQueries — a named map of structured objects with filters.items[] arrays, selectColumns[] with { key, dataType, type, isColumn }, and panelType/queryType at the top level.
After: Queries use compositeQuery.queries[] — an array of { type, spec } envelopes where spec carries signal, filter.expression (string-based), selectFields[] with { name, fieldDataType, fieldContext }, and top-level requestType.

The v5 format is more concise — the sf(name, dataType, fieldContext) helper replaces verbose QUERY_FIELD_CONFIGS presets. Security enforcement now appends AND clauses to expression strings rather than deep-cloning payloads and splicing structured filter items. Non-builder query types are explicitly rejected to prevent tenant isolation bypass via raw SQL.

How does the new buildFilterExpression() work?

A general-purpose utility exported from agents-core that converts { key, op, value } items to string expressions. Handles IN, NOT IN, EXISTS, NOT EXISTS, LIKE, CONTAINS, REGEX, and comparison operators with proper quoting. The OP_MAP translates shorthand operators (e.g., ninNOT IN, nregexNOT REGEX) into their SQL-like v5 equivalents.

signoz.ts · signozHelpers.ts · signoz-queries.ts


queryWithRetry() — resilient handling of missing SigNoz attributes

Before: Any SigNoz 400 error was returned to the client as a generic "Invalid query parameters" response.
After: queryWithRetry() detects key not found errors, identifies which queries reference the missing keys, strips them, and retries. Queries referencing only known attributes succeed even when new attributes haven't been ingested yet.

This prevents the entire query from failing when a newly-instrumented span attribute hasn't been indexed by SigNoz. getMissingKeys() parses the error response, and queryReferencesKeys() scans filter expressions and select fields to determine which queries to remove.

signoz.ts


Parallel query batching and query merging for conversation detail

Before: A single payload with ~16 builderQueries was sent to SigNoz for conversation detail; similar span types each had their own query.
After: Queries are split into 3 logical batches (core, context, events) sent via Promise.all. Similar span types — contextResolution/contextHandle, all tool approvals, aiGenerations/aiStreamingText — are merged into single queries with client-side parseListByName() splitting by SPAN_KEYS.NAME.

This reduces the total query count and enables server-side parallelism. The handler also narrows the step-2 time range to min(timestamp) - 1h .. max(timestamp) + 1h from step-1 results, and now logs timing breakdowns (signozMs, processingMs, totalMs).

[conversationId]/route.ts · signoz-queries.ts


/span-lookup endpoint and columnar response parsing

Before: The span-detail client route constructed a raw ClickHouse SQL payload directly and sent it through the generic /query proxy. Responses were parsed from result[0].series[0].labels.
After: A dedicated /span-lookup server-side route builds the SQL payload via buildSpanLookupPayload() with tenant filtering baked into the WHERE clause. Responses are parsed from columnar columns[] + data[][] format using index-based lookups via data.results[].

Moving SQL construction server-side ensures the client never sends arbitrary ClickHouse SQL through the proxy (which now rejects non-builder queries). The extractResults() helper centralizes result extraction from the v5 response envelope.

spans/[spanId]/route.ts · signozHelpers.ts · signoz.ts


Proxy code refactoring and performance optimizations

Before: Each proxy route (/query, /query-batch, /health) duplicated auth, config lookup, and error handling logic. Ancestor lookups used linear scans with unbounded recursion. totalErrors was hardcoded to 0.
After: Shared helpers getSignozConfig(), authorizeProject(), handleSignozError(), and extractResults() eliminate duplication. Ancestor functions use Map-based lookup with memoization caches and a depth limit of 200. totalErrors reflects finalErrorCount. Multiple .filter().length calls replaced with a single loop.

Memoization caches use ?? null instead of the non-null assertion operator for type-safe Map.get() returns.

signoz.ts · [conversationId]/route.ts · conversation-list-item.tsx

Pullfrog  | View workflow run | Triggered by Pullfrogpullfrog.com𝕏

Copy link
Contributor

@pullfrog pullfrog bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid migration from SigNoz v4 to v5 — good refactoring (consolidated queries, parallel batching, memoized ancestor lookups). Two high-severity items need attention: an expression injection vector in signoz.ts:174 and a null-safety gap in enforceSecurityFilters. Lower-severity items noted inline.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow runpullfrog.com𝕏

const securityExpr = buildSecurityExpression(tenantId, projectId);
for (const { type, spec } of payload.compositeQuery.queries) {
if (type !== QUERY_TYPES.BUILDER_QUERY) continue;
spec.filter = { expression: `(${spec.filter.expression}) AND ${securityExpr}` };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High: Null-safety gap. If any query arrives without spec.filter or spec.filter.expression (e.g. a new query type or a client-side bug), this line will throw Cannot read properties of undefined (reading 'expression'). The old v4 code was resilient to missing filters — it would create them from scratch.

Suggestion: default to an empty expression:

const existing = spec.filter?.expression ?? 'true';
spec.filter = { expression: `(${existing}) AND ${securityExpr}` };


const convIdExpr = `${SPAN_KEYS.CONVERSATION_ID} IN (${conversationIds.map((id) => `'${id}'`).join(', ')})`;
for (const { spec } of detailPayloadTemplate.compositeQuery.queries) {
spec.filter = { expression: `(${spec.filter.expression}) AND ${convIdExpr}` };

This comment was marked as outdated.

logger.debug({ responseData }, 'SigNoz response (truncated)');
return json;
const json = response.data;
const results = json?.data?.data?.results ?? json?.data?.results ?? [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as extractResults. The dual fallback json?.data?.data?.results ?? json?.data?.results suggests uncertainty about the response shape. If the v5 shape is known, pin to one path and drop the fallback — the silent fallback makes debugging harder when the response shape changes.

const float64 = FIELD_DATA_TYPES.FLOAT64;
const bool = FIELD_DATA_TYPES.BOOL;

function buildBaseExpression(conversationId: string, projectId?: string): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same expression injection pattern as the server-side route. conversationId comes from URL path params (line 461) and projectId from query params (line 469). Both are user-controlled and interpolated into filter expressions without escaping. A value like test' OR 1=1 -- would inject arbitrary filter logic.

This is lower risk since the payload goes through enforceSecurityFilters on the server side (which scopes to the correct tenant), but it still allows a user to broaden their own query scope within their tenant — e.g. fetch data for a different conversation.

const payloads = buildConversationPayloads(conversationId, start, end, projectId);
const batchLabels = ['core', 'context', 'events'] as const;

const batchResults = await Promise.all(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement. Splitting the single massive query into three parallel batches (core, context, events) with Promise.all is a meaningful performance win — the SigNoz backend can process them concurrently. The timing logs are also helpful for diagnosing production latency.

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(8) Total Issues | Risk: High

🔴❗ Critical (2) ❗🔴

Inline Comments:

  • 🔴 Critical: signozHelpers.ts:6 SQL injection via unescaped tenantId/projectId interpolation
  • 🔴 Critical: signoz.ts:174 SQL injection via unescaped conversation IDs from query results

🟠⚠️ Major (3) 🟠⚠️

🟠 1) signozHelpers.ts:19 Security filters not applied to non-builder query types

Issue: The enforceSecurityFilters function only applies tenant/project filters to queries where type === QUERY_TYPES.BUILDER_QUERY. If a ClickHouse SQL query or other query type is passed through, it will bypass tenant isolation entirely.

Why: The spans/[spanId]/route.ts uses clickhouse_sql queries which would not be filtered. This creates a potential tenant isolation bypass if the proxy endpoint accepts non-builder queries.

Fix: Either:

  1. Reject non-builder query types in the proxy endpoint with a 400 error
  2. Implement security filtering for clickhouse_sql queries by injecting WHERE clauses
  3. Document that only builder queries are supported and validate the query type before proxying

Refs:

Inline Comments:

  • 🟠 Major: signozHelpers.ts:20 Null pointer when spec.filter is undefined
  • 🟠 Major: signoz-queries.ts:52 Incorrect SQL escaping (backslash vs standard SQL '')

🟡 Minor (3) 🟡

🟡 1) signoz.ts:57-80 Missing 429 rate limit handling in SigNoz proxy

Issue: The handleSignozError function does not check for HTTP 429 responses from SigNoz. When SigNoz is rate-limiting requests, the proxy returns a generic 500 error instead of propagating the 429 status and Retry-After header.

Why: This masks the actual issue from clients and prevents them from implementing proper backoff. The conversation route already handles 429 (lines 116-117), but the agents-api proxy does not.

Fix: Add 429 handling in handleSignozError:

if (error.response?.status === 429) {
  logger.warn({ status: 429 }, 'SigNoz rate limit exceeded');
  return { 
    body: { error: 'Too Many Requests', message: 'SigNoz rate limit exceeded', retryAfter: error.response.headers['retry-after'] }, 
    status: 429 as const 
  };
}

Refs:

🟡 2) signoz-queries.ts:76-86 OPERATORS constant missing NOT_IN

Issue: The internal OP_MAP at line 48 includes 'nin' -> 'NOT IN' mapping, but the exported OPERATORS constant does not include NOT_IN. The old v4 OPERATORS had NOT_IN: 'nin'.

Why: Consumer code that previously used OPERATORS.NOT_IN will break after this migration.

Fix: Add NOT_IN: 'nin' to the OPERATORS constant for parity with OP_MAP.

Refs:

Inline Comments:

  • 🟡 Minor: conversations/[conversationId]/route.ts:164 Hardcoded 'builder_query', 'desc', 60, false instead of constants

💭 Consider (1) 💭

💭 1) signoz-stats.ts (multiple methods) Silent fallback to empty data on API errors

Issue: Multiple methods in SigNozStatsAPI catch all errors and return empty arrays/default values without surfacing the error condition to callers. Users see empty dashboards without knowing whether it's due to no data or a service failure.

Why: This is a pre-existing pattern, but worth addressing for better user experience. The pattern appears in 15+ methods throughout the file.

Fix: Consider returning a discriminated union: { status: 'success' | 'error', data?: T, error?: Error } to distinguish between "no data" and "service error".

Refs:

🧹 While You're Here (1) 🧹

🧹 1) signoz-links.ts Still uses v4 SigNoz query format

Issue: This PR migrates SigNoz queries from v4 to v5 format, but signoz-links.ts was not touched and still uses the old v4 compositeQuery structure with queryType/builder/dataSource pattern.

Why: This creates split-world inconsistency where future maintainers may not know which pattern to follow.

Fix: Either migrate signoz-links.ts to v5 format in this PR, or create a tracking issue documenting this as intentional legacy code with a migration plan.

Refs:


🚫 REQUEST CHANGES

Summary: The v4→v5 SigNoz migration is well-structured with significant code simplification (-1600 lines net). However, there are critical SQL injection vulnerabilities in the new string-interpolation-based filter building that must be addressed before merge. The tenantId, projectId, and conversationIds values are interpolated directly into SQL filter expressions without escaping single quotes, creating a potential security and data integrity risk. Additionally, the security filter enforcement skips non-builder query types, which could allow tenant isolation bypass.

Discarded (5)
Location Issue Reason Discarded
signoz-stats.ts:160 Global axios retry configuration may interfere with other HTTP clients Pre-existing pattern, out of scope for this PR
conversations/[conversationId]/route.ts:116 Rate limit handling does not respect Retry-After header Already handles 429 correctly, forwarding the header is a nice-to-have not a bug
spans/[spanId]/route.ts:127 Generic error message loses diagnostic context Pre-existing pattern, not introduced by this PR
signoz-stats.ts:127 Filter value parsing error uses console.warn but continues Defensive pattern, appropriate behavior
conversations/[conversationId]/route.ts:807 Empty catch blocks for JSON parsing Appropriate defensive handling for optional data
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 4 1 0 0 3 0 0
pr-review-sre 7 1 0 0 1 0 3
pr-review-errors 4 0 1 0 0 0 2
pr-review-consistency 5 1 0 1 1 0 0
Total 20 3 1 1 5 0 5

};

function quoteValue(value: unknown): string {
if (typeof value === 'string') return `'${value.replace(/'/g, "\\'")}'`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR Incorrect SQL escaping: backslash instead of standard SQL

Issue: The quoteValue function escapes single quotes by replacing ' with \' (backslash-escape). Standard SQL uses '' (doubled single quotes). ClickHouse supports both syntaxes, but backslash escaping can behave unexpectedly depending on settings.

Why: Using non-standard escaping may cause issues in edge cases and is inconsistent with the escaping recommended for ClickHouse.

Fix:

Suggested change
if (typeof value === 'string') return `'${value.replace(/'/g, "\\'")}'`;
if (typeof value === 'string') return `'${value.replace(/'/g, "''")}'`;

Refs:

limit = 10000
): any {
return {
type: 'builder_query',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor Hardcoded string instead of QUERY_TYPES.BUILDER_QUERY constant

Issue: This file uses the hardcoded string 'builder_query' while signoz-stats.ts consistently uses QUERY_TYPES.BUILDER_QUERY. The same applies to 'desc' (line 170), 60 (line 172), and false (line 173) which have corresponding constants.

Why: Using magic strings instead of shared constants creates maintenance burden and risks typos. The peer file signoz-stats.ts consistently uses QUERY_TYPES.BUILDER_QUERY, ORDER_DIRECTIONS.DESC, QUERY_DEFAULTS.STEP_INTERVAL, and QUERY_DEFAULTS.DISABLED.

Fix: Import and use the constants:

import { QUERY_TYPES, ORDER_DIRECTIONS, QUERY_DEFAULTS } from '@/constants/signoz';
// ...
return {
  type: QUERY_TYPES.BUILDER_QUERY,
  spec: {
    // ...
    order: [{ key: { name: SPAN_KEYS.TIMESTAMP }, direction: ORDER_DIRECTIONS.DESC }],
    stepInterval: QUERY_DEFAULTS.STEP_INTERVAL,
    disabled: QUERY_DEFAULTS.DISABLED,
  },
};

Refs:

@github-actions github-actions bot deleted a comment from claude bot Mar 19, 2026
@itoqa
Copy link

itoqa bot commented Mar 19, 2026

Ito Test Report ✅

6 test cases ran. 6 passed.

✅ The included trace workflow checks passed: filter mode switching behaved as expected, timestamp badge suppression was validated, date clamping worked, invalid conversation handling remained controlled, and rapid interaction flows converged to stable UI states.

✅ Passed (6)
Test Case Summary Timestamp Screenshot
ROUTE-2 SigNoz configuration warning banner was shown clearly and the traces page remained usable, matching unconfigured-health behavior. 0:00 ROUTE-2_0-00.png
LOGIC-2 Confirmed request URL switched from mode=batch to non-batch after entering search text; API corroboration returned controlled 400 response without crash. 11:03 LOGIC-2_11-03.png
LOGIC-3 Using mocked batch API responses due local traces backend 503s, the overview rendered a relative-time badge for a valid startTime row and correctly suppressed the badge for the row with startTime=0. 18:30 LOGIC-3_18-30.png
EDGE-1 Verified custom calendar prevented future end-date selection and applying the range did not trigger hard validation failure. 11:03 EDGE-1_11-03.png
EDGE-4 Invalid conversation route displayed a controlled error state and Back to Overview recovery returned to traces overview successfully. 2:35 EDGE-4_2-35.png
ADV-3 After aggressive local API mocking fallback, rapid pagination plus repeated Errors-only and agent-filter toggles converged to a stable final overview state without duplicate rows or persistent spinner. 21:23 ADV-3_21-23.png
📋 View Recording

Screen Recording

@inkeep inkeep deleted a comment from claude bot Mar 19, 2026
@inkeep inkeep deleted a comment from claude bot Mar 19, 2026
@inkeep inkeep deleted a comment from claude bot Mar 19, 2026
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(4) Total Issues | Risk: Medium

Re-review scope: This review covers the delta since the last automated review (3 commits, 4 files). Prior findings are referenced in Pending Recommendations.

✅ Issues Addressed in This Delta

Two critical issues from the prior review have been fixed:

  1. SQL escaping fixedquoteValue() now uses standard SQL '' escaping instead of backslash \'
  2. Non-builder query bypass fixedenforceSecurityFilters() now throws on non-builder query types, preventing tenant isolation bypass via raw SQL

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

  • 🟠 Major: signoz.ts:72-80 queryReferencesKeys doesn't check orderBy fields

🟡 Minor (2) 🟡

Inline Comments:

  • 🟡 Minor: signoz.ts:108 Silent empty response when all queries stripped
  • 🟡 Minor: signoz.ts:291 Magic number for 180-day lookback period

🕐 Pending Recommendations (3)

These issues were raised in the prior review and remain unaddressed:

  • 🔴 signozHelpers.ts:8 SQL injection via unescaped tenantId/projectId interpolation in buildSecurityExpression
  • 🔴 signoz.ts:262 SQL injection via unescaped conversation IDs from query results
  • 🟠 signozHelpers.ts:28-30 Null pointer exception when spec.filter is undefined

🚫 REQUEST CHANGES

Summary: Good progress on this delta — the SQL escaping and non-builder query bypass issues are now fixed. However, the critical SQL injection vulnerabilities flagged in the prior review (unescaped tenantId/projectId and conversationIds interpolation) remain unaddressed. These should be fixed before merging to ensure tenant data isolation.

Quick wins:

  1. For buildSecurityExpression: escape tenantId and projectId with the same quoteValue() helper that was just fixed, or use parameterized queries
  2. For conversation IDs at line 262: conversationIds.map((id) => quoteValue(id)).join(', ') (import quoteValue from signoz-queries.ts)
  3. For spec.filter null check: if (!spec?.filter?.expression) throw new Error('Query must have filter.expression');
Discarded (4)
Location Issue Reason Discarded
signoz.ts:85-117 Retry logic lacks backoff delay Since the retry modifies the payload (strips queries), immediate retry is reasonable — it's not retrying the same request
signoz.ts:275-310 Span-lookup doesn't use queryWithRetry ClickHouse SQL queries have different semantics; the retry logic is designed for builder queries with multiple sub-queries
signoz.ts:103-106 retried flag not used at call sites The logging at line 103-106 already provides the key information; adding more logging is marginal value
signoz.ts:111-115 Retry failure logging missing The outer handleSignozError provides adequate error context
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-security-iam 0 0 0 0 0 0 0
pr-review-sre 6 0 0 0 2 0 2
pr-review-errors 3 0 0 0 1 0 2
pr-review-standards 3 0 0 0 1 0 0
Total 12 0 0 0 3 0 4

Note: Security reviewer returned 0 findings for delta — prior critical issues are pre-existing and tracked in Pending Recommendations.

Comment on lines +72 to +80
function queryReferencesKeys(query: any, keys: string[]): boolean {
const spec = query?.spec;
const searchable = [
spec?.filter?.expression ?? '',
...(spec?.selectFields ?? []).map((f: any) => f.name),
...(spec?.groupBy ?? []).map((g: any) => g.name),
];
return keys.some((k) => searchable.some((t: string) => t.includes(k)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR queryReferencesKeys doesn't check orderBy fields

Issue: The function only checks filter.expression, selectFields, and groupBy for missing key references. If a missing attribute is referenced only in orderBy, the query won't be filtered out and the retry will fail with the same error.

Why: SigNoz builder queries can reference attributes in order clauses. If the missing key error originates from an orderBy reference, the retry logic will fail to identify and remove the problematic query, causing an unnecessary second failure.

Fix: Add orderBy to the searchable fields:

const searchable = [
  spec?.filter?.expression ?? '',
  ...(spec?.selectFields ?? []).map((f: any) => f.name),
  ...(spec?.groupBy ?? []).map((g: any) => g.name),
  ...(spec?.order ?? []).map((o: any) => o.key?.name ?? o.columnName ?? ''),
];

Refs:

'Retrying SigNoz query without queries referencing missing keys'
);

if (kept.length === 0) return { data: EMPTY_RESPONSE, retried: true };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor Silent empty response when all queries stripped

Issue: When all queries reference missing keys, the function returns EMPTY_RESPONSE silently. The retried: true flag is returned but never surfaced to users. Clients receive empty results with no indication their query attributes aren't available.

Why: Users may assume "no data exists" rather than "query attributes not ingested yet." This makes debugging observability issues difficult during SigNoz schema migrations.

Fix: Log at a higher visibility level when returning empty due to stripped queries:

if (kept.length === 0) {
  logger.warn(
    { missingKeys: missing, originalQueryCount: queries.length },
    'All queries stripped due to missing attributes - returning empty response'
  );
  return { data: EMPTY_RESPONSE, retried: true };
}

Refs:

  • Line 103-106 already logs successful partial strips; this extends to full strip case

return c.json({ error: 'Service Unavailable', message: 'SigNoz is not configured' }, 500);

const now = Date.now();
const lookbackMs = 180 * 24 * 60 * 60 * 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor Magic number for lookback period

Issue: The 180-day lookback period is hardcoded without explanation. This wide time range may affect query performance on large ClickHouse tables.

Why: Without documentation, operators can't tune this value during performance issues. The 180-day window seems generous for a span lookup.

Fix: Extract to a named constant:

Suggested change
const lookbackMs = 180 * 24 * 60 * 60 * 1000;
const SPAN_LOOKUP_LOOKBACK_MS = 180 * 24 * 60 * 60 * 1000; // 180 days: covers full trace retention period

@github-actions github-actions bot deleted a comment from claude bot Mar 19, 2026
Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Issues | (3) Pending from Prior Review | Risk: High

Re-review scope: This review covers the delta since the last automated review (1 commit, 3 files, +11/-3 lines).

✅ Delta Assessment

The latest commit 74b0622d6 ("comments and fixes") contains no new issues. All changes are improvements:

  1. signoz-queries.ts — Added missing operators to OPERATORS constant (NOT_IN, REGEX, NOT_REGEX, LESS_THAN, GREATER_THAN, CONTAINS, NOT_CONTAINS). This restores parity with the internal OP_MAP and fixes a breaking change flagged in the prior review.

  2. conversations/[conversationId]/route.ts:1285 — Fixed null-safety in cache lookup:

    - if (agentGenCache.has(activityId)) return agentGenCache.get(activityId)!;
    + if (agentGenCache.has(activityId)) return agentGenCache.get(activityId) ?? null;

    The non-null assertion (!) was unsafe since the cache stores null values. Now correctly returns null when the cached value is null.

  3. spans/[spanId]/route.ts:51 — Simplified response path:

    - const results = json?.data?.data?.results ?? json?.data?.results;
    + const results = json?.data?.results ?? [];

    Removes uncertain dual-path fallback in favor of the correct v5 response shape.

🕐 Pending Recommendations (3)

These critical and major issues were raised in the prior review and remain unaddressed:

  • 🔴 signozHelpers.ts:8 — SQL injection via unescaped tenantId/projectId interpolation in buildSecurityExpression. Values are interpolated directly into filter expressions without escaping single quotes.

  • 🔴 signoz.ts:262 — SQL injection via unescaped conversation IDs. The conversationIds array from step-1 results is interpolated directly: conversationIds.map((id) => \'${id}'`).join(', ')` without escaping.

  • 🟠 signozHelpers.ts:29 — Null pointer exception when spec.filter is undefined. Line 29 accesses spec.filter.expression without null check.

Quick fixes for the SQL injection issues:

// signozHelpers.ts:8 — use quoteValue from signoz-queries.ts
import { quoteValue } from '@inkeep/agents-core';
let expr = `${SERVICE_NAME_FILTER} AND ${SPAN_KEYS.TENANT_ID} = ${quoteValue(tenantId)}`;
if (projectId) expr += ` AND ${SPAN_KEYS.PROJECT_ID} = ${quoteValue(projectId)}`;

// signoz.ts:262 — escape conversation IDs
const convIdExpr = `${SPAN_KEYS.CONVERSATION_ID} IN (${conversationIds.map((id) => quoteValue(id)).join(', ')})`;

// signozHelpers.ts:29 — add null check
const existing = spec.filter?.expression ?? 'true';
spec.filter = { expression: `(${existing}) AND ${securityExpr}` };

🚫 REQUEST CHANGES

Summary: The delta itself is clean — all 3 changes are improvements with no new issues introduced. However, the critical SQL injection vulnerabilities flagged in the prior review remain unaddressed in the unchanged files. These must be fixed before merge to ensure tenant data isolation. The fix is straightforward: import and use the quoteValue() helper (which now correctly uses standard SQL '' escaping) for all user-controlled string interpolation.

Discarded (0)

No findings discarded in this delta review.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator 3 0 0 0 0 3 0
Total 3 0 0 0 0 3 0

Note: Delta-scoped review — sub-reviewers not dispatched for minimal changes. Prior critical issues tracked in Pending Recommendations.

@itoqa
Copy link

itoqa bot commented Mar 20, 2026

Ito Test Report ❌

20 test cases ran. 1 failed, 19 passed.

Across 8 report shards covering 20 test cases, 19 passed and the traces platform was largely stable: core overview/detail routes, pagination/search fallback, span lookup side pane, analytics pages (AI/Tool Calls), auth and tenant/project scoping protections, rapid-interaction stress behavior, mobile usability, and multiple API validation guardrails all behaved as expected without crashes or data leakage. The single significant finding was a Medium-severity defect in GET /api/traces/conversations/[conversationId], where a whitespace-only conversationId (%20) is not trimmed/rejected and is incorrectly forwarded into downstream time-range/query logic (likely introduced by the PR), weakening endpoint contract reliability and adding avoidable operational noise.

❌ Failed (1)
Category Summary Screenshot
Edge 🟠 Conversation details route accepts a whitespace-only conversationId path value and continues processing instead of rejecting it. EDGE-3
🟠 Conversation details endpoint accepts whitespace-only conversation IDs
  • What failed: The route only rejects falsy conversationId values, so a whitespace path segment (%20) is treated as valid and forwarded into time-range lookup and trace query construction instead of returning a parameter-validation error.
  • Impact: Invalid path input can trigger unnecessary downstream requests and inconsistent error behavior instead of a clean client-facing validation response. This weakens endpoint contract reliability for callers and increases operational noise.
  • Introduced by this PR: Yes – this PR modified the relevant code
  • Steps to reproduce:
    1. Call GET /api/traces/conversations/%20?tenantId=default&projectId= with valid authentication.
    2. Observe that the request is processed through time-range lookup and downstream query logic instead of being rejected immediately.
    3. Confirm the route does not return a direct invalid-parameter response for the whitespace-only path ID.
  • Code analysis: I reviewed the conversation details route and its time-range helper. The handler checks only if (!conversationId), then immediately uses the raw value in both bounds lookup and query filter expression construction; no trim/sanitization step exists before downstream execution.
  • Why this is likely a bug: The production route accepts syntactically invalid whitespace-only IDs due to missing normalization/validation, and code paths clearly show the invalid value is consumed as if it were a real identifier.

Relevant code:

agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts (lines 461-464)

const { conversationId } = await context.params;
if (!conversationId) {
  return NextResponse.json({ error: 'Conversation ID is required' }, { status: 400 });
}

agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts (lines 151-155)

function buildBaseExpression(conversationId: string, projectId?: string): string {
  const parts = [`${SPAN_KEYS.CONVERSATION_ID} = '${conversationId}'`];
  if (projectId) parts.push(`${SPAN_KEYS.PROJECT_ID} = '${projectId}'`);
  return parts.join(' AND ');
}

agents-manage-ui/src/lib/api/signoz-conversation-time-range.ts (lines 32-35)

const bounds = await makeManagementApiRequest<{
  data: { createdAt: string; updatedAt: string };
}>(`tenants/${tenantId}/projects/${projectId}/conversations/${conversationId}/bounds`);
✅ Passed (19)
Category Summary Screenshot
Adversarial Generic traces proxy rejected non-builder query type (clickhouse_sql) with explicit HTTP 400 error. ADV-1
Adversarial Injection-like span filter payload did not crash the UI or expose out-of-scope data in the validated scenario. ADV-2
Adversarial Unauthenticated traces access is blocked; invalid POST payload is rejected by schema validation and valid payload is denied with auth error. ADV-3
Adversarial Tenant/project tampering attempts were denied or returned safe scoped errors with no foreign trace payload exposure. ADV-4
Adversarial Rapid toggles, search, and time-range interactions converged to a stable final traces state with no trace-request or trace-console errors after environment remediation. ADV-5
Adversarial Mobile traces overview and conversation detail remained usable at 390x844 under this run's dev fallback setup. ADV-6
Edge Inverted and future time-range requests were rejected with explicit HTTP 400 validation errors. EDGE-1
Edge Batch mode rejected malformed payloads missing required sections with deterministic HTTP 400 responses. EDGE-2
Edge Span details endpoint correctly required conversationId and returned HTTP 400 when omitted. EDGE-4
Edge Conversation row with zero startTime rendered safely and omitted relative-time chip without exceptions. EDGE-5
Edge Deep-link refresh and back-forward navigation returned to a stable Conversation Details state without error. EDGE-6
Happy-path GET /api/traces?tenantId=default returned HTTP 200 with structured status/configured fields and no stack trace exposure. ROUTE-1
Happy-path Traces overview rendered aggregate cards and conversation list container for the project without runtime crash. ROUTE-2
Happy-path Pagination advanced and returned correctly with stable conversation IDs and no duplication in the validated flow. ROUTE-3
Happy-path Search flow used fallback behavior and clearing search restored the expected list state. ROUTE-4
Happy-path Deep-link conversation details view rendered summary cards and timeline for the target conversation route. ROUTE-5
Happy-path Selecting a timeline activity loaded span details and displayed structured side-pane content. ROUTE-6
Happy-path AI Calls page remained functional with deterministic traces responses, including agent filter interaction and rendered cards/charts. ROUTE-7
Happy-path Tool Calls page rendered correctly with deterministic traces responses, including date-range change and stable breakdown cards. ROUTE-8

Commit: 74b0622

View Full Run


Tell us how we did: Give Ito Feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant