[OPIK-5270] [BE] fix: add trace_id_prefilter to scope trace query CTEs by andrescrz · Pull Request #5928 · comet-ml/opik

andrescrz · 2026-03-27T17:13:25Z

Details

Adds a conditional trace_id_prefilter CTE to the SELECT_BY_PROJECT_ID trace query, matching the span_id_prefilter pattern from PRs #5625 and #5599. When trace-level filters (such as tags) or search text are active, the prefilter narrows all enrichment CTEs (feedback scores, spans, guardrails, comments, annotations, experiments) so that they process only data for matching traces instead of scanning the entire project.

This prevents OOM kills caused by the arrayMap in feedback_scores_final attempting to allocate multi-GiB chunks when processing unscoped feedback scores for large projects.

- Prefilter CTE: SELECT DISTINCT id FROM traces WHERE <filters>
- 9 CTEs scoped via IN (SELECT id FROM trace_id_prefilter), with an if/else fallback to the uuid range
- Decision logic in shouldUseTraceIdPrefilter(): activates when narrowing filters exist; guards against feedback score filters and sort-by-feedback-scores
- Both the findTraceStream (streaming) and getTracesByProjectId (paginated) paths are covered
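
The decision and scoping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in TraceDAO.java: the `Criteria` record, its field names, the `scopeClause` helper, and the `trace_id` column name are assumptions made for the example. Only `shouldUseTraceIdPrefilter`, `trace_id_prefilter`, and the `:uuid_from_time` / `:uuid_to_time` bind names come from the PR. Per the review thread, the real sort-by-feedback-scores guard may live at the call site of the paginated path rather than inside the method.

```java
public class PrefilterSketch {

    /** Hypothetical, simplified view of the query criteria (not the real Opik type). */
    record Criteria(boolean hasTraceFilters,
                    boolean hasSearchText,
                    boolean hasCteDependentFilters,
                    boolean sortsByFeedbackScores) {}

    /** Use the prefilter only when it can narrow the scan and no guard applies. */
    static boolean shouldUseTraceIdPrefilter(Criteria c) {
        boolean narrowing = c.hasTraceFilters() || c.hasSearchText();
        return narrowing && !c.hasCteDependentFilters() && !c.sortsByFeedbackScores();
    }

    /** Scope an enrichment CTE to the prefilter, or fall back to the uuid range. */
    static String scopeClause(boolean usePrefilter) {
        return usePrefilter
                ? "AND trace_id IN (SELECT id FROM trace_id_prefilter)"
                : "AND trace_id >= :uuid_from_time AND trace_id <= :uuid_to_time";
    }

    public static void main(String[] args) {
        Criteria tagFilter = new Criteria(true, false, false, false);
        Criteria defaultPage = new Criteria(false, false, false, false);
        // A narrowing filter switches the CTEs to the prefilter scope.
        System.out.println(scopeClause(shouldUseTraceIdPrefilter(tagFilter)));
        // A default page load keeps the original uuid-range scope.
        System.out.println(scopeClause(shouldUseTraceIdPrefilter(defaultPage)));
    }
}
```

Keeping the fallback branch identical to the pre-existing condition is what would make the change a no-op when no narrowing filter is active.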

Performance (measured on affected customer instance)

| Metric | Before (killed) | After (prefilter) |
| --- | --- | --- |
| Rows read | 34,995,469 | 273,092 |
| Bytes read | 6.26 GiB | 33.21 MiB |
| Duration | killed / 11s+ | 4.8s |
| Memory | 21+ GiB (OOM at 13.97 GiB limit) | 12.35 GiB |

The remaining 12 GiB memory is inherent to the project's data volume (confirmed by a real application query without prefilter hitting 13.01 GiB and being killed with the same arrayMap OOM). The prefilter eliminates the 4-8 GiB arrayMap chunk allocation that pushes memory past the limit.

Without narrowing filters active, the query is unchanged (zero overhead).

Change checklist

User facing
Documentation update

Issues

OPIK-5270

AI-WATERMARK

AI-WATERMARK: yes

If yes:
- Tools: Claude Code
- Model(s): Claude Opus 4.6
- Scope: Implementation, analysis, and query optimization
- Human verification: Tested directly against customer ClickHouse instance with before/after measurements

Testing

mvn compile passes
Manually tested the rendered SQL query directly against the affected ClickHouse instance:
- Ran original killed query and prefiltered variant side by side
- Verified identical result sets (6 rows)
- Measured rows read, bytes read, memory, and duration via system.query_log
- Confirmed the query no longer exceeds the 13.97 GiB memory limit
- Verified the prefilter does not activate when no narrowing filters are present (no regression for default page loads)

Documentation

N/A — internal query optimization, no user-facing API changes.

🤖 Generated with Claude Code

apps/opik-backend/src/main/java/com/comet/opik/domain/TraceDAO.java

github-actions · 2026-03-27T17:18:51Z

Backend Tests - Integration Group 7

1 238 tests, 13 suites, 13 files: 1 238 ✅ passed, 0 💤 skipped, 0 ❌ failed in 6m 23s ⏱️

Results for commit 0937597.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-27T17:20:47Z

Backend Tests - Integration Group 4

1 485 tests, 8 suites, 8 files: 1 485 ✅ passed, 0 💤 skipped, 0 ❌ failed in 9m 27s ⏱️

Results for commit 0937597.

♻️ This comment has been updated with latest results.

thiagohora · 2026-03-27T17:26:12Z

apps/opik-backend/src/main/java/com/comet/opik/domain/TraceDAO.java

+            WITH <if(trace_id_prefilter)>trace_id_prefilter AS (
+                SELECT DISTINCT id
+                FROM traces
+                WHERE workspace_id = :workspace_id
+                AND project_id = :project_id
+                <if(last_received_id)> AND id \\< :last_received_id <endif>
+                <if(uuid_from_time)> AND id >= :uuid_from_time <endif>
+                <if(uuid_to_time)> AND id \\<= :uuid_to_time <endif>
+                <if(filters)> AND <filters> <endif>
+                <if(search_text)> AND <search_text> <endif>


Here, we need either the ORDER BY dedup or FINAL; otherwise, the filters and search may match stale row versions.

@thiagohora as we only need the ids here, DISTINCT should be enough.

But without the final sorting/dedup, the filters and search_text could match older versions, no?

Good catch — I analyzed this thoroughly. The prefilter uses SELECT DISTINCT id without dedup (no FINAL, no LIMIT 1 BY). This means old trace versions from non-merged parts could match the filter ("phantom" IDs). Here's why that's safe:

Every phantom trace ID is neutralized downstream:

- Final SELECT LEFT JOINs: All enrichment CTEs (feedback_scores, spans, comments, guardrails, experiments) are LEFT JOINed with traces_final, which comes from traces_deduped. traces_deduped applies the same <filters> with proper ORDER BY ... LIMIT 1 BY id dedup. Phantom traces never appear in traces_final, so their enrichment data is discarded by the JOINs.
- CTE-dependent filters in traces_deduped: When feedback_scores_filters, feedback_scores_empty_filters, span_feedback_scores_filters, span_feedback_scores_empty_filters, or guardrails_filters are active, shouldUseTraceIdPrefilter disables the prefilter entirely, so these paths never see phantom data.
- Unguarded filters (trace_aggregation_filters, annotation_queue_filters): Even if phantom T1's span/annotation data passes these checks in traces_deduped, T1 still fails the <filters> condition (applied with LIMIT 1 BY id) on the traces table itself. The conditions are ANDed, so a phantom cannot survive.

Never under-inclusive: If the latest trace version matches the filter, that row exists in the table and DISTINCT will find it. The prefilter is always a superset, never misses real matches.

Cost of adding dedup: FINAL or ORDER BY + LIMIT 1 BY on every evaluation across 9 CTE references. Since the prefilter is purely a scoping optimization (not the authoritative filter), this cost has no correctness benefit.
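
The superset argument above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TraceRow`, the `version` field (standing in for whatever ordering key the real dedup uses), the `tag` filter, and the sample data are all hypothetical, and the model assumes the authoritative path evaluates the filter against the latest version of each id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class PhantomIdSketch {

    /** One stored row version of a trace; `version` stands in for the dedup ordering key. */
    record TraceRow(String id, long version, String tag) {}

    /** Like SELECT DISTINCT id ... WHERE <filters>: every row version is tested, no dedup. */
    static Set<String> prefilter(List<TraceRow> rows, String tag) {
        return rows.stream()
                .filter(r -> r.tag().equals(tag))
                .map(TraceRow::id)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Like the authoritative path: keep the latest version per id, then apply the filter. */
    static Set<String> dedupedMatches(List<TraceRow> rows, String tag) {
        Map<String, TraceRow> latest = new HashMap<>();
        for (TraceRow r : rows) {
            latest.merge(r.id(), r, (a, b) -> a.version() >= b.version() ? a : b);
        }
        return latest.values().stream()
                .filter(r -> r.tag().equals(tag))
                .map(TraceRow::id)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    static List<TraceRow> sampleRows() {
        return List.of(
                new TraceRow("T1", 1, "prod"),     // stale version, e.g. in a non-merged part
                new TraceRow("T1", 2, "staging"),  // latest version no longer matches
                new TraceRow("T2", 1, "prod"));    // latest version matches
    }

    public static void main(String[] args) {
        List<TraceRow> rows = sampleRows();
        Set<String> scoped = prefilter(rows, "prod");     // includes the phantom id T1
        Set<String> real = dedupedMatches(rows, "prod");  // the authoritative matches
        System.out.println("prefilter = " + scoped + ", final = " + real);
        System.out.println("superset holds: " + scoped.containsAll(real));
    }
}
```

In this model the phantom id only widens the scope of the enrichment CTEs; the final result is still decided by the deduplicated set.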

🤖 Reply posted via /address-github-pr-comments

The guardrails filter injects gagg.guardrails_result into the <filters> template variable, referencing the guardrails_agg CTE alias. Since trace_id_prefilter only queries FROM traces, this reference fails with UNKNOWN_IDENTIFIER. Guard against guardrails_filters in shouldUseTraceIdPrefilter to disable the prefilter when guardrails filters are active. Renamed the guard variable to hasCteDependentFilters for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
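
One way to picture the hasCteDependentFilters guard is as a membership check over the template attributes that are currently set. The sketch below is an assumption about the shape of that check, not the actual implementation: the class name, the `activeAttributes` parameter, and the use of a plain `Set<String>` are hypothetical. The attribute names themselves are the ones listed in the review thread.

```java
import java.util.Set;

public class CteDependentFilterGuard {

    /** Template attributes named in the review thread as depending on other CTEs. */
    static final Set<String> CTE_DEPENDENT_FILTERS = Set.of(
            "feedback_scores_filters",
            "feedback_scores_empty_filters",
            "span_feedback_scores_filters",
            "span_feedback_scores_empty_filters",
            "guardrails_filters");

    /** True when any active attribute would reference a CTE the prefilter cannot see. */
    static boolean hasCteDependentFilters(Set<String> activeAttributes) {
        return activeAttributes.stream().anyMatch(CTE_DEPENDENT_FILTERS::contains);
    }

    public static void main(String[] args) {
        // A guardrails filter references the guardrails_agg alias, so the prefilter must be off.
        System.out.println(hasCteDependentFilters(Set.of("filters", "guardrails_filters")));
        // Plain trace filters and search text only touch the traces table.
        System.out.println(hasCteDependentFilters(Set.of("filters", "search_text")));
    }
}
```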

apps/opik-backend/src/main/java/com/comet/opik/domain/TraceDAO.java

ldaugusto · 2026-03-27T17:27:17Z

apps/opik-backend/src/main/java/com/comet/opik/domain/TraceDAO.java

            var template = newTraceThreadFindTemplate(SELECT_BY_PROJECT_ID, criteria, TRACE_SEARCH_CLAUSE);
            template.add("log_comment", logComment);

+            if (shouldUseTraceIdPrefilter(criteria, template)) {


I'm assuming we don't use && !sortHasFeedbackScores here like we do in the other shouldUseTraceIdPrefilter usage as findTraceStream has no sorting?

Correct — findTraceStream has no sorting, so there's no orderBySql to check. The !sortHasFeedbackScores guard only applies in the paginated path (getTracesByProjectId) where sorting by feedback scores requires the full unscoped feedback_scores CTE for the sort JOIN.

🤖 Reply posted via /address-github-pr-comments

ldaugusto · 2026-03-27T17:30:20Z

apps/opik-backend/src/main/java/com/comet/opik/domain/TraceDAO.java

@thiagohora as we only need the ids here, DISTINCT should be enough.

ldaugusto

Great improvement!

#5928)

* [OPIK-5270] [BE] fix: scope trace stream query CTEs to matching traces

* fix(trace-dao): guard prefilter against guardrails_filters

The guardrails filter injects gagg.guardrails_result into the <filters> template variable, referencing the guardrails_agg CTE alias. Since trace_id_prefilter only queries FROM traces, this reference fails with UNKNOWN_IDENTIFIER. Guard against guardrails_filters in shouldUseTraceIdPrefilter to disable the prefilter when guardrails filters are active. Renamed the guard variable to hasCteDependentFilters for clarity.

[OPIK-5270] [BE] fix: scope trace stream query CTEs to matching traces

3baee5e

andrescrz requested a review from a team as a code owner March 27, 2026 17:13

github-actions bot added the java and Backend labels Mar 27, 2026

github-actions bot assigned andrescrz Mar 27, 2026

baz-reviewer bot reviewed Mar 27, 2026

baz-reviewer bot approved these changes Mar 27, 2026

thiagohora reviewed Mar 27, 2026

baz-reviewer bot reviewed Mar 27, 2026

ldaugusto reviewed Mar 27, 2026

baz-reviewer bot approved these changes Mar 27, 2026

thiagohora approved these changes Mar 27, 2026

ldaugusto approved these changes Mar 27, 2026

andrescrz merged commit 3b5bee4 into main Mar 27, 2026
76 checks passed

andrescrz deleted the andrescrz/OPIK-5270-fix-trace-stream-query-perf branch March 27, 2026 17:52

andrescrz merged 2 commits into main from andrescrz/OPIK-5270-fix-trace-stream-query-perf
