Add ClickHouse query layer for spans with feature flag support #2302

Open
geclos wants to merge 28 commits into main from claude/clickhouse-read-operations-a59cO

Collaborator

geclos commented Feb 16, 2026

Summary

This PR introduces a ClickHouse query layer for spans data, allowing the application to read span data from ClickHouse instead of the primary database. The implementation includes a feature flag (clickhouse-spans-read) to control which workspaces use the new query layer.

Key Changes

  • New ClickHouse Query Modules: Added comprehensive query functions for spans in ClickHouse:

    • get.ts - Retrieve a single span by ID and trace ID
    • list.ts - List all spans for a trace
    • findLimited.ts - Paginated span queries with cursor support and default time window handling
    • findByDocumentLogUuid.ts - Query spans by document log UUID with various filtering options
    • findBySpanAndTraceIds.ts - Batch queries for spans and parent-child relationships
    • findByEvaluationResults.ts - Query spans referenced by evaluation results
    • countByProjectAndSource.ts - Count spans by source
    • toSpan.ts - Convert ClickHouse row format to Span domain objects
    • getByDocumentLogUuidAndSpanId.ts - Composite query helper
  • SpansRepository Enhancement: Modified SpansRepository to support dual-mode operation:

    • Added constructor option useClickHouse to enable ClickHouse queries
    • Implemented conditional routing in all query methods to use ClickHouse when enabled
    • Maintains backward compatibility with existing PostgreSQL queries
  • Feature Flag Integration: Created createSpansRepository.ts factory function that:

    • Checks the clickhouse-spans-read feature flag for the workspace
    • Instantiates SpansRepository with appropriate backend selection
    • Gracefully handles feature flag lookup failures
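The dual-mode repository and its factory can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `SpanReader` interface, the `FlagLookup` signature, and the choice to fall back to PostgreSQL when the flag lookup fails are all assumptions made for the example.

```typescript
const CLICKHOUSE_SPANS_READ_FLAG = 'clickhouse-spans-read'

// Hypothetical shapes; the real Span type and backends are richer.
type Span = { id: string; traceId: string }
interface SpanReader {
  get(spanId: string, traceId: string): Promise<Span | undefined>
}
type FlagLookup = (workspaceId: number, flag: string) => Promise<boolean>

class SpansRepository {
  readonly useClickHouse: boolean
  private readonly postgres: SpanReader
  private readonly clickhouse: SpanReader

  constructor(
    postgres: SpanReader,
    clickhouse: SpanReader,
    options: { useClickHouse?: boolean } = {},
  ) {
    this.postgres = postgres
    this.clickhouse = clickhouse
    // Defaults to PostgreSQL so existing callers keep their behavior.
    this.useClickHouse = options.useClickHouse ?? false
  }

  // Each read method routes to one backend; both return the same type.
  get(spanId: string, traceId: string): Promise<Span | undefined> {
    const reader = this.useClickHouse ? this.clickhouse : this.postgres
    return reader.get(spanId, traceId)
  }
}

async function createSpansRepository(
  workspaceId: number,
  isEnabled: FlagLookup,
  postgres: SpanReader,
  clickhouse: SpanReader,
): Promise<SpansRepository> {
  let useClickHouse = false
  try {
    useClickHouse = await isEnabled(workspaceId, CLICKHOUSE_SPANS_READ_FLAG)
  } catch {
    // Assumption: a failed flag lookup falls back to PostgreSQL.
  }
  return new SpansRepository(postgres, clickhouse, { useClickHouse })
}
```

Because both backends sit behind the same interface, callers should not need to know which store served a read.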

Implementation Details

  • All ClickHouse queries use parameterized queries to prevent SQL injection
  • Cursor-based pagination implemented for efficient large dataset traversal
  • Automatic fallback to all-time range when default time window returns no results
  • Proper type conversion between ClickHouse DateTime64 format and JavaScript Date objects
  • Support for complex filtering (types, sources, experiment UUIDs, date ranges)
  • Maintains API compatibility - ClickHouse and PostgreSQL implementations return identical result types
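To illustrate the parameterized-query point, here is a small sketch of how a span lookup might be assembled. The `{name: Type}` placeholder syntax is ClickHouse's own (it also appears in the query excerpt under review); the table name `spans`, the column names, and the `buildFindSpanQuery` helper are hypothetical. The returned object has the `query` / `query_params` shape that a ClickHouse client would typically accept.

```typescript
type ParameterizedQuery = {
  query: string
  query_params: Record<string, unknown>
}

// Hypothetical helper: values never get concatenated into the SQL text.
function buildFindSpanQuery(args: {
  workspaceId: number
  traceId: string
  spanId: string
}): ParameterizedQuery {
  return {
    query: `
      SELECT *
      FROM spans
      WHERE workspace_id = {workspaceId: UInt64}
        AND trace_id = {traceId: String}
        AND span_id = {spanId: String}
      LIMIT 1
    `,
    // The server binds these by name, so quoting is not the caller's job.
    query_params: { ...args },
  }
}
```

Since user-supplied values travel separately from the SQL text, a hostile `traceId` cannot alter the statement itself.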

https://claude.ai/code/session_013nmboM3k7H1FUgYHQpoJQj

geclos marked this pull request as ready for review February 16, 2026 11:43
geclos force-pushed the claude/clickhouse-read-operations-a59cO branch from 860578f to d87876d February 16, 2026 11:49
@@ -0,0 +1 @@
export const CLICKHOUSE_SPANS_READ_FLAG = 'clickhouse-spans-read'
Contributor

Is this necessary?

Collaborator Author

Definitely yes.

geclos force-pushed the claude/clickhouse-read-operations-a59cO branch 4 times, most recently from 4ecda72 to 3c3930f February 16, 2026 14:02
AND document_uuid = {documentUuid: UUID}
AND commit_id IN ({commitIds: Array(UInt64)})
AND evaluated_span_id IS NOT NULL
AND id IN ({evaluationResultIds: Array(UInt64)})
Collaborator Author

If we are passing IDs as a filter, why do we need all the other filters in this query?

Collaborator Author

geclos commented Feb 16, 2026

@claude fix all comments

geclos force-pushed the claude/clickhouse-read-operations-a59cO branch from f7e5d02 to 874c8ee February 16, 2026 17:10
claude and others added 26 commits February 16, 2026 18:53
Add ClickHouse query functions for all SpansRepository read methods,
gated behind the existing 'clickhouse-spans-read' feature flag. When
the flag is enabled for a workspace, reads are served from ClickHouse
instead of PostgreSQL.

New files:
- queries/clickhouse/spans/*.ts: ClickHouse query implementations
  for get, list, findLimited, findByDocumentLogUuid, etc.
- queries/clickhouse/spans/toSpan.ts: SpanRow→Span conversion utility
- repositories/createSpansRepository.ts: factory that resolves the
  feature flag and instantiates SpansRepository with useClickHouse

Modified:
- SpansRepository: accepts optional { useClickHouse } option; each
  public method delegates to the corresponding ClickHouse query when
  the flag is set, otherwise falls back to existing PostgreSQL logic.

https://claude.ai/code/session_013nmboM3k7H1FUgYHQpoJQj

Instead of requiring a factory function at every call site,
SpansRepository now auto-detects whether to use ClickHouse by lazily
checking the feature flag on the first method call and caching the
result for the instance lifetime.

- Default behavior (no options): lazily checks the feature flag,
  so toggling it per-workspace immediately routes all reads
- Explicit override: pass { useClickHouse: true/false } to force
  a specific backend (used in processBulk.ts write path)
- Removed createSpansRepository factory since it's no longer needed
- Pinned processBulk.ts to PostgreSQL since it reads within a
  write transaction where data hasn't been committed to ClickHouse
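The lazy detection described above could look something like the sketch below. The `LazyBackendSelector` class, the `FlagLookup` signature, and treating a failed lookup as "use PostgreSQL" are assumptions for illustration; the actual SpansRepository internals are not shown in this commit message.

```typescript
const CLICKHOUSE_SPANS_READ_FLAG = 'clickhouse-spans-read'

type FlagLookup = (workspaceId: number, flag: string) => Promise<boolean>

class LazyBackendSelector {
  private cached: Promise<boolean> | undefined
  private readonly workspaceId: number
  private readonly isEnabled: FlagLookup
  private readonly override: boolean | undefined

  constructor(
    workspaceId: number,
    isEnabled: FlagLookup,
    options: { useClickHouse?: boolean } = {},
  ) {
    this.workspaceId = workspaceId
    this.isEnabled = isEnabled
    this.override = options.useClickHouse
  }

  useClickHouse(): Promise<boolean> {
    // Explicit override wins and never touches the flag service.
    if (this.override !== undefined) return Promise.resolve(this.override)
    // Cache the promise itself so concurrent first calls share one lookup.
    if (!this.cached) {
      this.cached = this.isEnabled(
        this.workspaceId,
        CLICKHOUSE_SPANS_READ_FLAG,
      ).catch(() => false) // Assumption: lookup failure means PostgreSQL.
    }
    return this.cached
  }
}
```

A write path that must read uncommitted data, such as the one pinned in processBulk.ts, would pass `{ useClickHouse: false }` and skip the lookup entirely.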

https://claude.ai/code/session_013nmboM3k7H1FUgYHQpoJQj

- Remove unnecessary type filter from fetchConversation query
- Count distinct traces instead of spans in countMainTypesSince
- Rename functions for clarity:
  - getSpan -> findSpan
  - listSpans -> findSpans
  - findBySpanAndTraceIds -> findBySpanAndTraceIdPairs
  - getSpansCountForAnnotationsProgress -> countSpansForAnnotations
- Simplify hasProductionTraces by removing unnecessary filters and using equality check
- Move utility functions (orUndefined, parseClickHouseDate) to lib/typeConversions.ts
- Fix getSpansWithoutIssues to use commit UUIDs instead of IDs in ClickHouse queries
- Remove redundant subquery in PostgreSQL getSpansWithoutIssues
- When filtering by evaluation result IDs, remove unnecessary document_uuid and commit filters
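For reference, the two utilities moved to lib/typeConversions.ts might look roughly like this. Only the names `orUndefined` and `parseClickHouseDate` come from the commit message; the signatures and bodies are a guess. The sketch assumes ClickHouse returns DateTime64 values as `YYYY-MM-DD hh:mm:ss[.fraction]` strings in UTC, and it truncates sub-millisecond digits because JavaScript `Date` only has millisecond precision.

```typescript
// Assumed signature: map SQL NULLs to undefined for optional fields.
function orUndefined<T>(value: T | null | undefined): T | undefined {
  return value ?? undefined
}

// Assumed signature: parse a UTC DateTime64 string into a Date.
function parseClickHouseDate(value: string): Date {
  const match =
    /^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:\.(\d+))?$/.exec(value)
  if (!match) throw new Error(`Unexpected ClickHouse date: ${value}`)
  const [, date, time, fraction = ''] = match
  // Keep at most three fractional digits, padding short fractions with zeros.
  const millis = fraction.slice(0, 3).padEnd(3, '0')
  return new Date(`${date}T${time}.${millis}Z`)
}
```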

Co-authored-by: Gerard <geclos@users.noreply.github.com>

Metadata JSON payloads are now gzip-compressed before being written to
disk/S3, reducing storage costs. Decompression on read is transparent
to callers and backward-compatible with existing uncompressed data
(detected via gzip magic bytes).
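A minimal sketch of the magic-byte approach, using Node's built-in zlib module. The function names `encodeMetadata`, `isGzip`, and `decodeMetadata` are hypothetical, and the real code may well use streaming or async variants instead of the sync calls shown here.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib'

// Compress a JSON-serializable payload before it is written to disk/S3.
function encodeMetadata(metadata: unknown): Buffer {
  return gzipSync(Buffer.from(JSON.stringify(metadata), 'utf8'))
}

// Every gzip stream starts with the two bytes 0x1f 0x8b.
function isGzip(payload: Buffer): boolean {
  return payload.length >= 2 && payload[0] === 0x1f && payload[1] === 0x8b
}

// Reads both new (compressed) and legacy (plain JSON) payloads.
function decodeMetadata(payload: Buffer): unknown {
  const raw = isGzip(payload) ? gunzipSync(payload) : payload
  return JSON.parse(raw.toString('utf8'))
}
```

The detection is unambiguous for this use case because valid JSON text cannot begin with the control byte 0x1f.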

https://claude.ai/code/session_01VC6xgLtYYKVkhXenuyYAGT

Co-authored-by: Claude <noreply@anthropic.com>

Extract duplicated logic from getByDocument and getSpansWithoutIssues
into reusable helpers: buildExperimentExclusionCondition for the
optimization experiment filtering, and paginateSpanRows for the
limit+1 cursor pagination pattern.
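The limit+1 pattern mentioned here works by asking the database for one row more than the page size: if the extra row comes back, there is another page, and the cursor is taken from the last row actually returned. A sketch under assumed types follows; only the name `paginateSpanRows` comes from the commit message, while the cursor fields and signature are illustrative.

```typescript
// Hypothetical cursor: ordering key plus a tiebreaker.
type Cursor = { startedAt: string; id: string }
type Page<T> = { items: T[]; next: Cursor | null }

// `rows` must come from a query that used LIMIT (limit + 1).
function paginateSpanRows<T extends Cursor>(rows: T[], limit: number): Page<T> {
  const hasMore = rows.length > limit
  // Drop the probe row so callers never see more than `limit` items.
  const items = hasMore ? rows.slice(0, limit) : rows
  const last = items[items.length - 1]
  const next =
    hasMore && last ? { startedAt: last.startedAt, id: last.id } : null
  return { items, next }
}
```

This avoids a separate COUNT query just to know whether a next page exists.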

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wrap span query exports with scopedQuery for consistent workspace-scoped query definitions and shared error unwrapping semantics.
geclos force-pushed the claude/clickhouse-read-operations-a59cO branch from fe1739e to 4c8abe5 February 16, 2026 17:54