Skip to content

Service Map Span Kind Expression not working in WHERE clause #1512

@litkhai

Description

@litkhai

HyperDX GitHub Issue: Service Map Span Kind Expression not working in WHERE clause

Issue Summary

Title: Service Map: Span Kind Expression not applied in WHERE clause for span kind filtering

Image Image

Labels: service-map, beta, expression-support

Preliminary Note

This issue may be resolved once ClickStack Cloud endpoint becomes available. ClickStack is designed to handle OpenTelemetry data ingestion with proper schema transformations. However, until ClickStack Cloud endpoint is generally available, users relying on direct OTEL Collector → ClickHouse integration encounter this issue.

Description

The Service Map feature (beta, released November 2025) does not properly apply the Span Kind Expression when filtering spans in the WHERE clause. This causes Service Maps to fail with "No services found" when using OpenTelemetry standard span kind values (SPAN_KIND_CLIENT, SPAN_KIND_SERVER).

Environment

  • HyperDX Version: ClickHouse Cloud hosted (https://hyperdx.clickhouse.cloud/)
  • ClickHouse Version: ClickHouse Cloud
  • Data Source: OpenTelemetry Collector (otel/opentelemetry-collector-contrib:0.93.0) with ClickHouse exporter

Steps to Reproduce

  1. Ingest OpenTelemetry traces using OTEL Collector with ClickHouse exporter

    • SpanKind values stored in database: SPAN_KIND_CLIENT, SPAN_KIND_SERVER, SPAN_KIND_INTERNAL (OpenTelemetry standard format)
  2. Create a Trace Source in HyperDX with the following configuration:

    • Span Kind Expression: replaceAll(SpanKind, 'SPAN_KIND_', '')
    • Other standard expressions (TraceId, SpanId, ServiceName, etc.)
  3. Navigate to Service Map page

  4. Select the configured Trace Source

  5. Set time range to include data

Expected Behavior

Service Map should display services and their connections because:

  1. The Span Kind Expression should transform SpanKind values to the format HyperDX expects:

    • SPAN_KIND_CLIENTClient (title case)
    • SPAN_KIND_SERVERServer (title case)
    • SPAN_KIND_INTERNALInternal (title case)
  2. Service Map should then use these transformed values to find Client-Server relationships

Note: Testing revealed that HyperDX expects title case (Client, Server) not uppercase (CLIENT, SERVER). The expression replaceAll(SpanKind, 'SPAN_KIND_', '') produces uppercase, which doesn't match HyperDX's expectations even if expressions were properly applied.

Actual Behavior

Service Map displays:

No services found. The Service Map shows links between services with related Client- and Server-kind spans.

Evidence

1. Data exists with CLIENT-SERVER relationships

Direct ClickHouse query using the expression transformation works correctly:

WITH service_calls AS (
    SELECT
        client.ServiceName as source_service,
        server.ServiceName as target_service,
        COUNT(*) as call_count
    FROM otel_traces client
    INNER JOIN otel_traces server
        ON client.TraceId = server.TraceId
        AND client.SpanId = server.ParentSpanId
    WHERE replaceAll(client.SpanKind, 'SPAN_KIND_', '') = 'CLIENT'
        AND replaceAll(server.SpanKind, 'SPAN_KIND_', '') = 'SERVER'
    GROUP BY source_service, target_service
)
SELECT * FROM service_calls
ORDER BY call_count DESC;

Result: Multiple service-to-service relationships found, proving the data exists.

2. Browser DevTools error

When inspecting network requests in Service Map, HyperDX returns errors suggesting expressions are not being applied in WHERE clauses.

Root Cause Analysis

Verified Issues

  1. Primary Issue: Span Kind Expression not applied in WHERE clauses

    • Configured expressions work in direct ClickHouse queries
    • Same expressions fail in HyperDX Service Map
    • Evidence suggests WHERE clauses use literal values instead of applying configured expressions
  2. Secondary Issue: Case sensitivity requirement

    • Testing revealed HyperDX expects title case (Client, Server, Internal)
    • OpenTelemetry standard produces: SPAN_KIND_CLIENT, SPAN_KIND_SERVER
    • Simple replaceAll(SpanKind, 'SPAN_KIND_', '') produces uppercase: CLIENT, SERVER
    • This mismatch means users need CASE expressions, not simple replaceAll

Expected SQL Pattern

-- What should work but doesn't:
WHERE <configured_span_kind_expression> = 'Client'
  AND <configured_span_kind_expression> = 'Server'

-- Example with proper expression and case handling:
WHERE CASE
    WHEN SpanKind = 'SPAN_KIND_CLIENT' THEN 'Client'
    WHEN SpanKind = 'SPAN_KIND_SERVER' THEN 'Server'
    ELSE SpanKind
  END = 'Client'

Proposed Solution

Apply the configured Span Kind Expression in WHERE clauses when filtering for CLIENT and SERVER spans in Service Map queries.

Why Expression Support Matters

Proper Expression support would eliminate the need for workarounds:

  • Storage: Eliminates ~2x storage overhead from duplicate tables
  • Simplicity: No manual materialized view creation or maintenance
  • Consistency: Expression fields work as documented in the UI
  • Compatibility: Native support for OpenTelemetry standard format (SPAN_KIND_*)

Suggested Implementation

  1. Apply expressions consistently in all SQL clauses:

    • ✅ SELECT clauses (currently working)
    • ⚠️ WHERE clauses (not working)
    • JOIN, GROUP BY clauses (if applicable)
  2. Consider case-insensitive SpanKind comparison or document the title case requirement

Workaround

✅ Verified Working Solution: Materialized View with Proper Case Transformation

Status: Successfully tested and confirmed working with HyperDX Service Map.

Important Discovery: HyperDX Service Map expects SpanKind values with title case (Client, Server, Internal), not uppercase (CLIENT, SERVER, INTERNAL).

Implementation

-- 1. Create target table
CREATE TABLE o11y.otel_traces_conv
(
    Timestamp DateTime64(9),
    TraceId String,
    SpanId String,
    ParentSpanId String,
    TraceState String,
    SpanName LowCardinality(String),
    SpanKind LowCardinality(String),
    ServiceName LowCardinality(String),
    ResourceAttributes Map(LowCardinality(String), String),
    ScopeName String,
    ScopeVersion String,
    SpanAttributes Map(LowCardinality(String), String),
    Duration UInt64,
    StatusCode LowCardinality(String),
    StatusMessage String,
    `Events.Timestamp` Array(DateTime64(9)),
    `Events.Name` Array(LowCardinality(String)),
    `Events.Attributes` Array(Map(LowCardinality(String), String)),
    `Links.TraceId` Array(String),
    `Links.SpanId` Array(String),
    `Links.TraceState` Array(String),
    `Links.Attributes` Array(Map(LowCardinality(String), String))
)
ENGINE = MergeTree()
ORDER BY (ServiceName, Timestamp);

-- 2. Create materialized view with proper case transformation
CREATE MATERIALIZED VIEW o11y.otel_traces_conv_mv
TO o11y.otel_traces_conv
AS SELECT
    Timestamp,
    TraceId,
    SpanId,
    ParentSpanId,
    TraceState,
    SpanName,
    CASE
        WHEN SpanKind = 'SPAN_KIND_CLIENT' THEN 'Client'
        WHEN SpanKind = 'SPAN_KIND_SERVER' THEN 'Server'
        WHEN SpanKind = 'SPAN_KIND_INTERNAL' THEN 'Internal'
        WHEN SpanKind = 'SPAN_KIND_PRODUCER' THEN 'Producer'
        WHEN SpanKind = 'SPAN_KIND_CONSUMER' THEN 'Consumer'
        ELSE SpanKind
    END AS SpanKind,
    ServiceName,
    ResourceAttributes,
    ScopeName,
    ScopeVersion,
    SpanAttributes,
    Duration,
    StatusCode,
    StatusMessage,
    `Events.Timestamp`,
    `Events.Name`,
    `Events.Attributes`,
    `Links.TraceId`,
    `Links.SpanId`,
    `Links.TraceState`,
    `Links.Attributes`
FROM o11y.otel_traces;

-- 3. Populate with existing data
INSERT INTO o11y.otel_traces_conv
SELECT * FROM o11y.otel_traces_conv_mv;

HyperDX Configuration for Workaround

Table: otel_traces_conv  (the converted table)
Span Kind Expression: SpanKind  (no transformation needed!)

Test Results

Service Map Successfully Displays: Multiple microservice connections with call counts and latency metrics

Key Learnings

  1. Case Sensitivity: HyperDX requires title case (Client/Server/Internal)
  2. Expression Not Applied: Span Kind Expression is ignored in WHERE clauses
  3. Workaround Effective: Pre-transforming data via materialized view bypasses the expression issue
  4. Storage Cost: Requires ~2x storage (original + transformed table)

Alternative Workarounds

Use ClickHouse queries directly:

  • Bypass HyperDX UI and query ClickHouse directly for Service Map data
  • Apply transformations in SQL queries manually

Additional Context

  • Service Map is a beta feature (released November 2025)
  • This issue affects any OpenTelemetry Collector setup using the ClickHouse exporter
  • The ClickHouse exporter always uses OpenTelemetry standard format (SPAN_KIND_*)
  • Other HyperDX features (App Trace, Search) work correctly with the same data

Code Investigation

Bug Location Confirmed

File: packages/app/src/hooks/useServiceMap.tsx

Function: getServiceMapQuery

The bug exists in the Service Map query construction code:

async function getServiceMapQuery({
  source,
  dateRange,
  traceId,
  metadata,
  samplingFactor,
}: {
  source: TSource;
  dateRange: [Date, Date];
  traceId?: string;
  metadata: Metadata;
  samplingFactor: number;
}) {
  // ... other code ...

  const [serverCTE, clientCTE] = await Promise.all([
    renderChartConfig(
      {
        ...baseCTEConfig,
        filters: [
          ...baseCTEConfig.filters,
          {
            type: 'sql',
            condition: `${source.spanKindExpression} IN ('Server', 'Consumer')`,  // ⚠️ Expression not applied
          },
        ],
        where: '',
      },
      metadata,
    ),
    renderChartConfig(
      {
        ...baseCTEConfig,
        filters: [
          ...baseCTEConfig.filters,
          {
            type: 'sql',
            condition: `${source.spanKindExpression} IN ('Client', 'Producer')`,  // ⚠️ Expression not applied
          },
        ],
        where: '',
      },
      metadata,
    ),
  ]);

The Problem

When source.spanKindExpression contains a ClickHouse expression like replaceAll(SpanKind, 'SPAN_KIND_', ''), the code correctly inserts it into the SQL string template:

condition: `${source.spanKindExpression} IN ('Server', 'Consumer')`

However, somewhere in the query rendering pipeline (likely in renderChartConfig), the expression is not being properly applied or is being replaced with just the column name.

Expected Behavior

The generated SQL should be:

WHERE replaceAll(SpanKind, 'SPAN_KIND_', '') IN ('Server', 'Consumer')

Actual Behavior

Testing suggests the expression transformation is not applied in WHERE clauses, causing queries to match no rows when SpanKind values are in OpenTelemetry format (SPAN_KIND_SERVER, SPAN_KIND_CONSUMER).

Analysis

The renderChartConfig function (or its downstream processing) may not be preserving the full expression when building WHERE clauses.

Sample Data

OpenTelemetry Standard SpanKind Format

When using OTEL Collector with ClickHouse exporter, SpanKind values follow the OpenTelemetry specification:

SPAN_KIND_CLIENT
SPAN_KIND_SERVER
SPAN_KIND_INTERNAL
SPAN_KIND_PRODUCER
SPAN_KIND_CONSUMER

HyperDX Expected Format

Testing revealed Service Map expects title case:

Client
Server
Internal
Producer
Consumer

Impact

Current Impact

This issue prevents users from using Service Map with standard OpenTelemetry data collection pipelines. Since the OTEL Collector's ClickHouse exporter is the recommended way to send traces to ClickHouse, this affects most users attempting to use Service Map with OTEL data.

Workaround Status

Workaround Available: Materialized view solution successfully tested and documented.

However, the workaround:

  • Requires ~2x storage (original + transformed tables)
  • Adds operational complexity (manual MV creation and maintenance)
  • Needs ClickHouse SQL expertise
  • Doesn't scale well for large deployments

Impact Summary

  • Workaround requires ~2x storage and manual maintenance
  • Affects standard OpenTelemetry Collector setups
  • Expression fields in UI don't work as expected

Priority

Medium-High - Service Map requires workaround for standard OpenTelemetry setups. Expression support would improve user experience and eliminate storage overhead.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions