
feat: optimized default otel-logs schema#2125

Open
knudtty wants to merge 7 commits into main from aaron/new-otel-schema

@knudtty knudtty commented Apr 15, 2026

Summary

After weeks of benchmarking, this optimized logs schema outperforms the existing schema on almost all queries. The primary key is now prefixed with toStartOfFiveMinutes(Timestamp), which groups rows together well enough that TimestampTime is no longer needed. Additionally, all indexes now use full-text search (text indexes), which showed superior performance with minimal ingest overhead.

How to test locally

  1. If you have a .volumes directory locally, (re)move it
  2. yarn dev
  3. Poke around and make sure everything is working. Check the schema to ensure it's the new one.

References

  • Linear Issue: Closes HDX-4034


vercel bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hyperdx-oss | Ready | Preview, Comment | Apr 17, 2026 6:18pm |



changeset-bot bot commented Apr 15, 2026

🦋 Changeset detected

Latest commit: 2ddf9fd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @hyperdx/otel-collector | Minor |
| @hyperdx/api | Minor |
| @hyperdx/app | Minor |


@github-actions github-actions bot added the review/tier-4 Critical — deep review + domain expert sign-off label Apr 15, 2026

github-actions bot commented Apr 15, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Critical-path files (7):
    • docker/clickhouse/local/init-db-e2e.sh
    • docker/hyperdx/Dockerfile
    • docker/hyperdx/entry.local.base.sh
    • docker/otel-collector/schema/seed/00002_otel_logs.sql
    • docker/otel-collector/schema/seed/00002_otel_logs_compat.sql
    • packages/otel-collector/cmd/migrate/main.go
    • packages/otel-collector/cmd/migrate/main_test.go
  • Cross-layer change: touches frontend (packages/app) + backend (packages/api)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 25
  • Production lines changed: 376 (+ 16 in test files, excluded from tier calculation)
  • Branch: aaron/new-otel-schema
  • Author: knudtty

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.


github-actions bot commented Apr 15, 2026

PR Review

  • ⚠️ No upgrade migration for existing installations → CREATE TABLE IF NOT EXISTS silently skips schema changes for users with existing otel_logs tables. Upgrading users keep the old schema (with TimestampTime, bloom filters, old ORDER BY) and get no benefit. Consider documenting that a fresh volume is required, or add an ALTER TABLE migration path.

  • ⚠️ hyperdx_sessions inconsistency across DEFAULT_SOURCES → In entry.local.base.sh, the main DEFAULT_SOURCES keeps "timestampValueExpression":"TimestampTime" for sessions, but the BETA_CH_OTEL_JSON_SCHEMA_ENABLED branch uses "timestampValueExpression":"Timestamp" for sessions. If hyperdx_sessions still has TimestampTime this is fine, but the divergence is unexplained and could confuse the next editor.

  • ⚠️ text(tokenizer = 'array') on TraceId string column → The array tokenizer is designed for array-type columns (like mapKeys()/mapValues() results). Using it on a plain String like TraceId may work for exact-match (the whole string becomes one token), but this is non-obvious and the behavior isn't the same as the bloom_filter(0.001) it replaces. A brief comment explaining the benchmarked rationale would help avoid future "why array?" questions.

  • ✅ Version detection + compat schema fallback (getClickHouseVersion / swapLogsSchemaForCompat) is well-structured and fully unit-tested.

  • ✅ TTL fix (toDateTime(Timestamp)) is correct — DateTime64 can't be used directly in TTL expressions.


github-actions bot commented Apr 15, 2026

E2E Test Results

All tests passed • 146 passed • 3 skipped • 1121s

| Status | Count |
| --- | --- |
| ✅ Passed | 146 |
| ❌ Failed | 0 |
| ⚠️ Flaky | 4 |
| ⏭️ Skipped | 3 |

Tests ran across 4 shards in parallel.



knudtty commented Apr 15, 2026

Edit: This is completed

I still need to add some logic that only adds FTS if the CH version is > 26.2. Will require some modification to the migration logic
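
As a rough illustration of the version gate described above, here is a minimal Go sketch. It is not the PR's getClickHouseVersion / swapLogsSchemaForCompat code; the function name supportsTextIndex is hypothetical, and treating 26.2 itself as supported is an assumption, since the thread does not settle whether the boundary is inclusive.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsTextIndex is a hypothetical helper that reports whether a
// ClickHouse version string such as "26.2.1.1" is at least 26.2.
// ASSUMPTION: the threshold is inclusive; adjust if 26.2 should be excluded.
func supportsTextIndex(version string) (bool, error) {
	const minMajor, minMinor = 26, 2

	parts := strings.Split(strings.TrimSpace(version), ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version format: %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", version, err)
	}
	return major > minMajor || (major == minMajor && minor >= minMinor), nil
}

func main() {
	// Sample version strings are illustrative, not taken from the PR.
	for _, v := range []string{"25.8.3.66", "26.2.1.1", "26.3.0.0"} {
		ok, err := supportsTextIndex(v)
		fmt.Println(v, ok, err)
	}
}
```

A caller could use the boolean to choose between the full-text schema file and the compat schema file before running the seed migration.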

@knudtty knudtty requested review from a team, bot-hyperdx and dhable and removed request for a team and bot-hyperdx April 16, 2026 01:16
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE text(tokenizer = 'array'),
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE text(tokenizer = 'array'),
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE text(tokenizer = 'array'),
INDEX idx_lower_body lower(Body) TYPE text(tokenizer = 'splitByNonAlpha')
Member


What about adding FTS on all fields, like:

ADD INDEX fulltext_lower_v1_idx concatWithSeparator(';', TraceId, SpanId, TraceFlags, SeverityText, SeverityNumber, ServiceName, Body, ResourceSchemaUrl, ResourceAttributes, ScopeSchemaUrl, ScopeName, ScopeVersion, ScopeAttributes, LogAttributes) TYPE text(tokenizer = 'splitByNonAlpha', preprocessor = lower(concatWithSeparator(';', TraceId, SpanId, TraceFlags, SeverityText, SeverityNumber, ServiceName, Body, ResourceSchemaUrl, ResourceAttributes, ScopeSchemaUrl, ScopeName, ScopeVersion, ScopeAttributes, LogAttributes))) GRANULARITY 100000000

And change the implicit search field to concatWithSeparator(';', TraceId, SpanId, TraceFlags, SeverityText, SeverityNumber, ServiceName, Body, ResourceSchemaUrl, ResourceAttributes, ScopeSchemaUrl, ScopeName, ScopeVersion, ScopeAttributes, LogAttributes). This would massively improve DX, since developers won't always need to look up the attributes. Any thoughts?

@@ -1 +1 @@
-SELECT SeverityText FROM otel_logs WHERE ResourceAttributes['suite-id'] = 'normalize-severity' AND ResourceAttributes['test-id'] = 'text-case' ORDER BY TimestampTime FORMAT CSV
+SELECT SeverityText FROM otel_logs WHERE ResourceAttributes['suite-id'] = 'normalize-severity' AND ResourceAttributes['test-id'] = 'text-case' ORDER BY (toStartOfFiveMinutes(Timestamp), Timestamp) FORMAT CSV
Member


For smoke tests, I’d suggest adding another ClickHouse instance with an older version (< 26.2) and ensuring the schema and SELECT queries still work correctly.

ENABLE_SWAGGER=true
DEFAULT_CONNECTIONS=[{"name":"Local ClickHouse","host":"http://localhost:${HDX_DEV_CH_HTTP_PORT}","username":"default","password":""}]
DEFAULT_SOURCES=[{"from":{"databaseName":"default","tableName":"otel_logs"},"kind":"log","timestampValueExpression":"TimestampTime","name":"Logs","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","traceSourceId":"Traces","sessionSourceId":"Sessions","metricSourceId":"Metrics"},{"from":{"databaseName":"default","tableName":"otel_traces"},"kind":"trace","timestampValueExpression":"Timestamp","name":"Traces","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"SpanName","serviceNameExpression":"ServiceName","eventAttributesExpression":"SpanAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,StatusCode,round(Duration/1e6),SpanName","traceIdExpression":"TraceId","spanIdExpression":"SpanId","durationExpression":"Duration","durationPrecision":9,"parentSpanIdExpression":"ParentSpanId","spanNameExpression":"SpanName","spanKindExpression":"SpanKind","statusCodeExpression":"StatusCode","statusMessageExpression":"StatusMessage","connection":"Local ClickHouse","logSourceId":"Logs","sessionSourceId":"Sessions","metricSourceId":"Metrics"},{"from":{"databaseName":"default","tableName":""},"kind":"metric","timestampValueExpression":"TimeUnix","name":"Metrics","resourceAttributesExpression":"ResourceAttributes","metricTables":{"gauge":"otel_metrics_gauge","histogram":"otel_metrics_histogram","sum":"otel_metrics_sum","_id":"682586a8b1f81924e628e808","id":"682586a8b1f81924e628e808"},"connection":"Local 
ClickHouse","logSourceId":"Logs","traceSourceId":"Traces","sessionSourceId":"Sessions"},{"from":{"databaseName":"default","tableName":"hyperdx_sessions"},"kind":"session","timestampValueExpression":"TimestampTime","name":"Sessions","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","logSourceId":"Logs","traceSourceId":"Traces","metricSourceId":"Metrics"},{"from":{"databaseName":"otel_json","tableName":"otel_logs"},"kind":"log","timestampValueExpression":"Timestamp","name":"JSON Logs","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","traceSourceId":"JSON Traces","metricSourceId":"JSON Metrics"},{"from":{"databaseName":"otel_json","tableName":"otel_traces"},"kind":"trace","timestampValueExpression":"Timestamp","name":"JSON 
Traces","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"SpanName","serviceNameExpression":"ServiceName","eventAttributesExpression":"SpanAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,StatusCode,round(Duration/1e6),SpanName","traceIdExpression":"TraceId","spanIdExpression":"SpanId","durationExpression":"Duration","durationPrecision":9,"parentSpanIdExpression":"ParentSpanId","spanNameExpression":"SpanName","spanKindExpression":"SpanKind","statusCodeExpression":"StatusCode","statusMessageExpression":"StatusMessage","connection":"Local ClickHouse","logSourceId":"JSON Logs","metricSourceId":"JSON Metrics"},{"from":{"databaseName":"otel_json","tableName":""},"kind":"metric","timestampValueExpression":"TimeUnix","name":"JSON Metrics","resourceAttributesExpression":"ResourceAttributes","metricTables":{"gauge":"otel_metrics_gauge","histogram":"otel_metrics_histogram","sum":"otel_metrics_sum"},"connection":"Local ClickHouse","logSourceId":"JSON Logs","traceSourceId":"JSON Traces"}]
DEFAULT_SOURCES=[{"from":{"databaseName":"default","tableName":"otel_logs"},"kind":"log","timestampValueExpression":"Timestamp","name":"Logs","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","traceSourceId":"Traces","sessionSourceId":"Sessions","metricSourceId":"Metrics"},{"from":{"databaseName":"default","tableName":"otel_traces"},"kind":"trace","timestampValueExpression":"Timestamp","name":"Traces","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"SpanName","serviceNameExpression":"ServiceName","eventAttributesExpression":"SpanAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,StatusCode,round(Duration/1e6),SpanName","traceIdExpression":"TraceId","spanIdExpression":"SpanId","durationExpression":"Duration","durationPrecision":9,"parentSpanIdExpression":"ParentSpanId","spanNameExpression":"SpanName","spanKindExpression":"SpanKind","statusCodeExpression":"StatusCode","statusMessageExpression":"StatusMessage","connection":"Local ClickHouse","logSourceId":"Logs","sessionSourceId":"Sessions","metricSourceId":"Metrics"},{"from":{"databaseName":"default","tableName":""},"kind":"metric","timestampValueExpression":"TimeUnix","name":"Metrics","resourceAttributesExpression":"ResourceAttributes","metricTables":{"gauge":"otel_metrics_gauge","histogram":"otel_metrics_histogram","sum":"otel_metrics_sum","_id":"682586a8b1f81924e628e808","id":"682586a8b1f81924e628e808"},"connection":"Local 
ClickHouse","logSourceId":"Logs","traceSourceId":"Traces","sessionSourceId":"Sessions"},{"from":{"databaseName":"default","tableName":"hyperdx_sessions"},"kind":"session","timestampValueExpression":"TimestampTime","name":"Sessions","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","logSourceId":"Logs","traceSourceId":"Traces","metricSourceId":"Metrics"},{"from":{"databaseName":"otel_json","tableName":"otel_logs"},"kind":"log","timestampValueExpression":"Timestamp","name":"JSON Logs","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"Body","serviceNameExpression":"ServiceName","bodyExpression":"Body","eventAttributesExpression":"LogAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,SeverityText,Body","severityTextExpression":"SeverityText","traceIdExpression":"TraceId","spanIdExpression":"SpanId","connection":"Local ClickHouse","traceSourceId":"JSON Traces","metricSourceId":"JSON Metrics"},{"from":{"databaseName":"otel_json","tableName":"otel_traces"},"kind":"trace","timestampValueExpression":"Timestamp","name":"JSON 
Traces","displayedTimestampValueExpression":"Timestamp","implicitColumnExpression":"SpanName","serviceNameExpression":"ServiceName","eventAttributesExpression":"SpanAttributes","resourceAttributesExpression":"ResourceAttributes","defaultTableSelectExpression":"Timestamp,ServiceName,StatusCode,round(Duration/1e6),SpanName","traceIdExpression":"TraceId","spanIdExpression":"SpanId","durationExpression":"Duration","durationPrecision":9,"parentSpanIdExpression":"ParentSpanId","spanNameExpression":"SpanName","spanKindExpression":"SpanKind","statusCodeExpression":"StatusCode","statusMessageExpression":"StatusMessage","connection":"Local ClickHouse","logSourceId":"JSON Logs","metricSourceId":"JSON Metrics"},{"from":{"databaseName":"otel_json","tableName":""},"kind":"metric","timestampValueExpression":"TimeUnix","name":"JSON Metrics","resourceAttributesExpression":"ResourceAttributes","metricTables":{"gauge":"otel_metrics_gauge","histogram":"otel_metrics_histogram","sum":"otel_metrics_sum"},"connection":"Local ClickHouse","logSourceId":"JSON Logs","traceSourceId":"JSON Traces"}]
Member


INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
INDEX idx_trace_id TraceId TYPE text(tokenizer = 'array'),
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE text(tokenizer = 'array'),
Contributor

@vinzee vinzee Apr 17, 2026


Given that the direct read optimization does not kick in for queries like (Map['key'] IN ('val')), is there a general benefit to making text indices the default for Maps? Context: #2134

It would be nice to see a performance comparison here, especially around storage and read latency.

CREATE TABLE IF NOT EXISTS ${DATABASE}.e2e_otel_logs
(
\`Timestamp\` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
\`TimestampTime\` DateTime DEFAULT toDateTime(Timestamp),
Contributor


Wondering about the rationale behind removing this. As I understand it, toDateTime gives second granularity and was deemed more performant for partitioning/ordering. Did that change?

