test: add OTEL-to-ClickHouse ingestion assertions to integration smoke test #200

Merged
dhable merged 10 commits into main from dan/more-otel-collector-testing
Mar 25, 2026

dhable commented Mar 25, 2026

Summary

  • extend scripts/smoke-test.sh with ClickHouse query helpers, port-forward lifecycle cleanup, and ingestion polling
  • generate OTLP traces/logs with telemetrygen and send them to the in-cluster collector over OTLP/HTTP (port 4318)
  • assert default.otel_traces and default.otel_logs row counts increase after telemetry is sent
  • run smoke tests with TIMEOUT=300 in the integration workflow so failures surface sooner
  • set hyperdx.config.OPAMP_SERVER_URL: "" in CI test values so the collector runs in standalone mode with built-in ClickHouse export pipelines
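
The port-forward lifecycle mentioned in the summary could look roughly like the sketch below. This is not the code in scripts/smoke-test.sh: the four-argument `start_port_forward` shape is taken from the call visible in this PR's diff, while the HTTP readiness probe, the `PF_RETRIES` variable, and `stop_port_forwards` are assumptions made for illustration.

```shell
# Sketch only: PF_RETRIES, the curl readiness probe, and stop_port_forwards
# are hypothetical; only the start_port_forward call shape comes from the PR.

# Start a background port-forward, wait until the local port answers HTTP,
# then print the PID so the caller can clean it up later.
start_port_forward() {
  local resource="$1" local_port="$2" remote_port="$3" label="$4"
  local pid attempt
  # Redirect output so a caller using $(...) is not blocked by the background job.
  kubectl port-forward "$resource" "${local_port}:${remote_port}" \
    -n "${NAMESPACE:-default}" >/dev/null 2>&1 &
  pid=$!
  for attempt in $(seq 1 "${PF_RETRIES:-30}"); do
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "port-forward for ${label} exited early" >&2
      return 1
    fi
    # Without --fail, curl succeeds on any HTTP response, which is enough
    # to show that the tunnel is accepting connections.
    if curl --silent --output /dev/null --max-time 2 "http://127.0.0.1:${local_port}/"; then
      echo "$pid"
      return 0
    fi
    sleep 1
  done
  kill "$pid" 2>/dev/null || true
  echo "port-forward for ${label} never became ready" >&2
  return 1
}

# Kill every port-forward PID passed in; ignore ones that already exited.
stop_port_forwards() {
  local pid
  for pid in "$@"; do
    kill "$pid" 2>/dev/null || true
  done
}
```

A caller would capture the PID with `pf_pid=$(start_port_forward ...)` and register `trap 'stop_port_forwards "$pf_pid"' EXIT`, so the tunnel is torn down even when an assertion fails. Polling for readiness replaces a fixed `sleep` with a bounded wait.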

Test

  • bash -n scripts/smoke-test.sh
  • Helm Chart Integration Test passed on this PR (test-helm-chart)
  • helm-unittest passed on this PR

New Test Assertions

  • fetch CLICKHOUSE_APP_PASSWORD from clickstack-secret and query ClickHouse over its HTTP interface (port 8123) as the app user
  • capture baseline counts from default.otel_traces and default.otel_logs
  • send synthetic OTLP traces/logs to the collector
  • poll until each table count is greater than baseline; fail on timeout/query/auth errors
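
The four steps above can be sketched as a handful of shell helpers. This is a minimal illustration and not the contents of scripts/smoke-test.sh: the function names, the `NAMESPACE` default, and the assumption that ClickHouse is already port-forwarded to localhost:8123 are all hypothetical.

```shell
# Sketch only: helper names and defaults are illustrative assumptions.

# Read the app user's password from the Kubernetes secret named in the PR.
get_app_password() {
  kubectl get secret clickstack-secret -n "${NAMESPACE:-default}" \
    -o jsonpath='{.data.CLICKHOUSE_APP_PASSWORD}' | base64 --decode
}

# Run one query over ClickHouse's HTTP interface as the app user.
# --fail turns HTTP error statuses (bad auth, missing grants) into a
# non-zero exit code.
ch_query() {
  curl --silent --show-error --fail \
    --user "app:${CLICKHOUSE_APP_PASSWORD}" \
    --data-binary "$1" \
    "http://localhost:8123/"
}

# Print the current row count of a table.
table_count() {
  ch_query "SELECT count() FROM $1"
}

# Poll until the row count exceeds the baseline. Fails on query errors
# and when the timeout (seconds) elapses without growth.
wait_for_growth() {
  local table="$1" baseline="$2" timeout="${3:-300}" interval="${4:-5}"
  local deadline count
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    count="$(table_count "$table")" || { echo "query failed for ${table}" >&2; return 1; }
    if [ "$count" -gt "$baseline" ]; then
      echo "${table} grew from ${baseline} to ${count}"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for ${table} to exceed ${baseline}" >&2
  return 1
}
```

In use, the script would export `CLICKHOUSE_APP_PASSWORD="$(get_app_password)"`, record `baseline="$(table_count default.otel_traces)"`, send telemetry, and then call `wait_for_growth default.otel_traces "$baseline"`. Because the baseline read itself goes through `ch_query`, a missing SELECT grant fails the test before any telemetry is sent.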

Why This Would Catch 6f29e730856cb5bcc30138dd168794bfdb17441d

That commit fixed a grant-shape issue where split app grants could result in missing SELECT privileges (SELECT ON default.* / SELECT ON system.*) because only the first grant was effectively applied.

The new integration assertions query default.otel_traces and default.otel_logs as the app user before and after telemetry ingestion. If the old broken grant shape were reintroduced, those SELECT queries would fail because the tables would be unreadable to the app user, causing the smoke test and PR checks to fail automatically.

Made with Cursor

Add synthetic OTLP trace/log submissions and ClickHouse row-count assertions so integration smoke tests fail when collector ingestion is not persisted. Increase smoke test timeout in workflow to allow propagation time.

Made-with: Cursor

changeset-bot bot commented Mar 25, 2026

⚠️ No Changeset found

Latest commit: e2b7f20

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

dhable and others added 7 commits March 25, 2026 13:08
Avoid direct OTLP curl payload injection and instead verify ClickHouse persistence from telemetry generated by HyperDX app activity, which is compatible with the chart's collector setup in CI.

Made-with: Cursor
Use telemetrygen to send traces and logs to the collector (gRPC with HTTP fallback) before checking ClickHouse row growth so the integration assertion validates end-to-end ingestion with valid OTLP traffic.

Made-with: Cursor
Use OTLP HTTP/protobuf on port 4318 for synthetic telemetry generation to match the chart's configured endpoint and avoid gRPC transport failures during integration runs.

Made-with: Cursor
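
For illustration, sending synthetic traces and logs over OTLP/HTTP with telemetrygen might look like the sketch below. The function name, the item counts, and the `smoke-test-log` marker are hypothetical, and the flag names reflect my understanding of telemetrygen's CLI, so check them against `telemetrygen traces --help` and `telemetrygen logs --help` for the version in use.

```shell
# Sketch only: function name, counts, and marker text are assumptions.
# Assumes the collector's OTLP/HTTP port is reachable at the given endpoint,
# for example through a port-forward to localhost:4318.
send_synthetic_telemetry() {
  local endpoint="${1:-localhost:4318}" marker="${2:-smoke-test-log}"
  # --otlp-http selects the HTTP exporter instead of gRPC;
  # --otlp-insecure disables TLS for the plain-text in-cluster endpoint.
  telemetrygen traces --otlp-http --otlp-insecure \
    --otlp-endpoint "$endpoint" --traces 5 || return 1
  telemetrygen logs --otlp-http --otlp-insecure \
    --otlp-endpoint "$endpoint" --logs 5 --body "$marker" || return 1
}
```

Giving the log body a recognizable marker makes it possible to search for that exact line later, in addition to checking row counts.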
Set CUSTOM_OTELCOL_CONFIG_FILE and provide an OTEL relay config that routes OTLP traces/logs to ClickHouse in chart integration tests, allowing ingestion assertions to validate persisted data.

Made-with: Cursor
Set OPAMP_SERVER_URL to an empty value in test overrides so the clickstack collector uses its built-in standalone ClickHouse pipelines, avoiding supervisor remote-config startup failures in CI.

Made-with: Cursor
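
As a rough sketch, the test override described above could be written to a values file as follows. It assumes the dotted path `hyperdx.config.OPAMP_SERVER_URL` maps to nested YAML keys in the usual Helm way; the function name and the output path are hypothetical.

```shell
# Sketch only: write_ci_values and its output path are illustrative.
# An empty OPAMP_SERVER_URL is what the PR describes as putting the
# collector into standalone mode.
write_ci_values() {
  local path="$1"
  cat > "$path" <<'EOF'
hyperdx:
  config:
    OPAMP_SERVER_URL: ""
EOF
}
```

The resulting file would then be passed to Helm with `-f`, alongside any other CI test values.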
Reduce the smoke test TIMEOUT from 600s to 300s so ingestion assertion failures are detected sooner in CI.

Made-with: Cursor
dhable enabled auto-merge (squash) March 25, 2026 20:05
-kubectl port-forward service/$RELEASE_NAME-$CHART_NAME-app 3000:3000 -n $NAMESPACE &
-pf_pid=$!
-sleep 10
+pf_pid=$(start_port_forward "service/$RELEASE_NAME-$CHART_NAME-app" "3000" "3000" "hyperdx-ui")

Any chance we could add a Playwright test to make sure the log line appears on the search page?

Verify the HyperDX app works end-to-end after OTEL ingestion by
registering a user through the UI and searching for the smoke-test
log line on /search. This proves registration, default source setup,
and the ClickHouse query path all function correctly.

Made-with: Cursor
- name: Set up Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20'

Can we use Node 24? Node 20 is too old.

{
  "private": true,
  "devDependencies": {
    "@playwright/test": "^1.52.0"

latest 1.58.2?

Address PR review feedback on version choices.

Made-with: Cursor
dhable merged commit 1a85f26 into main Mar 25, 2026
3 checks passed
dhable deleted the dan/more-otel-collector-testing branch March 25, 2026 21:42
2 participants