refactor: unify telemetry architecture and use background workers for all telemetry#3642

Merged
eablack merged 13 commits into main from eb/refactor-telemetry-again
Apr 2, 2026
@eablack eablack commented Apr 1, 2026

Summary

This PR refactors the telemetry system to improve performance and consistency by unifying the architecture across all telemetry clients and ensuring all telemetry operations run in background worker processes.

Key Changes:

  1. Unified background worker pattern: Both Herokulytics and OTEL/Sentry telemetry now use the same background worker process via spawnTelemetryWorker(), preventing any telemetry operations from blocking the main CLI process.

  2. Consistent class-based architecture: All three telemetry clients (BackboardHerokulyticsClient, BackboardOtelClient, SentryClient) now use the same class-based pattern with lazy initialization.

  3. Improved naming for clarity:

    • analytics.ts → backboard-herokulytics-client.ts (clarifies it sends Herokulytics data)
    • honeycomb-client.ts → backboard-otel-client.ts (clarifies it sends to Backboard, not directly to Honeycomb)
    • Hook files renamed to describe what they do (e.g., collect-and-send-herokulytics.ts)
  4. Enhanced debugging: Added comprehensive telemetryDebug logging throughout all telemetry clients for easier troubleshooting (enabled via DEBUG=analytics-telemetry).

  5. Optimized imports: Moved dynamic imports after telemetry checks to avoid loading unnecessary modules when telemetry is disabled.

  6. Added isTTY to debug output: Fixed missing isTTY field in OTEL debug logging.
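
As an illustration of item 5, here is a minimal TypeScript sketch of checking the telemetry setting before triggering a dynamic import. The function name, the `TelemetrySender` shape, and the `importSender` parameter are hypothetical and are not taken from this PR; in real code `importSender` would presumably wrap an `import()` call for the relevant client module.

```typescript
// Hypothetical shape of a lazily imported telemetry client module.
interface TelemetrySender {
  send(payload: string): void;
}

// Checks the telemetry flag first, so the (potentially heavy) import
// is never started when telemetry is disabled.
async function sendTelemetryIfEnabled(
  telemetryDisabled: boolean,
  importSender: () => Promise<TelemetrySender>,
  payload: string,
): Promise<boolean> {
  if (telemetryDisabled) {
    return false; // nothing was imported
  }

  // Only reached when telemetry is enabled.
  const sender = await importSender();
  sender.send(payload);
  return true;
}
```

Placing the check ahead of the import means a user who has opted out pays no module-loading cost at all, which appears to be the intent of this change.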

Architectural Overview:

  • Herokulytics (command usage analytics) → Backboard /hamurai endpoint
  • OTEL (performance telemetry) → Backboard /otel/v1/traces endpoint → Honeycomb
  • Sentry (error reporting) → Sentry.io
  • All three systems now use background workers, ensuring zero blocking overhead

Type of Change

Patch Updates (patch semver update)

  • refactor: Refactoring existing code without changing behavior

Testing

Notes:

  • All existing telemetry functionality remains unchanged
  • Background workers prevent any blocking of the CLI
  • Debug logging can be enabled with DEBUG=analytics-telemetry

Steps:

  1. Run any CLI command and verify telemetry is sent in the background
  2. Run with DEBUG=analytics-telemetry to see detailed telemetry logging
  3. Verify Herokulytics, OTEL, and Sentry data is still being collected
  4. Passing CI suffices for unit test validation

@eablack eablack requested a review from a team as a code owner April 1, 2026 23:21
@eablack eablack temporarily deployed to AcceptanceTests April 1, 2026 23:21 — with GitHub Actions Inactive
Replace implicit type inference based on field presence with explicit _type
discriminators for all telemetry data types. This makes the telemetry worker
more robust and self-documenting.

Changes:
- Add _type: 'herokulytics' to HerokulyticsData interface
- Add _type: 'otel' to Telemetry interface
- Add _type: 'error' to CLIError interface (optional)
- Update telemetry-worker to check _type field instead of inferring from field presence
- Update serializeTelemetryData to add _type: 'error' for Error objects
- Fix OTEL provider to use module-level singleton instead of instance-level
  to avoid global registry conflicts in tests
- Update all test mocks to include _type field
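
A minimal sketch of how an explicit `_type` discriminator lets a worker route data with a plain `switch`. Only the interface names and the three `_type` values come from this commit message; every other field, the `routeTelemetry` function, and the mapping from type to client are assumptions for illustration. The commit marks `_type` as optional on `CLIError`; this sketch assumes serialization has already filled it in.

```typescript
// Fields other than _type are hypothetical placeholders.
interface HerokulyticsData {
  _type: "herokulytics";
  command: string;
}

interface Telemetry {
  _type: "otel";
  command: string;
  durationMs: number;
}

interface CLIError {
  _type: "error";
  message: string;
}

type TelemetryData = HerokulyticsData | Telemetry | CLIError;

// Returns the name of the client that would handle the payload.
// The compiler narrows `data` in each case based on `_type`.
function routeTelemetry(data: TelemetryData): string {
  switch (data._type) {
    case "herokulytics":
      return "BackboardHerokulyticsClient";
    case "otel":
      return "BackboardOtelClient";
    case "error":
      return "SentryClient";
    default: {
      // Exhaustiveness check: adding a new _type without a case
      // makes this assignment a compile-time error.
      const unreachable: never = data;
      throw new Error(`Unknown telemetry type: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Compared with inferring the type from which fields happen to be present, the discriminator keeps routing correct even if two payload shapes later come to share a field name.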
@eablack eablack temporarily deployed to AcceptanceTests April 1, 2026 23:45 — with GitHub Actions Inactive
@eablack eablack temporarily deployed to AcceptanceTests April 1, 2026 23:46 — with GitHub Actions Inactive
The beforeExit handler in worker-client.ts was causing duplicate telemetry
sends because the postrun hook also sends telemetry for successful command
completion. This resulted in two spans being created for every command.

The fix removes the beforeExit handler for normal command completion, while
keeping the SIGINT/SIGTERM handlers since those bypass the hook lifecycle
and still need to send telemetry.

Now telemetry is sent once via the postrun hook for normal command completion,
and via signal handlers only when the process is interrupted.
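
The duplicate send can be illustrated with a small model, not the actual `worker-client.ts` code: count how many registered handlers fire for a given exit path. The event lists below are assumptions based on the description above, namely that a normal completion reaches both the postrun hook and `beforeExit`, while an interrupt reaches only the signal handler.

```typescript
type LifecycleEvent = "postrun" | "beforeExit" | "SIGINT" | "SIGTERM";

// Number of telemetry sends for one process run: one per fired
// event that has a telemetry handler registered.
function countSends(
  handlers: ReadonlySet<LifecycleEvent>,
  fired: readonly LifecycleEvent[],
): number {
  return fired.filter((event) => handlers.has(event)).length;
}

// Assumed exit paths (illustrative only).
const normalCompletion: LifecycleEvent[] = ["postrun", "beforeExit"];
const interrupted: LifecycleEvent[] = ["SIGINT"];

// Handler sets before and after the fix described above.
const beforeFix = new Set<LifecycleEvent>(["postrun", "beforeExit", "SIGINT", "SIGTERM"]);
const afterFix = new Set<LifecycleEvent>(["postrun", "SIGINT", "SIGTERM"]);

console.log(countSends(beforeFix, normalCompletion)); // 2 (duplicate span)
console.log(countSends(afterFix, normalCompletion)); // 1
console.log(countSends(afterFix, interrupted)); // 1
```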
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 00:02 — with GitHub Actions Inactive
Changes the debug namespace from 'analytics-telemetry' to 'heroku:analytics' to match Heroku CLI conventions, and sets a custom color (147) for better visibility in terminal output.
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 00:13 — with GitHub Actions Inactive
Defers loading of heavy OpenTelemetry and Sentry libraries until they're actually needed (in the background worker process), rather than during CLI initialization. This should significantly reduce the setup-otel-telemetry init hook time.

Before: OpenTelemetry/Sentry libraries loaded during init hook (~199ms)
After: Libraries only loaded when sendTelemetry() is called in worker process
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 00:31 — with GitHub Actions Inactive
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 16:28 — with GitHub Actions Inactive
Convert 27 commands to use centralized LazyModuleLoader for improved CLI startup performance:

- Lodash (8 commands): dashboard, apps/info, access, config/edit, apps, releases, ps/type, ps/scale
- Inquirer (8 commands): pipelines/create, pipelines/add, domains/add, apps/transfer, keys/add, certs/add, certs/generate, domains (using @inquirer/prompts)
- Date-fns (5 commands): status, auth/token, certs/auto, data/maintenances/info, data/maintenances/schedule
- Chrono (1 command): data/pg/fork
- Yaml (1 command, from previous session): apps/create

Changes defer heavy npm package imports until command execution time rather than at module parse time, reducing CLI load time.
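
The real `LazyModuleLoader` API is not shown in this conversation, so the following is only a sketch of the general idea under assumed names: a small cache that runs an importer the first time a package is requested and reuses the same promise afterwards. `SketchLazyLoader`, `load`, and the importer callback are all hypothetical.

```typescript
// Hypothetical stand-in for a centralized lazy loader.
class SketchLazyLoader {
  private readonly cache = new Map<string, Promise<unknown>>();

  // Runs `importer` only on the first request for `name`;
  // later calls reuse the cached promise.
  load<T>(name: string, importer: () => Promise<T>): Promise<T> {
    let pending = this.cache.get(name) as Promise<T> | undefined;
    if (pending === undefined) {
      pending = importer();
      this.cache.set(name, pending);
    }
    return pending;
  }
}

// Usage inside a command's run method might look like this
// (commented out because it depends on an installed package):
//
//   const loader = new SketchLazyLoader();
//   const yaml = await loader.load("yaml", () => import("yaml"));
```

Because the importer is invoked from inside the command body rather than at the top of the file, a command that never runs never pays for parsing the package.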

Skipped data/pg/create and data/pg/update due to complex wrapper prompt patterns.

- Delete src/deps.ts (unnecessary indirection layer)
- Move user-config.ts to lib/analytics-telemetry/herokulytics-config.ts
- Rename UserConfig class to HerokulyticsConfig (more descriptive)
- Update backboard-herokulytics-client to import HTTP and HerokulyticsConfig directly
- Update test files to use new import paths and class name
- Fix fork.unit.test.ts to pass chrono parameter to parseRollbackInterval

Since BackboardHerokulyticsClient is already lazy-loaded in the background telemetry worker, deps.ts provided no performance benefit. Direct imports are clearer and simpler.
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 18:44 — with GitHub Actions Inactive
Add MAX_WORKER_LIFETIME_MS timeout to ensure the background telemetry worker never hangs indefinitely. This prevents the worker from running forever in case of:
- Network request hangs
- OpenTelemetry/Sentry failures
- Other unexpected blocking operations

The worker will now automatically exit after 10 seconds maximum, ensuring no orphaned background processes.
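
One way to bound a worker's lifetime, sketched here under assumptions rather than taken from the PR, is to race the telemetry work against a timer and exit whichever finishes first. Only the `MAX_WORKER_LIFETIME_MS` name and the 10-second value come from the commit message; `raceWithDeadline` is a hypothetical helper, and the actual worker may simply schedule an exit with a timer instead.

```typescript
const MAX_WORKER_LIFETIME_MS = 10_000;

// Resolves with the work's result, or with "timeout" if the work
// has not settled within `ms` milliseconds.
function raceWithDeadline<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });

  // Clear the timer once the race settles so a finished worker
  // is not kept alive by a pending timeout.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  });
}

// In a worker entry point this might be used as follows
// (sendAllTelemetry is a hypothetical function):
//
//   await raceWithDeadline(sendAllTelemetry(), MAX_WORKER_LIFETIME_MS);
//   process.exit(0);
```

With this shape a hung network request can delay the worker's exit by at most the configured lifetime, after which the process terminates regardless.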
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 18:48 — with GitHub Actions Inactive
Remove all 'Lazy-load' comments throughout the codebase. The code is self-documenting with the lazyModuleLoader pattern, making these comments redundant.
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 18:54 — with GitHub Actions Inactive
@eablack eablack temporarily deployed to AcceptanceTests April 2, 2026 19:07 — with GitHub Actions Inactive
@eablack eablack merged commit c13074d into main Apr 2, 2026
17 checks passed
@eablack eablack deleted the eb/refactor-telemetry-again branch April 2, 2026 19:22