feat: Add experimentalObservability with an OTel backend #11130
base: main
🔧 Build Fix:
Two code blocks are missing required title attributes. The documentation framework requires all code blocks to have titles. The blocks at line 789 (under experimentalObservability.otel.headers) and line 812 (under experimentalObservability.otel.resource) both use the opening marker ```jsonc without a title attribute, causing the build to fail.
📝 Patch Details
diff --git a/docs/site/content/docs/reference/configuration.mdx b/docs/site/content/docs/reference/configuration.mdx
index ccce2e05e..1cb37964c 100644
--- a/docs/site/content/docs/reference/configuration.mdx
+++ b/docs/site/content/docs/reference/configuration.mdx
@@ -786,7 +786,7 @@ The OTLP collector endpoint URL. For example:
Optional HTTP headers to include with export requests. Useful for authentication (e.g., API keys) or custom metadata.
-```jsonc
+```jsonc title="./turbo.json"
{
"experimentalObservability": {
"otel": {
@@ -809,7 +809,7 @@ Timeout in milliseconds for export requests to the collector.
Optional resource attributes to attach to all exported metrics. These help identify the source of metrics in your observability platform.
-```jsonc
+```jsonc title="./turbo.json"
{
"experimentalObservability": {
"otel": {
Analysis
Missing code block titles in documentation
What fails: Next.js build fails during static page generation for the /docs/reference/configuration page. The MDX renderer requires all code blocks to have title attributes.
How to reproduce:
cd docs/site
pnpm run build
Result before fix:
Error occurred prerendering page "/docs/reference/configuration"
Error: Code blocks must have titles. If you are creating a terminal, use "Terminal" for the title. Else, add a file path name.
Result after fix:
✓ Compiled successfully in 32.7s
[Build completes successfully with all 236 pages generated]
Root cause: Two code blocks in docs/site/content/docs/reference/configuration.mdx were missing the required title attribute:
- Line 789: Code block under experimentalObservability.otel.headers
- Line 812: Code block under experimentalObservability.otel.resource
Both blocks were changed from ```jsonc to ```jsonc title="./turbo.json" to match the pattern used throughout the documentation.
…the way it first appeared
The @turbo/repository test job was failing on macOS due to OOM during Rust compilation. The combination of CARGO_BUILD_JOBS (default: num CPUs) and -Zthreads=8 (parallel frontend) caused excessive memory usage. Limit to 2 parallel crate compilations on macOS to reduce memory pressure. Co-Authored-By: Claude Opus 4.5 <[email protected]>
macOS ARM runners have limited memory (~7GB). Reduce rustc frontend threads from 8 to 4 for the native library build to prevent OOM. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The previous OOM fix attempts set RUSTFLAGS and CARGO_BUILD_JOBS in the workflow, but turbo's --env-mode=strict was filtering them out because they weren't in globalPassThroughEnv. This adds both variables to globalPassThroughEnv in turbo.json and combines both memory reduction strategies:
- CARGO_BUILD_JOBS=2: Limits parallel crate compilation
- RUSTFLAGS with -Zthreads=4: Reduces rustc frontend parallelism
Co-Authored-By: Claude Opus 4.5 <[email protected]>
GitHub Actions expressions like `${{ condition && 'value' || '' }}`
set empty strings on non-matching platforms, which breaks cargo:
- Empty CARGO_BUILD_JOBS causes "could not parse ''" error
- Empty RUSTFLAGS overrides .cargo/config.toml entirely
Use shell conditionals to only export these vars on macOS, leaving
them unset on other platforms.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
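The difference between an unset variable and one set to an empty string can be sketched in Rust. This is a minimal illustration, not cargo's actual parsing code; `parse_jobs` is a hypothetical helper that treats any set value, including "", as something that must parse as a number.

```rust
use std::num::ParseIntError;

/// Hypothetical strict parser for a job-count variable.
/// `None` models an unset variable; `Some("")` models set-but-empty.
fn parse_jobs(raw: Option<&str>) -> Result<Option<u32>, ParseIntError> {
    match raw {
        // Unset: the caller falls back to its own default.
        None => Ok(None),
        // Set (even to ""): the value must parse, so "" is an error.
        Some(value) => value.parse::<u32>().map(Some),
    }
}
```

Under this model, leaving the variable unset on non-macOS platforms is safe, while exporting an empty string is not.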
Adding these to the task's `env` array (not just globalPassThroughEnv) ensures they:
1. Are included in the task hash calculation
2. Bust the cached failure from the previous broken run
3. Properly invalidate cache when these values change
globalPassThroughEnv passes vars through but doesn't affect the hash. The task-level env array is needed for proper cache invalidation.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
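The hashing distinction described in this commit message can be sketched as follows. This is a simplified model, not turbo's real hashing: `task_hash` and its parameters are hypothetical, and only variables named in `hashed_env` (standing in for the task's `env` array) contribute to the result.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical task hash: the command plus only those environment
/// variables listed in `hashed_env`. Pass-through variables are visible
/// to the task at runtime but are never fed into the hasher.
fn task_hash(
    command: &str,
    hashed_env: &[&str],
    environment: &BTreeMap<String, String>,
) -> u64 {
    let mut hasher = DefaultHasher::new();
    command.hash(&mut hasher);
    for name in hashed_env {
        name.hash(&mut hasher);
        // Hash the value (or its absence) for each declared variable.
        environment.get(*name).hash(&mut hasher);
    }
    hasher.finish()
}
```

In this model, changing RUSTFLAGS alters the hash only when "RUSTFLAGS" is listed in `hashed_env`, which is why a stale cached failure can survive a pass-through-only change.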
| package: task.package.clone(), | ||
| hash: task.shared.hash.clone(), | ||
| external_inputs_hash: task.shared.hash_of_external_dependencies.clone(), | ||
| command: task.shared.command.clone(), |
There is a minor risk here that sensitive values could be sent into a collector. Take, for instance, a command like:
turbo run build -- --some-value=$MY_SECRET_TOKEN
Have we checked what this would capture? The shell-injected string or the raw input? I think we need to constrain to the raw input.
Good callout! I'll investigate.
It looks like task.shared.command itself comes from the package.json, pulled from the scripts property. So, it should be the plain string and shouldn't be the interpolated value.
As for -- passthrough arguments, they go through this property instead (task.shared.cli_arguments), and that isn't currently captured in the observability/otel.rs module anywhere yet. I think if they were then this concern would apply - the shell would likely substitute before turbo saw it.
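A minimal sketch of the separation described above, assuming a simplified stand-in for the task data: the field names `command` and `cli_arguments` come from the discussion, while the `TaskShared` struct and `exported_command` function are hypothetical.

```rust
/// Simplified stand-in for the shared task data discussed above.
#[allow(dead_code)]
struct TaskShared {
    /// Script string as written in package.json "scripts".
    command: String,
    /// Arguments after `--`; the shell may already have expanded
    /// variables such as $MY_SECRET_TOKEN before turbo sees them.
    cli_arguments: Vec<String>,
}

/// Returns only the script string. `cli_arguments` is deliberately
/// never read here, so expanded secrets cannot reach the exporter.
fn exported_command(task: &TaskShared) -> &str {
    &task.command
}
```

If passthrough arguments were ever added to the exported attributes, the concern raised in this thread would apply to them.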
| attrs.push(KeyValue::new("turbo.task.hash", task.hash.clone())); | ||
| attrs.push(KeyValue::new( | ||
| "turbo.task.external_inputs_hash", | ||
| task.external_inputs_hash.clone(), | ||
| )); | ||
| attrs.push(KeyValue::new("turbo.task.command", task.command.clone())); |
The hashes have unbounded cardinality, and I'm not sure what analytical value they bring. Aggregating over them would generally be meaningless, since the hash value itself carries no meaning.
What do you think?
Does the same problem also exist for turbo.task.id?
turbo.scm.revision seems high cardinality but has a genuine use case of aggregating over a hash, right? turbo.scm.branch would be lower cardinality...But seems like users will want to aggregate over both of these.
Could we use resource attributes or exemplars meaningfully here? One consideration is that different OTEL backends handle these differently, so the helpfulness is a bit undefined.
(Sorry, captured the turbo.task.command line in this comment thread but didn't mean to - and now I've got a whole thread with myself going. 😄 Please ignore.)
Definitely a good thing to think through. I think yes, revision and branch are useful even with the high cardinality, but maybe a flag to turn them off if desired? The hash and external_inputs_hash and task.id attributes might be too random, though, and should be opt-in instead? 🤔
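The opt-in idea floated above might look something like the sketch below. The attribute keys are taken from this PR; the `TaskInfo` struct, the `include_high_cardinality` parameter, and the use of plain tuples instead of the OpenTelemetry `KeyValue` type are assumptions made to keep the example self-contained.

```rust
/// Hypothetical, simplified task record.
struct TaskInfo {
    id: String,
    name: String,
    package: String,
    hash: String,
    external_inputs_hash: String,
}

/// Builds task attributes, attaching the effectively random values
/// (id and hashes) only when the caller opts in.
fn task_attributes(
    task: &TaskInfo,
    include_high_cardinality: bool,
) -> Vec<(&'static str, String)> {
    // Bounded attributes that are meaningful to aggregate over.
    let mut attrs = vec![
        ("turbo.task.name", task.name.clone()),
        ("turbo.task.package", task.package.clone()),
    ];
    if include_high_cardinality {
        attrs.push(("turbo.task.id", task.id.clone()));
        attrs.push(("turbo.task.hash", task.hash.clone()));
        attrs.push((
            "turbo.task.external_inputs_hash",
            task.external_inputs_hash.clone(),
        ));
    }
    attrs
}
```

Whether revision and branch should sit behind a similar switch is left open in the thread.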
| // Warn if observability config is present but the feature flag is not enabled | ||
| if let Some(obs_opts) = &self.opts.experimental_observability { | ||
| if obs_opts.otel.is_some() && !self.opts.future_flags.experimental_observability { | ||
| tracing::warn!( | ||
| "experimentalObservability.otel is configured but \ | ||
| futureFlags.experimentalObservability is not enabled in turbo.json. The \ | ||
| observability config will be ignored." | ||
| ); | ||
| } | ||
| } |
The futureFlags are meant to be hard gates, not warnings. Easy change!
I did some thinking around this last year and asked Claude to look at prior futureFlag usages, but I wasn't confident and thought you might call this out. 😄 This warning should be unreachable for turbo.json config when the future flag is disabled (because of the hard gate here), so it only applies when the config comes from CLI flags or env vars, which typically bypass the turbo.json gate since they're already prefixed with EXPERIMENTAL_.
Should I remove the warning anyway to eliminate confusion, or just add a note to the comments indicating when it would actually be used?
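The gating behavior described in this reply can be summarized in a small sketch. It models the behavior as the author describes it, not the actual turbo code; `ConfigSource` and `otel_config_active` are hypothetical names.

```rust
/// Where the OTel configuration came from (hypothetical model).
#[derive(Debug, PartialEq)]
enum ConfigSource {
    TurboJson,
    EnvOrCli,
}

/// turbo.json config is hard-gated on the future flag, while env vars
/// and CLI flags are treated as opted in by their experimental prefix.
fn otel_config_active(source: ConfigSource, future_flag_enabled: bool) -> bool {
    match source {
        ConfigSource::TurboJson => future_flag_enabled,
        ConfigSource::EnvOrCli => true,
    }
}
```

Under this model, the only way to reach the warning with the flag disabled is through the `EnvOrCli` path.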
| serde_json = { workspace = true, optional = true } | ||
| thiserror = { workspace = true } | ||
| tokio = { workspace = true, features = ["full"] } | ||
| tonic = "0.14" |
Ultra-nit: This is a different version of tonic than is used in turborepo-lib. Would love to use a consistent version if we can. @anthonyshew, can you bump the tonic version in turborepo-lib forward to accommodate?
@anthonyshew Were you able to look into turborepo-lib, or should I start a new PR for it (as a prerequisite for this one)?
crates/turborepo-otel/src/lib.rs
| }; | ||
|
|
||
| let reader = periodic_reader_with_async_runtime::PeriodicReader::builder(exporter, Tokio) | ||
| .with_interval(Duration::from_secs(15)) |
This is hardcoded - which might be fine. Any reason we should make it configurable?
I don't know how often people would actually configure it, but since timeout_ms is already configurable this probably should be also. 👍 I'll update it.
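One way the interval could be made configurable while keeping the current 15-second value as the default is sketched below. The `interval_ms` option name and the `export_interval` helper are assumptions; the PR does not yet define them.

```rust
use std::time::Duration;

/// Current hardcoded value, kept as the fallback.
const DEFAULT_EXPORT_INTERVAL: Duration = Duration::from_secs(15);

/// Resolves the export interval from an optional, user-supplied
/// millisecond value. Missing or zero values fall back to the default.
fn export_interval(interval_ms: Option<u64>) -> Duration {
    match interval_ms {
        Some(ms) if ms > 0 => Duration::from_millis(ms),
        _ => DEFAULT_EXPORT_INTERVAL,
    }
}
```

The resolved `Duration` could then be passed to `with_interval` in place of the literal.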
| # macOS ARM runners have limited memory (~7GB). Limit parallel crate | ||
| # compilation and reduce rustc frontend threads to prevent OOM. | ||
| if [ "${{ matrix.os.name }}" == "macos" ]; then | ||
| export CARGO_BUILD_JOBS=2 | ||
| export RUSTFLAGS='--cfg tokio_unstable -Zshare-generics=y -Zthreads=4 -Csymbol-mangling-version=v0' | ||
| fi |
I'd love to believe that this change is not required. 😄
Unfortunately, it was consistently failing for me until I iterated to this solution. 😭 I'm glad to remove the workaround and see if you can spot something I missed? 😅
Summary
Adds an experimental OpenTelemetry (OTLP) observability pipeline to Turborepo for exporting run and task metrics to any OTLP-compatible collector.
Key characteristics:
- turbo.json (gated behind futureFlags.experimentalObservability)
Motivation
Turborepo already computes rich run summaries and per-task metadata (durations, cache status, SCM info), but these were only available locally via --dry=json or --summarize. This PR enables sending metrics to external collectors for long-term analysis, alerting, and correlation with other telemetry.
Architecture
- observability module: RunObserver trait and Handle abstraction for pluggable backends
- turborepo-otel crate: opentelemetry + opentelemetry-otlp
- ExperimentalObservabilityOptions: otel field (room for future backends)
Configuration
turbo.json (requires future flag)
{
  "futureFlags": { "experimentalObservability": true },
  "experimentalObservability": {
    "otel": {
      "enabled": true,
      "endpoint": "https://collector.example.com",
      "protocol": "grpc", // or "http/protobuf"
      "headers": { "X-API-Key": "..." },
      "timeoutMs": 10000,
      "resource": { "service.name": "my-monorepo" },
      "metrics": { "runSummary": true, "taskDetails": false },
      "useRemoteCacheToken": true // reuse remote cache auth
    }
  }
}
Environment Variables (no future flag required)
- TURBO_EXPERIMENTAL_OTEL_ENABLED: 1/0/true/false
- TURBO_EXPERIMENTAL_OTEL_ENDPOINT
- TURBO_EXPERIMENTAL_OTEL_PROTOCOL: grpc or http/protobuf
- TURBO_EXPERIMENTAL_OTEL_TIMEOUT_MS
- TURBO_EXPERIMENTAL_OTEL_HEADERS: key=value
- TURBO_EXPERIMENTAL_OTEL_RESOURCE: key=value
- TURBO_EXPERIMENTAL_OTEL_METRICS_RUN_SUMMARY
- TURBO_EXPERIMENTAL_OTEL_METRICS_TASK_DETAILS
- TURBO_EXPERIMENTAL_OTEL_USE_REMOTE_CACHE_TOKEN
CLI Flags (no future flag required)
--experimental-otel-{enabled,endpoint,protocol,timeout-ms,header,resource,metrics-run-summary,metrics-task-details,use-remote-cache-token}
Metrics Exported
Run-level (metrics.runSummary, default: true): turbo.run.id, turbo.version, turbo.scm.*
Task-level (metrics.taskDetails, default: false): turbo.task.{id,name,package,hash,command}
useRemoteCacheToken
When enabled, automatically adds Authorization: Bearer <token> using your existing remote cache credentials (turbo login or TURBO_TOKEN). Existing Authorization headers are preserved.
Run Lifecycle Integration
- RunBuilder reads opts.experimental_observability and calls Handle::try_init
- RunSummary records metrics via the handle
- shutdown() flushes buffered data before exit
Failure Behavior
Compile-time gating: The otel Cargo feature controls inclusion; builds without it treat observability config as a no-op.
Quick Start
- turbo.json + future flag, or use env vars/CLI flags directly
- useRemoteCacheToken to reuse cache auth
- runSummary: true, add taskDetails when needed
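The header handling described in this summary (key=value pairs, plus a bearer token that never overrides an existing Authorization header) can be sketched as below. This is an illustration under stated assumptions, not the code in turborepo-otel: `parse_header` and `with_remote_cache_token` are hypothetical helpers, and splitting on the first `=` is an assumed parsing rule.

```rust
use std::collections::BTreeMap;

/// Parses one `key=value` pair, splitting on the first `=` so that
/// values containing `=` survive intact. Returns None if malformed.
fn parse_header(pair: &str) -> Option<(String, String)> {
    let (key, value) = pair.split_once('=')?;
    let key = key.trim();
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.trim().to_string()))
}

/// Adds `Authorization: Bearer <token>` only when a token is available
/// and no Authorization header (compared case-insensitively) is present.
fn with_remote_cache_token(
    mut headers: BTreeMap<String, String>,
    token: Option<&str>,
) -> BTreeMap<String, String> {
    let has_auth = headers
        .keys()
        .any(|k| k.eq_ignore_ascii_case("authorization"));
    if let (false, Some(token)) = (has_auth, token) {
        headers.insert("Authorization".to_string(), format!("Bearer {token}"));
    }
    headers
}
```

Checking the header name case-insensitively keeps a user-supplied `authorization` entry from being duplicated under a differently cased key.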