Skip to content

lib,src,test: fix race during tracing toggles#441

Open
santigimeno wants to merge 2 commits intonode-v24.x-nsolid-v6.xfrom
santi/tracing_crash_enable_disable
Open

lib,src,test: fix race during tracing toggles#441
santigimeno wants to merge 2 commits intonode-v24.x-nsolid-v6.xfrom
santi/tracing_crash_enable_disable

Conversation

@santigimeno
Copy link
Copy Markdown
Member

@santigimeno santigimeno commented Mar 19, 2026

Keep JS tracing decisions tied to the env-local tracing state materialized by the native toggle path instead of rereading live flags in multiple places.

Tracing flags can be updated from any thread, so reading them directly during span creation can observe a mid-flight toggle. That could make Tracer.startSpan() return a NonRecordingSpan after the caller had already passed an internal tracing gate, leading to crashes like TypeError: span._pushSpanDataString is not a function.

Pass the full tracing bitmask from EnvList::update_tracing_flags() into JS, cache it in lib/internal/nsolid_trace.js, and make the internal OTel code consume that cached per-thread state. This removes the split-brain behavior where callback binding/unbinding followed flagsUpdated but Tracer.startSpan() could independently observe a later disable and return a NonRecordingSpan.

For internal spans, stop Tracer.startSpan() from rechecking the current trace flags after the caller has already crossed an internal tracing gate. That keeps internal span creation locally consistent while tracing is being enabled and disabled and avoids crashes in code paths that expect a real N|Solid span object.

Also add targeted repro coverage for the toggle race:

  • extend the tracing test addon with configurable setupTracing(flags), stopTracing(), and skipExpectedTracesCheck()
  • add a deterministic fetch-based repro that toggles HTTP client tracing while concurrent fetch() traffic is in flight
  • add a grpc-based repro harness for tracing reconfiguration and cover both fetch and http traffic
  • teach the grpc agent test client how to generate fetch transactions

Together these changes make trace flags materialize per env/thread in JS, preserve the caller's local tracing decision for internal spans, and add regression coverage for the tracing enable/disable race.

Summary by CodeRabbit

  • Refactor

    • Improved trace-flag caching so runtime components receive updated flags when tracing is toggled.
    • Adjusted span-starting behavior so internal spans bypass global trace-disable gating.
  • Tests

    • Added race-condition tests covering concurrent HTTP fetch and gRPC tracing toggles.
    • Enhanced test utilities for controlled tracing start/stop and simulated fetch workloads.
  • Bug Fix

    • Fixed tracing toggle signaling so listeners receive the updated flag payload.

@santigimeno santigimeno requested a review from RafaelGSS March 19, 2026 15:38
@santigimeno santigimeno self-assigned this Mar 19, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b7fa7726-5deb-409e-84c5-ed0cf7df42e7

📥 Commits

Reviewing files that changed from the base of the PR and between c6ab0ff and dbd0174.

📒 Files selected for processing (9)
  • lib/internal/nsolid_trace.js
  • lib/internal/otel/core.js
  • lib/internal/otel/trace.js
  • src/nsolid/nsolid_api.cc
  • test/addons/nsolid-tracing/binding.cc
  • test/addons/nsolid-tracing/test-otel-fetch-enable-disable-race.js
  • test/agents/test-grpc-tracing-race.mjs
  • test/common/nsolid-grpc-agent/client.js
  • test/parallel/test-nsolid-file-handle-count.js
✅ Files skipped from review due to trivial changes (1)
  • lib/internal/otel/core.js
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/parallel/test-nsolid-file-handle-count.js
  • lib/internal/otel/trace.js
  • lib/internal/nsolid_trace.js
  • test/addons/nsolid-tracing/test-otel-fetch-enable-disable-race.js
  • test/addons/nsolid-tracing/binding.cc

Walkthrough

Centralizes trace flags into a cached currentTraceFlags in lib/internal/nsolid_trace.js, exposes getTraceFlags(), changes the native binding toggle callback to pass a uint32 flags value, and updates OTEL/tracer code and tests to use the cached flags and flags-updated payload.

Changes

Cohort / File(s) Summary
Trace Flags Cache
lib/internal/nsolid_trace.js
Introduce cached currentTraceFlags, add getTraceFlags() export, derive span bitmask from cached flags, update toggle callback to accept flags and emit flagsUpdated(flags).
OTEL Integration
lib/internal/otel/core.js, lib/internal/otel/trace.js
Replace direct reads of binding flags with getTraceFlags(); update flagsUpdated handlers to accept flags and enable/disable API/instrumentation based on it; adjust span creation gating to respect options.internal.
Native Binding Change
src/nsolid/nsolid_api.cc
Pass a Uint32 (flags) into env->nsolid_toggle_tracing_fn() instead of a boolean.
Test Binding Infrastructure
test/addons/nsolid-tracing/binding.cc
Allow optional uint32 flags in setupTracing, avoid duplicate init, register atexit once, add stopTracing() and skipExpectedTracesCheck(), conditionally skip exit-time trace checks.
Race Tests (HTTP/fetch & gRPC)
test/addons/nsolid-tracing/test-otel-fetch-enable-disable-race.js, test/agents/test-grpc-tracing-race.mjs
New tests that toggle tracing on/off while exercising concurrent fetch/HTTP or gRPC tracing to surface race conditions.
gRPC Agent: Fetch Path
test/common/nsolid-grpc-agent/client.js
Add execFetchTransaction() and handle msg.kind === 'fetch' to run many concurrent fetch POST requests for tracing tests.
Minor Test Adjustment
test/parallel/test-nsolid-file-handle-count.js
Wrap resolved-value handler with explicit lambda so closePromiseFd receives the resolved file handle argument.

Sequence Diagram

sequenceDiagram
    participant NB as NativeBinding
    participant NT as nsolid_trace
    participant OC as OTEL_Core
    participant TR as Tracer

    NB->>NT: toggleTracingFn(flags: uint32)
    activate NT
    NT->>NT: currentTraceFlags = flags
    NT->>OC: emit flagsUpdated(flags)
    deactivate NT

    activate OC
    alt flags > 0
        OC->>OC: enable API / instrumentation
    else flags == 0
        OC->>OC: disable API / instrumentation
    end
    deactivate OC

    TR->>NT: getTraceFlags()
    activate NT
    NT-->>TR: currentTraceFlags
    deactivate NT

    alt !options.internal && flags == 0
        TR->>TR: return invalid span context
    else
        TR->>TR: create span using flags bitmask
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • RafaelGSS

Poem

🐰 I thumped a log and cached the light,

Flags now travel with numbers tight.
Native taps shout, I store and cheer,
OTEL wakes or dozes clear.
Fetch and gRPC hop—no race too near. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main change: fixing a race condition during tracing toggles across the lib, src, and test directories.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch santi/tracing_crash_enable_disable

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/common/nsolid-grpc-agent/client.js`:
- Around line 127-129: The CLI top-level trace handling is missing support for
the "fetch" trace, so invoking the helper with args.values.trace === 'fetch'
no-ops; update the CLI flow that checks args.values.trace (the direct path
starting around where args.values.trace is evaluated) to handle 'fetch' by
invoking the same logic as handleTrace() does for fetch (i.e., call
execFetchTransaction() or delegate to handleTrace('fetch')), ensuring the code
paths that currently only handle other trace values also include a branch for
'fetch' so -t fetch triggers the expected behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f5e7f60a-979e-47db-a74a-f6aa8b1ec80b

📥 Commits

Reviewing files that changed from the base of the PR and between d41bff4 and 03ccece.

📒 Files selected for processing (8)
  • lib/internal/nsolid_trace.js
  • lib/internal/otel/core.js
  • lib/internal/otel/trace.js
  • src/nsolid/nsolid_api.cc
  • test/addons/nsolid-tracing/binding.cc
  • test/addons/nsolid-tracing/test-otel-fetch-enable-disable-race.js
  • test/agents/test-grpc-tracing-race.mjs
  • test/common/nsolid-grpc-agent/client.js

@santigimeno santigimeno force-pushed the santi/tracing_crash_enable_disable branch from 03ccece to c6ab0ff Compare March 19, 2026 16:26
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 19, 2026
@santigimeno santigimeno requested a review from EHortua March 23, 2026 14:37
Keep JS tracing decisions tied to the env-local tracing state
materialized by the native toggle path instead of rereading live flags
in multiple places.

Tracing flags can be updated from any thread, so reading them directly
during span creation can observe a mid-flight toggle. That could make
Tracer.startSpan() return a NonRecordingSpan after the caller had
already passed an internal tracing gate, leading to crashes like
TypeError: span._pushSpanDataString is not a function.

Pass the full tracing bitmask from EnvList::update_tracing_flags() into
JS, cache it in lib/internal/nsolid_trace.js, and make the internal
OTel code consume that cached per-thread state. This removes the
split-brain behavior where callback binding/unbinding followed
flagsUpdated but Tracer.startSpan() could independently observe a later
disable and return a NonRecordingSpan.

For internal spans, stop Tracer.startSpan() from rechecking the current
trace flags after the caller has already crossed an internal tracing
gate. That keeps internal span creation locally consistent while
tracing is being enabled and disabled and avoids crashes in code paths
that expect a real N|Solid span object.

Also add targeted repro coverage for the toggle race:
- extend the tracing test addon with configurable setupTracing(flags),
  stopTracing(), and skipExpectedTracesCheck()
- add a deterministic fetch-based repro that toggles HTTP client
  tracing while concurrent fetch() traffic is in flight
- add a grpc-based repro harness for tracing reconfiguration and cover
  both fetch and http traffic
- teach the grpc agent test client how to generate fetch transactions

Together these changes make trace flags materialize per env/thread in
JS, preserve the caller's local tracing decision for internal spans,
and add regression coverage for the tracing enable/disable race.
@santigimeno santigimeno force-pushed the santi/tracing_crash_enable_disable branch from dbd0174 to 8b77456 Compare March 31, 2026 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants