Skip to content

Ensure that transient ThreadContext headers with propagators survive restore#20854

Merged
cwperks merged 15 commits intoopensearch-project:mainfrom
Deepti24:fix-security-plugin-issue
Mar 22, 2026
Merged

Ensure that transient ThreadContext headers with propagators survive restore#20854
cwperks merged 15 commits intoopensearch-project:mainfrom
Deepti24:fix-security-plugin-issue

Conversation

@Deepti24
Copy link
Contributor

Description

This is to fix issue mentioned in this comment: #20822 (comment)
Issue details

  • When request reaches this class : Netty4HttpRequestHeaderVerifier, we take a snapshot of context using newStoredContext(). We store it as attribute with key CONTEXT_TO_RESTORE
  • Now some spans are added to current thread. For instance: AbstractHttpServerTransport => dispatchRequest
  • Now request lands on SecurityRestFilter. Here current thread's context has span info
  • SecurityRestFilter would now call restore context stored earlier with key CONTEXT_TO_RESTORE
  • Since span got added after we took the snapshot, it would not contain the span id.
  • To fix this we should merge the context in restore method.

Explaining why do we have a SpanReference with null span id as seen in screenshot in above comment (Fix in class ThreadContextBasedTracerContextStorage)

  • Let's say request comes to NativeMessageHandler=> handleRequest
  • We start a span with SpanScope spanScope = tracer.withSpanInScope(span). It is auto-closeable
  • When the block completes, it auto closes using DefaultSpanScope => close
  • This method sets span id within SpanReference as null
  • It is never removed from the thread and when threads are re-used, it stays (will attach screen shot on this)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
(opensearch-project/security#5990)
cwperks#345

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@Deepti24 Deepti24 requested a review from a team as a code owner March 12, 2026 18:25
@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

PR Reviewer Guide 🔍

(Review updated until commit 87e71bf)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix null SpanReference propagation in ThreadContextBasedTracerContextStorage

Relevant files:

  • server/src/main/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorage.java
  • server/src/test/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorageTests.java

Sub-PR theme: Add preserveTransients flag to ThreadContext.newStoredContext to survive restore

Relevant files:

  • server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java
  • server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java
  • CHANGELOG.md

⚡ Recommended focus areas for review

Double threadLocal.get()

In the restore lambda, threadLocal.get() is called twice when preserveResponseHeaders is true: once to get current (line 314) and again inside the if branch (line 323). Between these two calls the thread-local could theoretically change. The current variable captured at line 314 should be reused in the putResponseHeaders call to avoid inconsistency.

ThreadContextStruct current = threadLocal.get();
ThreadContextStruct restoredContext = originalContext;
if (preserveTransients) {
    final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
    if (!propagated.isEmpty()) {
        restoredContext = originalContext.putTransientIfAbsent(propagated);
    }
}
if (preserveResponseHeaders && threadLocal.get() != newContext) {
    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
API Compatibility

The original newStoredContext(boolean, Collection<String>) signature is now delegated to the new three-argument overload with preserveTransients=false. This is a silent behavioral change for all existing callers. While the default is false (preserving old behavior), the new overload with preserveTransients=true is only accessible via the new two-argument boolean signature. Callers who want both transientHeadersToClear and preserveTransients=true must use the three-argument form, but there is no documentation or Javadoc on the new overloads explaining this.

public StoredContext newStoredContext(boolean preserveResponseHeaders, boolean preserveTransients) {
    return newStoredContext(preserveResponseHeaders, preserveTransients, Collections.emptyList());
}

public StoredContext newStoredContext(boolean preserveResponseHeaders, Collection<String> transientHeadersToClear) {
    return newStoredContext(preserveResponseHeaders, false, transientHeadersToClear);
}
Test Logic Issue

The test testPropagatedTransientsAreRestored calls storedContext.restore() after the try-with-resources block has already auto-closed the context (which triggers restore via the lambda). This means restore() is called twice on the same StoredContext, which may produce misleading test results or mask double-restore bugs.

try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
    // now we add something to original thread
    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
    storedContext = sc;
    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
} catch (Exception e) {
    // unlikey to get exception, if we got one, test should fail
    throw e;
}
// storedContext would have closed. Now we restore and after that, our original thread should have it
storedContext.restore();
// we should be able to find the key now
assertEquals(threadContext.getTransient(PROPAGATED_KEY), PROPAGATED_VALUE);
putTransientIfAbsent Bypass

putTransientIfAbsent skips the putSingleHeader validation (which checks for duplicate keys and throws on conflict). This means propagated transients can silently overwrite or be silently ignored without any conflict detection that the rest of the codebase relies on. Verify this is intentional and document why the validation is bypassed.

private ThreadContextStruct putTransientIfAbsent(Map<String, Object> values) {
    Map<String, Object> newTransient = new HashMap<>(this.transientHeaders);
    for (Map.Entry<String, Object> entry : values.entrySet()) {
        newTransient.putIfAbsent(entry.getKey(), entry.getValue());
    }
    return new ThreadContextStruct(requestHeaders, responseHeaders, newTransient, persistentHeaders, isSystemContext);
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

PR Code Suggestions ✨

Latest suggestions up to 87e71bf

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double restore of stored context

The try-with-resources block automatically closes sc (calling restore()) at the end
of the block, so storedContext.restore() is called a second time after the block
exits. This double-restore will overwrite the thread context again with the original
snapshot, likely causing the assertion to fail or producing incorrect behavior. The
test should either use a manual close pattern or restructure to avoid the double
restore.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-892]

-ThreadContext.StoredContext storedContext = null;
-try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
-    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
-}
-// storedContext would have closed. Now we restore and after that, our original thread should have it
+ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true);
+// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
+threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
+// Now restore and after that, our original thread should have it
 storedContext.restore();
Suggestion importance[1-10]: 8

__

Why: The try-with-resources block will automatically call restore() on sc when the block exits, and then storedContext.restore() is called again explicitly. This double-restore is a real bug that would cause the test to behave incorrectly, as the second restore would overwrite the context again with the original snapshot before the assertion.

Medium
General
Use correct context for system context flag during propagation

propagateTransients is called with current.isSystemContext, but the context being
restored is originalContext, which may have a different isSystemContext value. The
propagation decision should be based on the context that will actually be restored,
not the current context at restore time, to avoid incorrectly propagating transients
into a system context.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-321]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
-    final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
+    final Map<String, Object> propagated = propagateTransients(current.transientHeaders, originalContext.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
Suggestion importance[1-10]: 6

__

Why: Using current.isSystemContext instead of originalContext.isSystemContext could cause transients to be incorrectly propagated into or excluded from a system context during restore. The flag should reflect the destination context (originalContext) rather than the current context being replaced.

Low
Avoid repeated threadLocal reads in restore lambda

Inside the lambda, threadLocal.get() is called twice in the preserveResponseHeaders
branch, but between the condition check and the putResponseHeaders call the value
could theoretically differ. It is safer to capture threadLocal.get() once into a
local variable before the conditional to ensure consistency.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326]

-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+ThreadContextStruct currentOnRestore = threadLocal.get();
+if (preserveResponseHeaders && currentOnRestore != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(currentOnRestore.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 4

__

Why: While calling threadLocal.get() twice in the same branch is slightly inconsistent, in practice this is single-threaded context and the value won't change between the two calls. The suggestion improves code clarity and is a minor defensive improvement.

Low

Previous suggestions

Suggestions up to commit b2d5b5e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-restore in test lifecycle

The test uses a try-with-resources block which automatically calls close() on sc at
the end of the block, and then calls storedContext.restore() again afterward. This
double-restore may cause unintended behavior or mask bugs. The test should either
use try-with-resources and assert inside the block, or manually manage the lifecycle
without try-with-resources.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-892]

-ThreadContext.StoredContext storedContext = null;
-try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
-    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
-}
-// storedContext would have closed. Now we restore and after that, our original thread should have it
+ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true);
+// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
+threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
+// Restore and verify propagated transients survive
 storedContext.restore();
Suggestion importance[1-10]: 6

__

Why: The try-with-resources block automatically calls close() on sc, and then storedContext.restore() is called again afterward, resulting in a double-restore. This could mask bugs or cause unintended behavior in the test. The improved code correctly avoids this by managing the lifecycle manually.

Low
General
Reuse captured context variable consistently

The current variable is captured at the start of the lambda but threadLocal.get() is
called again in the preserveResponseHeaders condition check. This is inconsistent —
current should be reused for the identity check to avoid a redundant second read and
ensure both checks refer to the same snapshot.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
     final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
-if (preserveResponseHeaders && threadLocal.get() != newContext) {
+if (preserveResponseHeaders && current != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders));
+} else {
+    threadLocal.set(restoredContext);
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that current is captured at the start of the lambda but threadLocal.get() is called again redundantly in the condition check. Reusing current for both the identity check and responseHeaders access improves consistency and avoids a redundant read.

Low
Avoid repeated threadLocal reads for consistency

Inside the lambda, threadLocal.get() is called multiple times, but between the
preserveTransients block above and this block, the threadLocal has not been
modified. However, the condition threadLocal.get() != newContext and the subsequent
threadLocal.get().responseHeaders both call threadLocal.get() separately, which
could theoretically return different values in a concurrent scenario. Capture
threadLocal.get() once into a local variable before the conditional to ensure
consistency.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326]

-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+ThreadContextStruct currentAtRestore = threadLocal.get();
+if (preserveResponseHeaders && currentAtRestore != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 3

__

Why: While calling threadLocal.get() multiple times is slightly inconsistent, ThreadLocal is thread-local by design so concurrent modification is not a concern here. This is a minor style/consistency improvement with low practical impact.

Low
Suggestions up to commit d86ab97
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-restore after try-with-resources close

The StoredContext is closed by the try-with-resources block before
storedContext.restore() is called. Closing a StoredContext already calls restore()
internally, so calling restore() again after the block exits is redundant and may
produce incorrect test behavior. The test should verify the state immediately after
the try-with-resources block exits (which triggers the close/restore), rather than
calling restore() a second time.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894]

-ThreadContext.StoredContext storedContext = null;
 try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
     threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
 }
-// storedContext would have closed. Now we restore and after that, our original thread should have it
-storedContext.restore();
+// After the try-with-resources block, the context is restored; verify the propagated transient is present.
+assertEquals(threadContext.getTransient(PROPAGATED_KEY), PROPAGATED_VALUE);
Suggestion importance[1-10]: 7

__

Why: The StoredContext is closed (and thus restore() is called) when the try-with-resources block exits. Calling storedContext.restore() again afterward is redundant and could lead to incorrect test behavior, as the state being verified may not reflect the intended scenario. This is a valid correctness concern in the test logic.

Medium
General
Use cached thread-local value consistently in lambda

threadLocal.get() is called twice inside the lambda after current is already
assigned from it. The second call to threadLocal.get() in the
preserveResponseHeaders branch is consistent, but the threadLocal.get() !=
newContext comparison uses a freshly fetched value while current already holds it.
Using current consistently avoids a redundant thread-local lookup and makes the
intent clearer.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
     final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+if (preserveResponseHeaders && current != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that threadLocal.get() is called multiple times inside the lambda when current already holds the value. Using current consistently avoids redundant lookups and improves clarity, though the functional impact is minor since the thread-local value shouldn't change between calls in this context.

Low
Suggestions up to commit 458b367
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-restore of stored context

The try-with-resources block automatically closes sc (calling close()) at the end of
the block, which triggers the restore logic. Then storedContext.restore() is called
again on the already-closed context, resulting in a double-restore. The test should
either use the stored context directly without try-with-resources, or rely solely on
the auto-close behavior.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894]

-ThreadContext.StoredContext storedContext = null;
-try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
-    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
-}
-// storedContext would have closed. Now we restore and after that, our original thread should have it
+ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true);
+// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
+threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
+// Restore and verify propagated transients survive
 storedContext.restore();
Suggestion importance[1-10]: 8

__

Why: The try-with-resources block automatically calls close() on sc at the end of the block, which triggers the restore logic. Then storedContext.restore() is called again on the already-closed context, resulting in a double-restore that could produce incorrect test behavior. The improved code correctly avoids this by not using try-with-resources.

Medium
General
Use consistent system context flag during propagation

The propagateTransients is invoked with current.isSystemContext to filter propagated
headers, but the result is merged into originalContext which may have a different
isSystemContext value. This could lead to propagated transients being included in a
system context where they should be excluded. Ensure the system context flag is
consistent between the source used for propagation filtering and the target context
being restored.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-322]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
-    final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
+    // Use originalContext.isSystemContext for filtering to stay consistent with the snapshot being restored
+    final Map<String, Object> propagated = propagateTransients(current.transientHeaders, originalContext.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
-if (preserveResponseHeaders && threadLocal.get() != newContext) {
Suggestion importance[1-10]: 5

__

Why: Using current.isSystemContext to filter propagated headers but merging into originalContext (which may have a different isSystemContext) could cause transients to be incorrectly included or excluded. Using originalContext.isSystemContext for the propagation filter ensures consistency with the snapshot being restored.

Low
Avoid repeated threadLocal reads in restore lambda

Inside the lambda, threadLocal.get() is called twice when preserveResponseHeaders is
true, but between the condition check and the putResponseHeaders call the value
could theoretically change. Capture it once into a local variable to ensure
consistency and avoid a potential race condition.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326]

-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+ThreadContextStruct currentAtRestore = threadLocal.get();
+if (preserveResponseHeaders && currentAtRestore != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 4

__

Why: Capturing threadLocal.get() once into a local variable ensures consistency between the condition check and the putResponseHeaders call. While ThreadLocal is thread-local by nature (so no actual race condition exists), it's still a minor improvement for clarity and avoiding redundant reads.

Low
Suggestions up to commit 9a65ec5
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-restore of stored context

The try-with-resources block automatically closes sc (calling close()) at the end of
the block, which triggers the restore logic. Then storedContext.restore() is called
again on the already-closed context, resulting in a double-restore. The test should
either use the auto-close behavior of try-with-resources OR call restore() manually,
not both.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894]

-ThreadContext.StoredContext storedContext = null;
-try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
-    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
-}
-// storedContext would have closed. Now we restore and after that, our original thread should have it
+ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true);
+// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
+threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
+// Now restore and after that, our original thread should have it
 storedContext.restore();
Suggestion importance[1-10]: 8

__

Why: The try-with-resources block automatically calls close() on sc (which triggers restore), and then storedContext.restore() is called again on the same context, resulting in a double-restore. This is a real bug in the test that could cause incorrect behavior or misleading test results.

Medium
General
Reuse captured context to avoid inconsistency

The current variable is captured at the start of the lambda but threadLocal.get() is
called again later in the condition check. This inconsistency means the
preserveResponseHeaders check uses a potentially different snapshot than the one
used for propagating transients. Reuse the already-captured current variable for the
condition check.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
     final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
-if (preserveResponseHeaders && threadLocal.get() != newContext) {
+if (preserveResponseHeaders && current != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders));
+} else {
+    threadLocal.set(restoredContext);
+}
Suggestion importance[1-10]: 6

__

Why: The current variable is captured at the start of the lambda but threadLocal.get() is called again in the condition check, creating an inconsistency. Reusing current for both the transient propagation and the preserveResponseHeaders condition ensures consistent behavior and avoids redundant calls.

Low
Capture thread-local value once for consistency

Inside the lambda, threadLocal.get() is called multiple times and could
theoretically return different values if the thread local is modified concurrently
between calls. The result of threadLocal.get() should be captured once into a local
variable to ensure consistency within the restore operation.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326]

-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+ThreadContextStruct currentAtRestore = threadLocal.get();
+if (preserveResponseHeaders && currentAtRestore != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 4

__

Why: While calling threadLocal.get() multiple times within a lambda is generally safe in single-threaded contexts (thread locals are per-thread), capturing it once is cleaner. However, suggestion 3 already addresses this more completely by also reusing current for the condition check, making this suggestion partially redundant.

Low
Suggestions up to commit 06588ac
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-restore of stored context

The try-with-resources block automatically closes sc (calling close()) at the end of
the block, which triggers the restore logic. Then storedContext.restore() is called
again on the already-closed context, resulting in a double-restore. The test should
either use the stored context directly without the try-with-resources auto-close, or
rely solely on the auto-close without calling restore() again.

server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894]

-ThreadContext.StoredContext storedContext = null;
-try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) {
-    // now we add something to original thread
-    // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
-    storedContext = sc;
-    threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
-} catch (Exception e) {
-    // unlikey to get exception, if we got one, test should fail
-    throw e;
-}
-// storedContext would have closed. Now we restore and after that, our original thread should have it
+ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true);
+// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context.
+threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE);
+// Now restore and after that, our original thread should have it
 storedContext.restore();
Suggestion importance[1-10]: 8

__

Why: The try-with-resources block automatically calls close() on sc (which triggers restore), and then storedContext.restore() is called again on the same context, resulting in a double-restore. This is a real bug in the test that could mask incorrect behavior or cause false positives.

Medium
General
Reuse captured context snapshot consistently

The current variable is assigned from threadLocal.get() at the top of the lambda,
but then threadLocal.get() is called again in the preserveResponseHeaders condition
check. This is inconsistent — current should be reused in the condition to avoid a
redundant second read and ensure both checks operate on the same snapshot.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326]

 ThreadContextStruct current = threadLocal.get();
 ThreadContextStruct restoredContext = originalContext;
 if (preserveTransients) {
     final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext);
     if (!propagated.isEmpty()) {
         restoredContext = originalContext.putTransientIfAbsent(propagated);
     }
 }
-if (preserveResponseHeaders && threadLocal.get() != newContext) {
+if (preserveResponseHeaders && current != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders));
+} else {
+    threadLocal.set(restoredContext);
+}
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that current is captured at the top of the lambda but threadLocal.get() is called again in the condition check, which is inconsistent. Reusing current ensures both the propagation logic and the response-headers check operate on the same snapshot, improving correctness and readability.

Low
Avoid repeated threadLocal reads in restore lambda

Inside the lambda, threadLocal.get() is called multiple times but the value could
theoretically change between calls in a concurrent scenario. The result of
threadLocal.get() should be captured once into a local variable before use to ensure
consistency and avoid subtle bugs.

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326]

-if (preserveResponseHeaders && threadLocal.get() != newContext) {
-    threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders));
+ThreadContextStruct currentAtRestore = threadLocal.get();
+if (preserveResponseHeaders && currentAtRestore != newContext) {
+    threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders));
 } else {
     threadLocal.set(restoredContext);
 }
Suggestion importance[1-10]: 3

__

Why: While capturing threadLocal.get() once is slightly cleaner, ThreadLocal is thread-local by definition so concurrent modification isn't a concern here. This is a minor style/consistency improvement with low practical impact.

Low

@Deepti24
Copy link
Contributor Author

image Screenshot showing that SpanReference stays in the thread with null span Added a debugger to a thread from the pool that handles requests. It had SpanReference in its thread local

@github-actions
Copy link
Contributor

❌ Gradle check result for b7f9b51: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@cwperks cwperks changed the title Fix security plugin issue Ensure that transient ThreadContext headers with propagators survive stash and restore Mar 12, 2026
@Deepti24 Deepti24 changed the title Ensure that transient ThreadContext headers with propagators survive stash and restore Ensure that transient ThreadContext headers with propagators survive restore Mar 12, 2026
@cwperks
Copy link
Member

cwperks commented Mar 12, 2026

@Deepti24 the changes lgtm. left one small comment around naming.

@cwperks
Copy link
Member

cwperks commented Mar 12, 2026

please fix the precommit failures

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit bc5e243

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit edc807d

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit f47cebe

cwperks
cwperks previously approved these changes Mar 12, 2026
@github-actions
Copy link
Contributor

❌ Gradle check result for f47cebe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for f47cebe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit b58616f

@github-actions
Copy link
Contributor

❌ Gradle check result for b58616f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 458b367

@github-actions
Copy link
Contributor

❌ Gradle check result for 458b367: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit d86ab97

@github-actions
Copy link
Contributor

❌ Gradle check result for d86ab97: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@cwperks
Copy link
Member

cwperks commented Mar 19, 2026

@Deepti24 please fix the spotless errors

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':server:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorageTests.java
          @@ -286,9 +286,6 @@
           ········source·=·new·HashMap<>();
           ········source.put(ThreadContextBasedTracerContextStorage.CURRENT_SPAN,·spanReference);
           ········assertFalse(context.transients(source).isEmpty());
          -········assertEquals(
          -············span,
          -············((SpanReference)·context.transients(source).get(ThreadContextBasedTracerContextStorage.CURRENT_SPAN)).getSpan()
          -········);
          +········assertEquals(span,·((SpanReference)·context.transients(source).get(ThreadContextBasedTracerContextStorage.CURRENT_SPAN)).getSpan());
           ····}
           }
  Run './gradlew spotlessApply' to fix all violations.

@Deepti24
Copy link
Contributor Author

Deepti24 commented Mar 19, 2026

@cwperks I wanted to fix DCO history (I signed using personal id but was supposed to do with corporate id) along with this, is it okay if I run below commands for the same ?

git reset --soft main
git commit -s -m "Fix security plugin issue and add test cases"
git push origin fix-security-plugin-issue --force

Do let me know if it is not recommended

Have fixed checkstyle

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit b2d5b5e

@cwperks
Copy link
Member

cwperks commented Mar 19, 2026

@Deepti24 yes you can do that. I also have a mechanism on my side to manually set DCO to pass as long as at least one of the commits is signed. PRs get squashed on merge so a maintainer can assure that the signature is kept in the squashed commit message. I think the requirement is that for all unique contributors on a PR, each contributor needs to sign at least one of their commits.

@Deepti24
Copy link
Contributor Author

Deepti24 commented Mar 19, 2026

@Deepti24 yes you can do that. I also have a mechanism on my side to manually set DCO to pass as long as at least one of the commits is signed. PRs get squashed on merge so a maintainer can assure that the signature is kept in the squashed commit message. I think the requirement is that for all unique contributors on a PR, each contributor needs to sign at least one of their commits.

@cwperks Sure then please use the DCO corresponding to Signed-off-by: Deepti Chauhan when you merge

ACCOUNT:
dchauhan3-blip

Thanks !!

@github-actions
Copy link
Contributor

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@cwperks
Copy link
Member

cwperks commented Mar 20, 2026

@Deepti24 please rebase to ensure that #20920 is in your branch. The tests are failing without that fix.

@github-actions
Copy link
Contributor

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 87e71bf

@github-actions
Copy link
Contributor

✅ Gradle check result for 87e71bf: SUCCESS

@cwperks cwperks merged commit e39a496 into opensearch-project:main Mar 22, 2026
33 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.19
# Create a new branch
git switch --create backport/backport-20854-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e39a49606a15262366af703035a52d6ca670627c
# Push it to GitHub
git push --set-upstream origin backport/backport-20854-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-20854-to-2.19.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19 backport-failed skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants