Ensure that transient ThreadContext headers with propagators survive restore by Deepti24 · Pull Request #20854 · opensearch-project/OpenSearch

Deepti24 · 2026-03-12T18:25:29Z

Description

This is to fix issue mentioned in this comment: #20822 (comment)
Issue details

When request reaches this class : Netty4HttpRequestHeaderVerifier, we take a snapshot of context using newStoredContext(). We store it as attribute with key CONTEXT_TO_RESTORE
Now some spans are added to current thread. For instance: AbstractHttpServerTransport => dispatchRequest
Now request lands on SecurityRestFilter. Here current thread's context has span info
SecurityRestFilter would now call restore context stored earlier with key CONTEXT_TO_RESTORE
Since span got added after we took the snapshot, it would not contain the span id.
To fix this we should merge the context in restore method.

Explaining why do we have a SpanReference with null span id as seen in screenshot in above comment (Fix in class ThreadContextBasedTracerContextStorage)

Let's say request comes to NativeMessageHandler=> handleRequest
We start a span with SpanScope spanScope = tracer.withSpanInScope(span). It is auto-closeable
When the block completes, it auto closes using DefaultSpanScope => close
This method sets span id within SpanReference as null
It is never removed from the thread and when threads are re-used, it stays (will attach screen shot on this)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
(opensearch-project/security#5990)
cwperks#345

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-12T18:27:05Z

PR Reviewer Guide 🔍

(Review updated until commit `87e71bf`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Fix null SpanReference propagation in ThreadContextBasedTracerContextStorage Relevant files: server/src/main/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorage.java server/src/test/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorageTests.java Sub-PR theme: Add preserveTransients flag to ThreadContext.newStoredContext to survive restore Relevant files: server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java CHANGELOG.md
⚡ Recommended focus areas for review Double threadLocal.get() In the restore lambda, `threadLocal.get()` is called twice when `preserveResponseHeaders` is true: once to get `current` (line 314) and again inside the `if` branch (line 323). Between these two calls the thread-local could theoretically change. The `current` variable captured at line 314 should be reused in the `putResponseHeaders` call to avoid inconsistency. ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } if (preserveResponseHeaders && threadLocal.get() != newContext) { threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); API Compatibility The original `newStoredContext(boolean, Collection<String>)` signature is now delegated to the new three-argument overload with `preserveTransients=false`. This is a silent behavioral change for all existing callers. While the default is `false` (preserving old behavior), the new overload with `preserveTransients=true` is only accessible via the new two-argument boolean signature. Callers who want both `transientHeadersToClear` and `preserveTransients=true` must use the three-argument form, but there is no documentation or Javadoc on the new overloads explaining this. public StoredContext newStoredContext(boolean preserveResponseHeaders, boolean preserveTransients) { return newStoredContext(preserveResponseHeaders, preserveTransients, Collections.emptyList()); } public StoredContext newStoredContext(boolean preserveResponseHeaders, Collection<String> transientHeadersToClear) { return newStoredContext(preserveResponseHeaders, false, transientHeadersToClear); } Test Logic Issue The test `testPropagatedTransientsAreRestored` calls `storedContext.restore()` after the try-with-resources block has already auto-closed the context (which triggers restore via the lambda). This means `restore()` is called twice on the same `StoredContext`, which may produce misleading test results or mask double-restore bugs. try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { // now we add something to original thread // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. storedContext = sc; threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); } catch (Exception e) { // unlikey to get exception, if we got one, test should fail throw e; } // storedContext would have closed. Now we restore and after that, our original thread should have it storedContext.restore(); // we should be able to find the key now assertEquals(threadContext.getTransient(PROPAGATED_KEY), PROPAGATED_VALUE); putTransientIfAbsent Bypass `putTransientIfAbsent` skips the `putSingleHeader` validation (which checks for duplicate keys and throws on conflict). This means propagated transients can silently overwrite or be silently ignored without any conflict detection that the rest of the codebase relies on. Verify this is intentional and document why the validation is bypassed. private ThreadContextStruct putTransientIfAbsent(Map<String, Object> values) { Map<String, Object> newTransient = new HashMap<>(this.transientHeaders); for (Map.Entry<String, Object> entry : values.entrySet()) { newTransient.putIfAbsent(entry.getKey(), entry.getValue()); } return new ThreadContextStruct(requestHeaders, responseHeaders, newTransient, persistentHeaders, isSystemContext); }

github-actions · 2026-03-12T18:27:33Z

PR Code Suggestions ✨

Latest suggestions up to 87e71bf

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Avoid double restore of stored context The try-with-resources block automatically closes `sc` (calling `restore()`) at the end of the block, so `storedContext.restore()` is called a second time after the block exits. This double-restore will overwrite the thread context again with the original snapshot, likely causing the assertion to fail or producing incorrect behavior. The test should either use a manual close pattern or restructure to avoid the double restore. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-892] -ThreadContext.StoredContext storedContext = null; -try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; - threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; -} -// storedContext would have closed. Now we restore and after that, our original thread should have it +ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true); +// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. +threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); +// Now restore and after that, our original thread should have it storedContext.restore(); Suggestion importance[1-10]: 8 __ Why: The try-with-resources block will automatically call `restore()` on `sc` when the block exits, and then `storedContext.restore()` is called again explicitly. This double-restore is a real bug that would cause the test to behave incorrectly, as the second restore would overwrite the context again with the original snapshot before the assertion.	Medium
General	Use correct context for system context flag during propagation `propagateTransients` is called with `current.isSystemContext`, but the context being restored is `originalContext`, which may have a different `isSystemContext` value. The propagation decision should be based on the context that will actually be restored, not the current context at restore time, to avoid incorrectly propagating transients into a system context. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-321] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { - final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); + final Map<String, Object> propagated = propagateTransients(current.transientHeaders, originalContext.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } Suggestion importance[1-10]: 6 __ Why: Using `current.isSystemContext` instead of `originalContext.isSystemContext` could cause transients to be incorrectly propagated into or excluded from a system context during restore. The flag should reflect the destination context (`originalContext`) rather than the current context being replaced.	Low
General	Avoid repeated threadLocal reads in restore lambda Inside the lambda, `threadLocal.get()` is called twice in the `preserveResponseHeaders` branch, but between the condition check and the `putResponseHeaders` call the value could theoretically differ. It is safer to capture `threadLocal.get()` once into a local variable before the conditional to ensure consistency. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326] -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +ThreadContextStruct currentOnRestore = threadLocal.get(); +if (preserveResponseHeaders && currentOnRestore != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(currentOnRestore.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 4 __ Why: While calling `threadLocal.get()` twice in the same branch is slightly inconsistent, in practice this is single-threaded context and the value won't change between the two calls. The suggestion improves code clarity and is a minor defensive improvement.	Low

Previous suggestions

Suggestions up to commit b2d5b5e

Category	Suggestion	Impact
Possible issue	Avoid double-restore in test lifecycle The test uses a try-with-resources block which automatically calls `close()` on `sc` at the end of the block, and then calls `storedContext.restore()` again afterward. This double-restore may cause unintended behavior or mask bugs. The test should either use try-with-resources and assert inside the block, or manually manage the lifecycle without try-with-resources. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-892] -ThreadContext.StoredContext storedContext = null; -try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; - threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; -} -// storedContext would have closed. Now we restore and after that, our original thread should have it +ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true); +// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. +threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); +// Restore and verify propagated transients survive storedContext.restore(); Suggestion importance[1-10]: 6 __ Why: The try-with-resources block automatically calls `close()` on `sc`, and then `storedContext.restore()` is called again afterward, resulting in a double-restore. This could mask bugs or cause unintended behavior in the test. The improved code correctly avoids this by managing the lifecycle manually.	Low
General	Reuse captured context variable consistently The `current` variable is captured at the start of the lambda but `threadLocal.get()` is called again in the `preserveResponseHeaders` condition check. This is inconsistent — `current` should be reused for the identity check to avoid a redundant second read and ensure both checks refer to the same snapshot. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } -if (preserveResponseHeaders && threadLocal.get() != newContext) { +if (preserveResponseHeaders && current != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders)); +} else { + threadLocal.set(restoredContext); +} Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that `current` is captured at the start of the lambda but `threadLocal.get()` is called again redundantly in the condition check. Reusing `current` for both the identity check and `responseHeaders` access improves consistency and avoids a redundant read.	Low
General	Avoid repeated threadLocal reads for consistency Inside the lambda, `threadLocal.get()` is called multiple times, but between the `preserveTransients` block above and this block, the threadLocal has not been modified. However, the condition `threadLocal.get() != newContext` and the subsequent `threadLocal.get().responseHeaders` both call `threadLocal.get()` separately, which could theoretically return different values in a concurrent scenario. Capture `threadLocal.get()` once into a local variable before the conditional to ensure consistency. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326] -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +ThreadContextStruct currentAtRestore = threadLocal.get(); +if (preserveResponseHeaders && currentAtRestore != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 3 __ Why: While calling `threadLocal.get()` multiple times is slightly inconsistent, `ThreadLocal` is thread-local by design so concurrent modification is not a concern here. This is a minor style/consistency improvement with low practical impact.	Low

Suggestions up to commit d86ab97

Category	Suggestion	Impact
Possible issue	Avoid double-restore after try-with-resources close The `StoredContext` is closed by the try-with-resources block before `storedContext.restore()` is called. Closing a `StoredContext` already calls `restore()` internally, so calling `restore()` again after the block exits is redundant and may produce incorrect test behavior. The test should verify the state immediately after the try-with-resources block exits (which triggers the close/restore), rather than calling `restore()` a second time. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894] -ThreadContext.StoredContext storedContext = null; try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; } -// storedContext would have closed. Now we restore and after that, our original thread should have it -storedContext.restore(); +// After the try-with-resources block, the context is restored; verify the propagated transient is present. +assertEquals(threadContext.getTransient(PROPAGATED_KEY), PROPAGATED_VALUE); Suggestion importance[1-10]: 7 __ Why: The `StoredContext` is closed (and thus `restore()` is called) when the try-with-resources block exits. Calling `storedContext.restore()` again afterward is redundant and could lead to incorrect test behavior, as the state being verified may not reflect the intended scenario. This is a valid correctness concern in the test logic.	Medium
General	Use cached thread-local value consistently in lambda `threadLocal.get()` is called twice inside the lambda after `current` is already assigned from it. The second call to `threadLocal.get()` in the `preserveResponseHeaders` branch is consistent, but the `threadLocal.get() !=` `newContext` comparison uses a freshly fetched value while `current` already holds it. Using `current` consistently avoids a redundant thread-local lookup and makes the intent clearer. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +if (preserveResponseHeaders && current != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that `threadLocal.get()` is called multiple times inside the lambda when `current` already holds the value. Using `current` consistently avoids redundant lookups and improves clarity, though the functional impact is minor since the thread-local value shouldn't change between calls in this context.	Low

Suggestions up to commit 458b367

Category	Suggestion	Impact
Possible issue	Avoid double-restore of stored context The try-with-resources block automatically closes `sc` (calling `close()`) at the end of the block, which triggers the restore logic. Then `storedContext.restore()` is called again on the already-closed context, resulting in a double-restore. The test should either use the stored context directly without try-with-resources, or rely solely on the auto-close behavior. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894] -ThreadContext.StoredContext storedContext = null; -try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; - threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; -} -// storedContext would have closed. Now we restore and after that, our original thread should have it +ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true); +// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. +threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); +// Restore and verify propagated transients survive storedContext.restore(); Suggestion importance[1-10]: 8 __ Why: The try-with-resources block automatically calls `close()` on `sc` at the end of the block, which triggers the restore logic. Then `storedContext.restore()` is called again on the already-closed context, resulting in a double-restore that could produce incorrect test behavior. The improved code correctly avoids this by not using try-with-resources.	Medium
General	Use consistent system context flag during propagation The `propagateTransients` is invoked with `current.isSystemContext` to filter propagated headers, but the result is merged into `originalContext` which may have a different `isSystemContext` value. This could lead to propagated transients being included in a system context where they should be excluded. Ensure the system context flag is consistent between the source used for propagation filtering and the target context being restored. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-322] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { - final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); + // Use originalContext.isSystemContext for filtering to stay consistent with the snapshot being restored + final Map<String, Object> propagated = propagateTransients(current.transientHeaders, originalContext.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } -if (preserveResponseHeaders && threadLocal.get() != newContext) { Suggestion importance[1-10]: 5 __ Why: Using `current.isSystemContext` to filter propagated headers but merging into `originalContext` (which may have a different `isSystemContext`) could cause transients to be incorrectly included or excluded. Using `originalContext.isSystemContext` for the propagation filter ensures consistency with the snapshot being restored.	Low
General	Avoid repeated threadLocal reads in restore lambda Inside the lambda, `threadLocal.get()` is called twice when `preserveResponseHeaders` is true, but between the condition check and the `putResponseHeaders` call the value could theoretically change. Capture it once into a local variable to ensure consistency and avoid a potential race condition. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326] -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +ThreadContextStruct currentAtRestore = threadLocal.get(); +if (preserveResponseHeaders && currentAtRestore != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 4 __ Why: Capturing `threadLocal.get()` once into a local variable ensures consistency between the condition check and the `putResponseHeaders` call. While `ThreadLocal` is thread-local by nature (so no actual race condition exists), it's still a minor improvement for clarity and avoiding redundant reads.	Low

Suggestions up to commit 9a65ec5

Category	Suggestion	Impact
Possible issue	Avoid double-restore of stored context The try-with-resources block automatically closes `sc` (calling `close()`) at the end of the block, which triggers the restore logic. Then `storedContext.restore()` is called again on the already-closed context, resulting in a double-restore. The test should either use the auto-close behavior of try-with-resources OR call `restore()` manually, not both. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894] -ThreadContext.StoredContext storedContext = null; -try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; - threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; -} -// storedContext would have closed. Now we restore and after that, our original thread should have it +ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true); +// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. +threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); +// Now restore and after that, our original thread should have it storedContext.restore(); Suggestion importance[1-10]: 8 __ Why: The try-with-resources block automatically calls `close()` on `sc` (which triggers restore), and then `storedContext.restore()` is called again on the same context, resulting in a double-restore. This is a real bug in the test that could cause incorrect behavior or misleading test results.	Medium
General	Reuse captured context to avoid inconsistency The `current` variable is captured at the start of the lambda but `threadLocal.get()` is called again later in the condition check. This inconsistency means the `preserveResponseHeaders` check uses a potentially different snapshot than the one used for propagating transients. Reuse the already-captured `current` variable for the condition check. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } -if (preserveResponseHeaders && threadLocal.get() != newContext) { +if (preserveResponseHeaders && current != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders)); +} else { + threadLocal.set(restoredContext); +} Suggestion importance[1-10]: 6 __ Why: The `current` variable is captured at the start of the lambda but `threadLocal.get()` is called again in the condition check, creating an inconsistency. Reusing `current` for both the transient propagation and the `preserveResponseHeaders` condition ensures consistent behavior and avoids redundant calls.	Low
General	Capture thread-local value once for consistency Inside the lambda, `threadLocal.get()` is called multiple times and could theoretically return different values if the thread local is modified concurrently between calls. The result of `threadLocal.get()` should be captured once into a local variable to ensure consistency within the restore operation. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326] -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +ThreadContextStruct currentAtRestore = threadLocal.get(); +if (preserveResponseHeaders && currentAtRestore != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 4 __ Why: While calling `threadLocal.get()` multiple times within a lambda is generally safe in single-threaded contexts (thread locals are per-thread), capturing it once is cleaner. However, suggestion 3 already addresses this more completely by also reusing `current` for the condition check, making this suggestion partially redundant.	Low

Suggestions up to commit 06588ac

Category	Suggestion	Impact
Possible issue	Avoid double-restore of stored context The try-with-resources block automatically closes `sc` (calling `close()`) at the end of the block, which triggers the restore logic. Then `storedContext.restore()` is called again on the already-closed context, resulting in a double-restore. The test should either use the stored context directly without the try-with-resources auto-close, or rely solely on the auto-close without calling `restore()` again. server/src/test/java/org/opensearch/common/util/concurrent/ThreadContextTests.java [881-894] -ThreadContext.StoredContext storedContext = null; -try (ThreadContext.StoredContext sc = threadContext.newStoredContext(false, true)) { - // now we add something to original thread - // Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. - storedContext = sc; - threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); -} catch (Exception e) { - // unlikey to get exception, if we got one, test should fail - throw e; -} -// storedContext would have closed. Now we restore and after that, our original thread should have it +ThreadContext.StoredContext storedContext = threadContext.newStoredContext(false, true); +// Simulate the tracing infrastructure writing CURRENT_SPAN into the stashed context. +threadContext.putTransient(PROPAGATED_KEY, PROPAGATED_VALUE); +// Now restore and after that, our original thread should have it storedContext.restore(); Suggestion importance[1-10]: 8 __ Why: The try-with-resources block automatically calls `close()` on `sc` (which triggers restore), and then `storedContext.restore()` is called again on the same context, resulting in a double-restore. This is a real bug in the test that could mask incorrect behavior or cause false positives.	Medium
General	Reuse captured context snapshot consistently The `current` variable is assigned from `threadLocal.get()` at the top of the lambda, but then `threadLocal.get()` is called again in the `preserveResponseHeaders` condition check. This is inconsistent — `current` should be reused in the condition to avoid a redundant second read and ensure both checks operate on the same snapshot. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [314-326] ThreadContextStruct current = threadLocal.get(); ThreadContextStruct restoredContext = originalContext; if (preserveTransients) { final Map<String, Object> propagated = propagateTransients(current.transientHeaders, current.isSystemContext); if (!propagated.isEmpty()) { restoredContext = originalContext.putTransientIfAbsent(propagated); } } -if (preserveResponseHeaders && threadLocal.get() != newContext) { +if (preserveResponseHeaders && current != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(current.responseHeaders)); +} else { + threadLocal.set(restoredContext); +} Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that `current` is captured at the top of the lambda but `threadLocal.get()` is called again in the condition check, which is inconsistent. Reusing `current` ensures both the propagation logic and the response-headers check operate on the same snapshot, improving correctness and readability.	Low
General	Avoid repeated threadLocal reads in restore lambda Inside the lambda, `threadLocal.get()` is called multiple times but the value could theoretically change between calls in a concurrent scenario. The result of `threadLocal.get()` should be captured once into a local variable before use to ensure consistency and avoid subtle bugs. server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java [322-326] -if (preserveResponseHeaders && threadLocal.get() != newContext) { - threadLocal.set(restoredContext.putResponseHeaders(threadLocal.get().responseHeaders)); +ThreadContextStruct currentAtRestore = threadLocal.get(); +if (preserveResponseHeaders && currentAtRestore != newContext) { + threadLocal.set(restoredContext.putResponseHeaders(currentAtRestore.responseHeaders)); } else { threadLocal.set(restoredContext); } Suggestion importance[1-10]: 3 __ Why: While capturing `threadLocal.get()` once is slightly cleaner, `ThreadLocal` is thread-local by definition so concurrent modification isn't a concern here. This is a minor style/consistency improvement with low practical impact.	Low

Deepti24 · 2026-03-12T18:30:13Z

Screenshot showing that SpanReference stays in the thread with null span Added a debugger to a thread from the pool that handles requests. It had SpanReference in its thread local

github-actions · 2026-03-12T18:41:22Z

❌ Gradle check result for b7f9b51: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java

cwperks · 2026-03-12T18:51:29Z

@Deepti24 the changes lgtm. left one small comment around naming.

cwperks · 2026-03-12T18:51:37Z

please fix the precommit failures

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-12T19:53:32Z

Persistent review updated to latest commit bc5e243

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-12T20:03:48Z

Persistent review updated to latest commit edc807d

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-12T20:15:59Z

Persistent review updated to latest commit f47cebe

github-actions · 2026-03-12T21:04:01Z

❌ Gradle check result for f47cebe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java

github-actions · 2026-03-12T21:44:47Z

❌ Gradle check result for f47cebe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-13T13:11:35Z

Persistent review updated to latest commit b58616f

github-actions · 2026-03-13T13:56:09Z

❌ Gradle check result for b58616f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Test failures

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

github-actions · 2026-03-19T06:44:52Z

Persistent review updated to latest commit 458b367

github-actions · 2026-03-19T07:58:53Z

❌ Gradle check result for 458b367: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>

github-actions · 2026-03-19T12:22:35Z

Persistent review updated to latest commit d86ab97

github-actions · 2026-03-19T12:35:52Z

❌ Gradle check result for d86ab97: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

cwperks · 2026-03-19T13:21:14Z

@Deepti24 please fix the spotless errors

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':server:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/telemetry/tracing/ThreadContextBasedTracerContextStorageTests.java
          @@ -286,9 +286,6 @@
           ········source·=·new·HashMap<>();
           ········source.put(ThreadContextBasedTracerContextStorage.CURRENT_SPAN,·spanReference);
           ········assertFalse(context.transients(source).isEmpty());
          -········assertEquals(
          -············span,
          -············((SpanReference)·context.transients(source).get(ThreadContextBasedTracerContextStorage.CURRENT_SPAN)).getSpan()
          -········);
          +········assertEquals(span,·((SpanReference)·context.transients(source).get(ThreadContextBasedTracerContextStorage.CURRENT_SPAN)).getSpan());
           ····}
           }
  Run './gradlew spotlessApply' to fix all violations.

Deepti24 · 2026-03-19T13:27:42Z

@cwperks I wanted to fix DCO history (I signed using personal id but was supposed to do with corporate id) along with this, is it okay if I run below commands for the same ?

git reset --soft main
git commit -s -m "Fix security plugin issue and add test cases"
git push origin fix-security-plugin-issue --force

Do let me know if it is not recommended

Have fixed checkstyle

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>

github-actions · 2026-03-19T13:32:13Z

Persistent review updated to latest commit b2d5b5e

cwperks · 2026-03-19T13:38:04Z

@Deepti24 yes you can do that. I also have a mechanism on my side to manually set DCO to pass as long as at least one of the commits is signed. PRs get squashed on merge so a maintainer can assure that the signature is kept in the squashed commit message. I think the requirement is that for all unique contributors on a PR, each contributor needs to sign at least one of their commits.

Deepti24 · 2026-03-19T13:45:14Z

@Deepti24 yes you can do that. I also have a mechanism on my side to manually set DCO to pass as long as at least one of the commits is signed. PRs get squashed on merge so a maintainer can assure that the signature is kept in the squashed commit message. I think the requirement is that for all unique contributors on a PR, each contributor needs to sign at least one of their commits.

@cwperks Sure then please use the DCO corresponding to Signed-off-by: Deepti Chauhan when you merge

ACCOUNT:
dchauhan3-blip

Thanks !!

github-actions · 2026-03-19T14:01:21Z

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-19T15:33:28Z

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-19T17:47:05Z

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

cwperks · 2026-03-20T19:53:44Z

@Deepti24 please rebase to ensure that #20920 is in your branch. The tests are failing without that fix.

github-actions · 2026-03-20T20:29:44Z

❌ Gradle check result for b2d5b5e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-22T02:34:35Z

Persistent review updated to latest commit 87e71bf

github-actions · 2026-03-22T03:49:01Z

✅ Gradle check result for 87e71bf: SUCCESS

opensearch-trigger-bot · 2026-03-22T03:56:38Z

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.19
# Create a new branch
git switch --create backport/backport-20854-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e39a49606a15262366af703035a52d6ca670627c
# Push it to GitHub
git push --set-upstream origin backport/backport-20854-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-20854-to-2.19.

Fix security plugin issue

b7f9b51

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

Deepti24 requested a review from a team as a code owner March 12, 2026 18:25

Deepti24 mentioned this pull request Mar 12, 2026

[BUG] OS 3.1 not showing node level traces for DBQ query opensearch-project/security#5990

Open

cwperks changed the title ~~Fix security plugin issue~~ Ensure that transient ThreadContext headers with propagators survive stash and restore Mar 12, 2026

Deepti24 changed the title ~~Ensure that transient ThreadContext headers with propagators survive stash and restore~~ Ensure that transient ThreadContext headers with propagators survive restore Mar 12, 2026

cwperks reviewed Mar 12, 2026

View reviewed changes

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java Outdated Show resolved Hide resolved

Fix checkstyle

bc5e243

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

Address review comments

edc807d

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

Update change log

f47cebe

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

cwperks previously approved these changes Mar 12, 2026

View reviewed changes

cwperks reviewed Mar 12, 2026

View reviewed changes

server/src/main/java/org/opensearch/common/util/concurrent/ThreadContext.java Outdated Show resolved Hide resolved

opensearch-ci-bot mentioned this pull request Mar 12, 2026

[AUTOCUT] Gradle Check Flaky Test Report for Gradle Test Run :server:test #19729

Closed

Create method for adding transients

b58616f

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

opensearch-ci-bot mentioned this pull request Mar 14, 2026

[AUTOCUT] Gradle Check Flaky Test Report for Netty4Http3IT #20654

Open

Deepti24 added 2 commits March 14, 2026 18:35

Fix test cases

b1aa4db

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

checkstyle

765bfaf

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

Merge branch 'main' into fix-security-plugin-issue

458b367

Signed-off-by: Deepti24 <chauhan.deepti24@gmail.com>

This was referenced Mar 19, 2026

[AUTOCUT] Gradle Check Flaky Test Report for FullRollingRestartIT #18490

Open

[AUTOCUT] Gradle Check Flaky Test Report for IndexingIT #14302

Open

[AUTOCUT] Gradle Check Flaky Test Report for IndicesRequestCacheIT #14288

Open

Adding test case

d86ab97

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>

Fix checklstyle

b2d5b5e

Signed-off-by: Deepti Chauhan <dchauhan3@atlassian.com>

opensearch-ci-bot mentioned this pull request Mar 19, 2026

[AUTOCUT] Gradle Check Flaky Test Report for WarmIndexSegmentReplicationIT #18157

Open

opensearch-ci-bot mentioned this pull request Mar 19, 2026

[AUTOCUT] Gradle Check Flaky Test Report for RemoteSplitIndexIT #14296

Open

opensearch-ci-bot mentioned this pull request Mar 20, 2026

[AUTOCUT] Gradle Check Flaky Test Report for TelemetryMetricsEnabledSanityIT #19422

Open

Merge branch 'opensearch-project:main' into fix-security-plugin-issue

87e71bf

cwperks approved these changes Mar 22, 2026

View reviewed changes

cwperks merged commit e39a496 into opensearch-project:main Mar 22, 2026
33 checks passed

opensearch-trigger-bot bot added the backport-failed label Mar 22, 2026

Conversation

Deepti24 commented Mar 12, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 87e71bf)

Uh oh!

github-actions bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

Deepti24 commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

Uh oh!

cwperks commented Mar 12, 2026

Uh oh!

cwperks commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 13, 2026

Uh oh!

github-actions bot commented Mar 13, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

cwperks commented Mar 19, 2026

Uh oh!

Deepti24 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

cwperks commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Deepti24 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

github-actions bot commented Mar 19, 2026

Uh oh!

cwperks commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

Uh oh!

opensearch-trigger-bot bot commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

github-actions bot commented Mar 12, 2026 •

edited

Loading

(Review updated until commit `87e71bf`)

github-actions bot commented Mar 12, 2026 •

edited

Loading

Deepti24 commented Mar 19, 2026 •

edited

Loading

cwperks commented Mar 19, 2026 •

edited

Loading

Deepti24 commented Mar 19, 2026 •

edited

Loading