Fix NPE in GraphTransaction.doClose() on concurrent graph close (#4899)#4900
Fix NPE in GraphTransaction.doClose() on concurrent graph close (#4899)#4900batrived wants to merge 2 commits into
Conversation
|
|
499e1be to
a5fc7fb
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses a concurrency race in JanusGraph’s thread-bound transaction cleanup where JanusGraphBlueprintsGraph.close() can null the shared txs ThreadLocal while another thread is executing GraphTransaction.doClose(), previously causing an NPE during commit/rollback cleanup and masking the real transaction outcome.
Changes:
- Make
GraphTransaction.doClose()null-safe by snapshottingtxsto a local variable and checking for null before callingremove(). - Add a regression test that simulates
txsbeing nulled concurrently and assertsdoClose()does not throw.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/JanusGraphBlueprintsGraph.java | Makes transaction cleanup tolerant of txs being nulled during concurrent graph close. |
| janusgraph-test/src/test/java/org/janusgraph/core/ThreadLocalTxLeakTest.java | Adds regression coverage to ensure doClose() remains null-safe when txs is cleared. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…sGraph#4899) There is a race between JanusGraphBlueprintsGraph.close(), which nulls the shared txs ThreadLocal field, and GraphTransaction.doClose(), which called txs.remove() without a null check. If a graph is closed on one thread while another thread is finishing a commit()/rollback(), the committing thread threw an NPE in its finally cleanup block after the storage commit had already been attempted, masking the real outcome. Snapshot the shared txs reference to a local and null-check it before removing, consistent with the existing null handling in getAutoStartTx() and isOpen(). This preserves the txs = null behavior from JanusGraph#1653/JanusGraph#1341 while making cleanup null-safe. Add a regression test that reproduces the NPE in doClose() when txs has been cleared concurrently. Signed-off-by: Balmukund Trivedi <btrivedipublic@gmail.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a5fc7fb to
c29729c
Compare
|
@porunov thank you for your time and reviewing it. Since I don't have merge permissions, could you please merge it when you get a chance? Thanks! |
Based on the standard RTC rules we either need a second approval or a week for lazy consensus: https://docs.janusgraph.org/development/#review-then-commit-rtc |
Fixes #4899.
Problem
There is a race between
JanusGraphBlueprintsGraph.close()andGraphTransaction.doClose().close()sets the sharedtxsThreadLocal field tonull, butdoClose()calledtxs.remove()without a null check. If a graph is closed on one thread while another thread is finishing acommit()/rollback(), the committing thread throws an NPE in itsfinallycleanup block — after the storage commit has already been attempted, masking the real outcome.This commonly surfaces during application shutdown / container termination, where in-flight Gremlin requests overlap with graph close. Note that
getAutoStartTx()andGraphTransaction.isOpen()already guard againsttxs == null, butdoClose()did not — the null-handling was inconsistent.Fix
Snapshot the shared
txsreference to a local and null-check it before callingremove():This preserves the
txs = nullbehavior from #1653 (which intentionally drops the strong reference to allow ThreadLocal cleanup, see #1341) while making cleanup null-safe.Testing
Added a regression test
ThreadLocalTxLeakTest#doCloseMustBeNullSafeWhenTxsClearedConcurrentlythat simulates a concurrentclose()nulling the field and assertsdoClose()does not throw. Verified it fails with an NPE without the fix and passes with it. All tests in the class pass.For all changes:
master)?For code changes: