forked from apache/cassandra
-
Notifications
You must be signed in to change notification settings - Fork 21
Fix ColumnFamilyStore.getIfExists
handling of dropped keyspaces
#1738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
pcmanus
wants to merge
1
commit into
main
Choose a base branch
from
getIfExists-and-dropped-keyspaces
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Checklist before you submit for review
|
|
68cb0d8
to
4912cf7
Compare
pkolaczk
approved these changes
Jun 18, 2025
The `ColumnFamilyStore.getIfExists` method whole point is explained in its javadoc, which says: ``` Differently from others, this method does not throw exception if the table does not exist ``` That method work as expected if the table does not exists _but_ the keyspace does. But if the keyspace does not exists (in which case the table does not exists by extension), then it unexpectedly throws `UnknownKeyspaceException` instead of returning `null`. That is clearly unexpected as the code has the following snippet: ```java Keyspace keyspace = Keyspace.open(metadata.keyspace); if (keyspace == null) return null; ``` but this does not work as `Keyspace#open` never return `null`, it throws instead. This problem is years old, so it is obviously not a big issue. But at least in the context of CNDB, it can lead to unexpected errors in the log, and that is the source of some flaky tests. For instance, a recent run had a test failing with the following in the log: ``` [writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:762 - Instance removed for keyspace testreadfromstorageservicewithpreparedidcollision [writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:765 - Awaiting on write barrier before dropping keyspace testreadfromstorageservicewithpreparedidcollision [writer-0] DEBUG [StoragePeriodicReload:1] 2025-05-14 08:34:48,641 Keyspace.java:167 - New instance created for keyspace testreadfromstorageservicewithpreparedidcollision [writer-0] ERROR [StoragePeriodicReload:1] 2025-05-14 08:34:48,642 RemoteStorageHandler.java:339 - Failed periodic reload of sstables for testreadfromstorageservicewithpreparedidcollision.default_table with reason SSTABLES_CHANGED and attempts 0, rescheduling: [writer-0] org.apache.cassandra.exceptions.UnknownKeyspaceException: Could not find a keyspace testreadfromstorageservicewithpreparedidcollision [writer-0] at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:368) [writer-0] at org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:168) [writer-0] at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$blockingLoadIfAbsent$1(LoadingMap.java:273) [writer-0] at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$computeIfAbsent$0(LoadingMap.java:98) [writer-0] at org.apache.cassandra.utils.concurrent.LoadingMap.updateOrRemoveEntry(LoadingMap.java:178) [writer-0] at org.apache.cassandra.utils.concurrent.LoadingMap.computeIfAbsent(LoadingMap.java:98) [writer-0] at org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:272) [writer-0] at org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:246) [writer-0] at org.apache.cassandra.db.Keyspace.open(Keyspace.java:166) [writer-0] at org.apache.cassandra.db.Keyspace.open(Keyspace.java:155) [writer-0] at org.apache.cassandra.db.ColumnFamilyStore.getIfExists(ColumnFamilyStore.java:3406) [writer-0] at com.datastax.cndb.metadata.storage.RemoteStorageHandler.maybeMarkIndexes(RemoteStorageHandler.java:1594) [writer-0] at com.datastax.cndb.metadata.storage.RemoteStorageHandler.doReloadSSTables(RemoteStorageHandler.java:1229) [writer-0] at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.reload(RemoteStorageHandler.java:315) [writer-0] at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.lambda$maybeSchedule$5(RemoteStorageHandler.java:406) [writer-0] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) [writer-0] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) [writer-0] at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [writer-0] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [writer-0] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [writer-0] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [writer-0] at java.base/java.lang.Thread.run(Thread.java:1570) ``` Here what happens is that the test keyspace is dropped by a `@AfterEach`, but this races with the periodic reloading of sstable which uses this `getIfExists` method. This patch fixes `ColumnFamilyStore.getIfExists` to return `null` if the keyspace does not exists. I will note that I reviewed all the usages of `ColumnFamilyStore.getIfExists` in CNDB. They are all of the form: ``` ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...); if (cfs == null) return; ``` Meaning that this is used to not do some task if the table doesn't exists anymore. And so I'm confident this is the right change for CNDB. For the usages within C*, there is a number of uses cases where the code does: ``` ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...); if (cfs == null) throw SomeException(...); ``` For such cases, this patch has the consequence that where previously a `UnknownKeyspaceException` was throw, now a `SomeException` will be throw instance. I assume this is fine. But in theory, some existing code could have been written to deal with `UnknownKeyspaceException` differently from `SomeException` somehow, and it would be really hard to follow every possible codepath upwards. Even if this were true, I can't imagine this have much consequences, so I'm reasonably confident with ignoring this risk, but wanted to mention it for completness.
4912cf7
to
592f631
Compare
|
❌ Build ds-cassandra-pr-gate/PR-1738 rejected by Butler1 new test failure(s) in 3 builds Found 1 new test failures
Found 4 known test failures |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The
ColumnFamilyStore.getIfExists
method whole point is explained in its javadoc, which says:That method work as expected if the table does not exists but the keyspace does. But if the keyspace does not exists (in which case the table does not exists by extension), then it unexpectedly throws
UnknownKeyspaceException
instead of returningnull
.That is clearly unexpected as the code has the following snippet:
but this does not work as
Keyspace#open
never returnnull
, it throws instead.This problem is years old, so it is obviously not a big issue. But at least in the context of CNDB, it can lead to unexpected errors in the log, and that is the source of some flaky tests. For instance, a recent run had a test failing with the following in the log:
Here what happens is that the test keyspace is dropped by a
@AfterEach
, but this races with the periodic reloading of sstable which uses thisgetIfExists
method.This patch fixes
ColumnFamilyStore.getIfExists
to returnnull
if the keyspace does not exists.I will note that I reviewed all the usages of
ColumnFamilyStore.getIfExists
in CNDB. They are all of the form:Meaning that this is used to not do some task if the table doesn't exists anymore. And so I'm confident this is the right change for CNDB.
For the usages within C*, there is a number of uses cases where the code does:
For such cases, this patch has the consequence that where previously a
UnknownKeyspaceException
was throw, now aSomeException
will be throw instance. I assume this is fine. But in theory, some existing code could have been written to deal withUnknownKeyspaceException
differently fromSomeException
somehow, and it would be really hard to follow every possible codepath upwards. Even if this were true, I can't imagine this have much consequences, so I'm reasonably confident with ignoring this risk, but wanted to mention it for completness.