Skip to content

Fix ColumnFamilyStore.getIfExists handling of dropped keyspaces #1738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pcmanus
Copy link

@pcmanus pcmanus commented May 16, 2025

The ColumnFamilyStore.getIfExists method whole point is explained in its javadoc, which says:

Differently from others, this method does not throw exception if the table does not exist

That method work as expected if the table does not exists but the keyspace does. But if the keyspace does not exists (in which case the table does not exists by extension), then it unexpectedly throws UnknownKeyspaceException instead of returning null.

That is clearly unexpected as the code has the following snippet:

Keyspace keyspace = Keyspace.open(metadata.keyspace);
if (keyspace == null)
    return null;

but this does not work as Keyspace#open never return null, it throws instead.

This problem is years old, so it is obviously not a big issue. But at least in the context of CNDB, it can lead to unexpected errors in the log, and that is the source of some flaky tests. For instance, a recent run had a test failing with the following in the log:

[writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:762 - Instance removed for keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:765 - Awaiting on write barrier before dropping keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] DEBUG [StoragePeriodicReload:1] 2025-05-14 08:34:48,641 Keyspace.java:167 - New instance created for keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] ERROR [StoragePeriodicReload:1] 2025-05-14 08:34:48,642 RemoteStorageHandler.java:339 - Failed periodic reload of sstables for testreadfromstorageservicewithpreparedidcollision.default_table with reason SSTABLES_CHANGED and attempts 0, rescheduling:
[writer-0] org.apache.cassandra.exceptions.UnknownKeyspaceException: Could not find a keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] 	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:368)
[writer-0] 	at org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:168)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$blockingLoadIfAbsent$1(LoadingMap.java:273)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$computeIfAbsent$0(LoadingMap.java:98)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.updateOrRemoveEntry(LoadingMap.java:178)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.computeIfAbsent(LoadingMap.java:98)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:272)
[writer-0] 	at org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:246)
[writer-0] 	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:166)
[writer-0] 	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:155)
[writer-0] 	at org.apache.cassandra.db.ColumnFamilyStore.getIfExists(ColumnFamilyStore.java:3406)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler.maybeMarkIndexes(RemoteStorageHandler.java:1594)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler.doReloadSSTables(RemoteStorageHandler.java:1229)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.reload(RemoteStorageHandler.java:315)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.lambda$maybeSchedule$5(RemoteStorageHandler.java:406)
[writer-0] 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
[writer-0] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
[writer-0] 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
[writer-0] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
[writer-0] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
[writer-0] 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[writer-0] 	at java.base/java.lang.Thread.run(Thread.java:1570)

Here what happens is that the test keyspace is dropped by a @AfterEach, but this races with the periodic reloading of sstable which uses this getIfExists method.

This patch fixes ColumnFamilyStore.getIfExists to return null if the keyspace does not exists.

I will note that I reviewed all the usages of
ColumnFamilyStore.getIfExists in CNDB. They are all of the form:

ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...);
if (cfs == null)
    return;

Meaning that this is used to not do some task if the table doesn't exists anymore. And so I'm confident this is the right change for CNDB.

For the usages within C*, there is a number of uses cases where the code does:

ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...);
if (cfs == null)
    throw SomeException(...);

For such cases, this patch has the consequence that where previously a UnknownKeyspaceException was throw, now a SomeException will be throw instance. I assume this is fine. But in theory, some existing code could have been written to deal with UnknownKeyspaceException differently from SomeException somehow, and it would be really hard to follow every possible codepath upwards. Even if this were true, I can't imagine this have much consequences, so I'm reasonably confident with ignoring this risk, but wanted to mention it for completness.

Copy link

github-actions bot commented May 16, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
60.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@pcmanus pcmanus force-pushed the getIfExists-and-dropped-keyspaces branch from 68cb0d8 to 4912cf7 Compare May 19, 2025 09:29
The `ColumnFamilyStore.getIfExists` method whole point is explained in
its javadoc, which says:
```
Differently from others, this method does not throw exception if the table does not exist
```
That method work as expected if the table does not exists _but_ the
keyspace does. But if the keyspace does not exists (in which case the
table does not exists by extension), then it unexpectedly throws
`UnknownKeyspaceException` instead of returning `null`.

That is clearly unexpected as the code has the following snippet:
```java
Keyspace keyspace = Keyspace.open(metadata.keyspace);
if (keyspace == null)
    return null;
```
but this does not work as `Keyspace#open` never return `null`, it throws
instead.

This problem is years old, so it is obviously not a big issue. But at
least in the context of CNDB, it can lead to unexpected errors in the
log, and that is the source of some flaky tests. For instance, a recent
run had a test failing with the following in the log:
```
[writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:762 - Instance removed for keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] DEBUG [grpc-default-executor-1] 2025-05-14 08:34:48,641 Schema.java:765 - Awaiting on write barrier before dropping keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] DEBUG [StoragePeriodicReload:1] 2025-05-14 08:34:48,641 Keyspace.java:167 - New instance created for keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] ERROR [StoragePeriodicReload:1] 2025-05-14 08:34:48,642 RemoteStorageHandler.java:339 - Failed periodic reload of sstables for testreadfromstorageservicewithpreparedidcollision.default_table with reason SSTABLES_CHANGED and attempts 0, rescheduling:
[writer-0] org.apache.cassandra.exceptions.UnknownKeyspaceException: Could not find a keyspace testreadfromstorageservicewithpreparedidcollision
[writer-0] 	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:368)
[writer-0] 	at org.apache.cassandra.db.Keyspace.lambda$open$0(Keyspace.java:168)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$blockingLoadIfAbsent$1(LoadingMap.java:273)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.lambda$computeIfAbsent$0(LoadingMap.java:98)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.updateOrRemoveEntry(LoadingMap.java:178)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.computeIfAbsent(LoadingMap.java:98)
[writer-0] 	at org.apache.cassandra.utils.concurrent.LoadingMap.blockingLoadIfAbsent(LoadingMap.java:272)
[writer-0] 	at org.apache.cassandra.schema.Schema.maybeAddKeyspaceInstance(Schema.java:246)
[writer-0] 	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:166)
[writer-0] 	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:155)
[writer-0] 	at org.apache.cassandra.db.ColumnFamilyStore.getIfExists(ColumnFamilyStore.java:3406)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler.maybeMarkIndexes(RemoteStorageHandler.java:1594)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler.doReloadSSTables(RemoteStorageHandler.java:1229)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.reload(RemoteStorageHandler.java:315)
[writer-0] 	at com.datastax.cndb.metadata.storage.RemoteStorageHandler$Reloader.lambda$maybeSchedule$5(RemoteStorageHandler.java:406)
[writer-0] 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
[writer-0] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
[writer-0] 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
[writer-0] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
[writer-0] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
[writer-0] 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[writer-0] 	at java.base/java.lang.Thread.run(Thread.java:1570)
```

Here what happens is that the test keyspace is dropped by a
`@AfterEach`, but this races with the periodic reloading of sstable
which uses this `getIfExists` method.

This patch fixes `ColumnFamilyStore.getIfExists` to return `null` if the
keyspace does not exists.

I will note that I reviewed all the usages of
`ColumnFamilyStore.getIfExists` in CNDB. They are all of the form:
```
ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...);
if (cfs == null)
    return;
```
Meaning that this is used to not do some task if the table doesn't
exists anymore. And so I'm confident this is the right change for CNDB.

For the usages within C*, there is a number of uses cases where the code
does:
```
ColumnFamilyStore cfs = ColumnFamilyStore.getIfExists(...);
if (cfs == null)
    throw SomeException(...);
```
For such cases, this patch has the consequence that where previously a
`UnknownKeyspaceException` was throw, now a `SomeException` will be
throw instance. I assume this is fine. But in theory, some existing code
could have been written to deal with `UnknownKeyspaceException`
differently from `SomeException` somehow, and it would be really hard
to follow every possible codepath upwards. Even if this were true,
I can't imagine this have much consequences, so I'm reasonably confident
with ignoring this risk, but wanted to mention it for completness.
@pcmanus pcmanus force-pushed the getIfExists-and-dropped-keyspaces branch from 4912cf7 to 592f631 Compare June 20, 2025 14:14
Copy link

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1738 rejected by Butler


1 new test failure(s) in 3 builds
See build details here


Found 1 new test failures

Test Explanation Branch history Upstream history
...TimeoutTest.prepareRPCTimeout[SEQUENTIAL/false] regression 🔴🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 4 known test failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants