CNDB-14207: Don't break the whole flush if SAI fails to build #1770
base: main
Conversation
Checklist before you submit for review
So this may be "fine" because compaction will pick up the sstable and "fix" the problem, at least in CNDB (I don't know how this will play out in Cassandra).
When this happens, the sstable is in a bad state and the index suddenly becomes unavailable, so basically all queries against that index will fail (probably most of the queries for that table?).
This decision probably deserves a larger discussion at the triage meeting.
This is what already happens in CNDB. But I think I get what you're trying to say.
src/java/org/apache/cassandra/index/sai/disk/StorageAttachedIndexWriter.java (outdated review thread, resolved)
test/unit/org/apache/cassandra/index/sai/functional/FlushingTest.java (outdated review thread, resolved)
Thinking about it more, I feel the inconsistency between flush and compaction behavior introduced by this PR is not good. I think there are two sensible ways of handling SAI build failures: either fail the whole flush so the memtable and its indexes keep serving queries, or let the flush complete and mark the index non-queryable.
However, I fail to see how the current behavior is useful. WDYT?
I kind of think this is how flush is intended to work. Also, if a flush fails, we most likely want to quarantine the pod and move tenants away from it, because the reasons flushes fail are mostly not recoverable.
This commit changes how SAI index flush failures are handled. A flush no longer forces the index into the non-queryable state. We can do this because after a failure we roll back any partially flushed index components, and we abort the sstable flush as well. Therefore both the failed-to-flush memtable and the memtable indexes remain intact and can still serve queries. This change has several advantages:
- the flush failure may be temporary, and the next flush attempt may still succeed,
- even if the flushing problem persists, reads keep working,
- if this is a node-local problem, the other nodes have a chance to take over; a failure of one node does not propagate to the rest of the cluster.
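To make the described failure path concrete, here is a minimal, self-contained sketch; the `Memtable` type and all helper methods are hypothetical stand-ins, not the actual Cassandra/SAI APIs:

```java
// Hedged sketch of the flush-failure handling described above; all names
// here are illustrative stand-ins, not the real Cassandra/SAI APIs.
final class IndexFlushSketch
{
    static final class Memtable {}

    void flushWithIndexes(Memtable memtable)
    {
        try
        {
            flushIndexComponents(memtable);          // may throw if the SAI build fails
        }
        catch (RuntimeException e)
        {
            rollbackPartialComponents(memtable);     // delete partially flushed index files
            abortSSTableFlush(memtable);             // abort the sstable flush as well
            // The memtable and its in-memory indexes remain intact, so reads
            // keep working and the next flush attempt may still succeed.
            throw e;
        }
    }

    void flushIndexComponents(Memtable m)      { /* write per-index components */ }
    void rollbackPartialComponents(Memtable m) { /* remove partial components */ }
    void abortSSTableFlush(Memtable m)         { /* abort the sstable writer */ }
}
```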
@@ -303,7 +303,7 @@ public void abort(Throwable accumulator, boolean fromIndex)
// For non-compaction, make any indexes involved in this transaction non-queryable, as they will likely not match the backing table.
// For compaction: the compaction task should be aborted and new sstables will not be added to tracker
if (fromIndex && opType != OperationType.COMPACTION)
So this change means that if we abort during a flush, we just abort the index creation and don't fail the index? Please update the comment above as well.
When the abort happens during a flush, does it also mean the sstable was aborted? I just want to make sure we are not creating a case where data can be queried directly but would be missing from the index.
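For context, a minimal sketch of the gating pattern around the condition shown in the hunk above; apart from the condition itself, all names here are hypothetical stand-ins, not the actual StorageAttachedIndexWriter code:

```java
import java.util.List;

// Illustrative sketch of the abort path; only the condition mirrors the
// diff above, the rest is hypothetical.
final class AbortPathSketch
{
    enum OperationType { FLUSH, COMPACTION }

    interface Index { void makeNonQueryable(); }

    private final List<Index> indexes;
    private final OperationType opType;

    AbortPathSketch(List<Index> indexes, OperationType opType)
    {
        this.indexes = indexes;
        this.opType = opType;
    }

    void abort(Throwable accumulator, boolean fromIndex)
    {
        // Marking indexes non-queryable is gated on the operation type: for
        // compaction, the task is aborted and the new sstables never reach
        // the tracker, so the live indexes stay consistent with the data.
        if (fromIndex && opType != OperationType.COMPACTION)
            indexes.forEach(Index::makeNonQueryable);
        // ... release writer resources, record the accumulated failure, etc.
    }
}
```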
❌ Build ds-cassandra-pr-gate/PR-1770 rejected by Butler: 1 new test failure in 3 builds.
Found 1 new test failure.
Found 4 known test failures.
If the index fails to build, mark the index non-queryable, but don't fail the C* flush. This way the node can continue to serve other queries.
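A minimal sketch of the behavior described in the message above; the `Index` interface and method names are hypothetical stand-ins, not the actual Cassandra APIs:

```java
// Illustrative sketch of containing an SAI build failure during flush;
// these names are hypothetical, not the real Cassandra/SAI APIs.
final class ContainedBuildFailureSketch
{
    interface Index
    {
        String name();
        void makeNonQueryable();
    }

    void buildIndexDuringFlush(Index index, Runnable build)
    {
        try
        {
            build.run();                 // attempt to build the SAI components
        }
        catch (RuntimeException e)
        {
            index.makeNonQueryable();    // queries on this index now fail fast
            // Swallow the failure so the C* flush completes and the node
            // keeps serving queries that don't touch this index.
            System.err.printf("SAI build failed for %s: %s%n", index.name(), e);
        }
    }
}
```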