CNDB-13745: Fix checksum for IncrementalChecksumSequentialWriter edge case #1726
base: main-5.0
Conversation
- use `cassandra.remote_storage_handler_factory` to replace `cassandra.remote_storage_handler`
- add `TableMetadata` to `StorageProvider#createDataDirectories` signature
- use `enableAutoCompaction` to replace `StorageHandler#isRemote`
- emit SSTableAddingNotification in LifecycleTransaction#checkpoint before new sstables are made visible. Note that early open is not supported because it opens sstables before `trackNewWritten`
Make `RowFilter` non-iterable because it’s confusing in a tree of expressions. Also:
* Change the meaning of `RowFilter#expressions()` to traverse the entire tree of expressions.
* Add a `#withoutDisjunctions()` utility method for legacy indexes.
* Inspect, and update if needed, all the uses of RowFilter iteration to ensure that they are valid.
* Remove the `warnIfFilterIsATree` warning since it should be clear what each method does.
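As a rough illustration of why iterating the filter directly became confusing, the sketch below flattens a tree of expressions the way a tree-walking `expressions()` would have to. `FilterTreeSketch`, `Node`, `Operation` and `Expression` are invented stand-ins for this example, not the actual `RowFilter` API.

```java
import java.util.ArrayList;
import java.util.List;

// Invented stand-ins for illustration; not the RowFilter API.
final class FilterTreeSketch
{
    sealed interface Node permits Expression, Operation {}
    record Expression(String column, String op, Object value) implements Node {}
    record Operation(List<Node> children) implements Node {} // an AND/OR node

    // With disjunctions the filter is a tree, so collecting every expression
    // means walking the whole tree rather than iterating a single flat list.
    static List<Expression> expressions(Node root)
    {
        List<Expression> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Expression> out)
    {
        if (node instanceof Expression e)
            out.add(e);
        else if (node instanceof Operation op)
            op.children().forEach(child -> collect(child, out));
    }
}
```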
Closes riptano/cndb#11932 The anti-join strategy for the inequality operator, `!=`, can be replaced with a union strategy of two semi-ranges, which makes the plan explicit and can be more efficient. This commit plans the inequality operator with the union of two semi-ranges, so no anti-join iterator is used for it. In the case of truncatable types such as BigDecimal and BigInteger, it implements a full index scan as a range index scan plan node without bounds. This PR also adds tests for `not contains key` to discard the usage of the union strategy by testing the false-negative case of an empty map.
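As a rough sketch of the union-of-semi-ranges idea described above, `x != v` becomes the union of `(-inf, v)` and `(v, +inf)`, while truncatable types fall back to an unbounded range scan. The `Plan`, `RangeScan` and `Union` types below are invented for illustration and are not the SAI planner API.

```java
import java.util.List;

// Invented plan-node types for illustration; not the SAI planner API.
final class InequalityPlanSketch
{
    sealed interface Plan permits RangeScan, Union {}
    // A null bound means unbounded on that side; bound exclusivity is omitted.
    record RangeScan(Object lower, Object upper) implements Plan {}
    record Union(List<Plan> branches) implements Plan {}

    // Plan "column != value" as the union of two semi-ranges instead of an
    // anti-join. Truncatable types (e.g. BigDecimal/BigInteger) store lossy
    // index keys, so they get an unbounded range scan and are re-filtered.
    static Plan planNotEquals(Object value, boolean truncatableType)
    {
        if (truncatableType)
            return new RangeScan(null, null);                   // full index scan
        return new Union(List.of(new RangeScan(null, value),    // (-inf, value)
                                 new RangeScan(value, null)));  // (value, +inf)
    }
}
```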
This performance regression was introduced by 9cabec5. I haven't been able to find the exact commit responsible for this regression, but essentially, we switched from `DirectReaders` to the Lucene `DirectReader` around 0774345 and 6bd00d1, and that led to a lot of unnecessary object creation. In my initial testing, this change appeared to improve numeric query throughput from `282.03it/s` to `333.15it/s` for 100k queries. The memory profile also showed far fewer objects created. Note: the above numbers might have been due to variability in my testing; it would be helpful to test in a more controlled environment. Either way, based on my understanding of the objects involved, this should generally produce better results because we'll have fewer objects. Finally, this change is especially helpful because the `sort then filter` logic has to initialize the iterators for search, so it helps in both execution paths.
This splits compactions that are to produce more than one output sstable into tasks that can execute in parallel. Such tasks share a transaction and have combined progress and observer. Because we cannot mark parts of an sstable as unneeded, the transaction is only applied when all tasks have succeeded. This also means that early open is not supported for such tasks. The parallelization also takes into account thread reservations, reducing the parallelism to the number of available threads for its level. The new functionality is turned on by default. Major compactions will apply the same mechanism to parallelize the operation. They will only split on pre-existing boundary points if they are also boundary points for the current UCS configuration. This is done to ensure that major compactions can re-shard data when the configuration is changed. If pre-existing boundaries match the current state, a major compaction will still be broken into multiple operations to reduce the space overhead of the operation. A sketch of the execution model follows this list. Also:
- Introduces a parallelism parameter to major compactions (`nodetool compact -j <threads>`, defaulting to half the compaction threads) to avoid stopping all other compaction for the duration.
- Changes SSTable expiration to be done in a separate `getNextBackgroundCompactions` round to improve the efficiency of expiration (the separate task can run quickly and remove the relevant sstables without waiting for a compaction to end).
- Applies the small-partition-count correction in `ShardManager.calculateCombinedDensity`.
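A simplified sketch of that execution model, using an invented `SharedTransaction` interface and plain `Runnable` shard tasks rather than the actual UCS classes: shard tasks run in parallel on a bounded pool, and the shared transaction is committed only once every task has succeeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Invented names for illustration; not the UCS compaction classes.
final class ParallelCompactionSketch
{
    interface SharedTransaction { void commit(); void abort(); }

    // Shard tasks run in parallel, capped by the threads reserved for the
    // level; because parts of an sstable cannot be marked unneeded, the
    // shared transaction is applied only after every task has succeeded.
    static void runShards(List<Runnable> shardTasks, SharedTransaction txn, int reservedThreads)
        throws InterruptedException
    {
        int parallelism = Math.max(1, Math.min(shardTasks.size(), reservedThreads));
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try
        {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : shardTasks)
                futures.add(pool.submit(task));
            for (Future<?> f : futures)
                f.get();      // propagate the first failure
            txn.commit();     // apply only when all shard tasks succeeded
        }
        catch (ExecutionException e)
        {
            txn.abort();      // any failed shard aborts the whole operation
        }
        finally
        {
            pool.shutdown();
        }
    }
}
```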
…CS (#1464) - add configuration to skip mutating STATS after receiving ZCS sstable
Expose some methods needed for riptano/cndb#12128
Unbounded queue length at the native transport stage can cause large backlogs of requests that the system processes, even though clients may no longer expect a response. This PR implements a limited backport of CNDB-11070, introducing the notion of a native request timeout that can shed messages with excessive queue times at the NTR stage as well as the async read/write stages, if enabled. Cross-node message timeouts are also now respected earlier in the mutation verb handler. This is a fairly straightforward cherry-pick of #1393 targeting main instead of cc-main-migration-release.
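Conceptually, the shedding works roughly like the sketch below (the class and method names are invented, this is not the NTR stage code): a request records when it was enqueued, and a worker drops it without executing it if it has already waited longer than the native request timeout.

```java
import java.util.concurrent.TimeUnit;

// Invented names for illustration; not the native transport stage code.
final class QueueTimeShedder
{
    private final long timeoutNanos;

    QueueTimeShedder(long timeout, TimeUnit unit)
    {
        this.timeoutNanos = unit.toNanos(timeout);
    }

    // A worker calls this when it dequeues a request: if the request has
    // already waited longer than the native request timeout, it is shed
    // instead of executed, since the client has likely given up on a reply.
    boolean shouldShed(long enqueuedAtNanos)
    {
        return System.nanoTime() - enqueuedAtNanos > timeoutNanos;
    }
}
```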
This reverts commit c06c94c. It seems the removal of `Index#postProcessor` by CNDB-11762 broke some tests in CNDB's `MultiNodeBillingTest`. Unfortunately that patch [was merged before creating the PR bumping the CC version used by CNDB](#1422 (comment)). [The CNDB PR](riptano/cndb#12076) was created after that merging but it was superseded by other CC version bumps. So I'm adding this reversal so we can investigate how the removal of `Index#postProcessor` affects those tests.
This patch replaces null values of `deterministic`, `monotonic` and `monotonic_on` columns in `system_schema.functions` and `system_schema.aggregates` with negative defaults. These defaults will be addressed if/once DB-672 gets ported to CC.
There are two mechanisms for detecting that the cluster is in an upgrade state and for determining the minimum version. The two are slightly different, and neither is pluggable, which means that CNDB doesn't work properly with them. Those mechanisms are implemented in `Gossiper`. Although we do not use `Gossiper` in CNDB, there are classes like `ColumnFilter` which go to `Gossiper` to check the upgrade state. So far, using those mechanisms in CNDB was a bit unpredictable: some of them reported that the cluster is upgraded and on the current version, while others did not. This turned out to be a problem, especially for `ColumnFilter`, because when we upgrade DSE --> CC, CC assumes that the newest filter version should be used, which is not correctly deserialized and interpreted by DSE. The fix is not small, but it probably simplifies things a bit. First of all, the two mechanisms are merged into one. Moreover, we added pluggability so that we can provide an appropriate implementation in CNDB coordinators and writers, which is based on ETCD.
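The merged, pluggable mechanism can be imagined along these lines (the interface and method names below are assumptions, not the actual API introduced here): callers such as `ColumnFilter` consult a single provider rather than `Gossiper`, and CNDB can plug in an ETCD-backed implementation.

```java
// Assumed shape for illustration only; not the actual interface added here.
public interface ClusterVersionProvider
{
    // True while nodes with different versions are present in the cluster.
    boolean isUpgradeInProgress();

    // The minimum version across the cluster, used e.g. by ColumnFilter to
    // pick a serialization version that older nodes can still deserialize.
    String minimumClusterVersion();
}
```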
Part of riptano/cndb#12139 Moves the constant shard count outside the loop over shards to reduce confusion.
…with DurationSpec type and 'native_transport_timeout_in_ms' as convertible old name with Long type; add some tests.
…MemtableIndexTest and TrieMemtableIndexAllocationsHeapBuffersTest from main branch.
…strictions (#1449) Closes riptano/cndb#12139 This PR adds a test of row count of a SAI plan in the presence of restrictions. Currently it tests queries with inequality, equality and half-ranges on different SAI column types and with or without histograms.
…g VIntOutOfRangeException to the catch block of SSTableIdentityIterator.exhaust method.
…pactionProgress to return the operation type from the first entry in the shared progress; needed in cases where a CompactionTask type is changed after creation.
…opriate (#1469) Fixes riptano/cndb#12239 We found that `System.nanoTime` was using significant CPU, but because the timeout is high enough, we can accept the inaccuracy.
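For illustration of that trade-off (this is a hypothetical sketch, not the actual change in the commit): when the timeout being checked is much larger than the refresh interval, a periodically updated timestamp is accurate enough and avoids calling `System.nanoTime()` on every check.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual code changed by this commit.
final class CoarseNanoClock
{
    private static volatile long nowNanos = System.nanoTime();

    static
    {
        ScheduledExecutorService updater = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "coarse-nano-clock");
            t.setDaemon(true);
            return t;
        });
        // Refresh every 10ms: plenty of accuracy when the timeout being
        // checked is orders of magnitude larger than the refresh interval.
        updater.scheduleAtFixedRate(() -> nowNanos = System.nanoTime(), 10, 10, TimeUnit.MILLISECONDS);
    }

    // Callers read a cached timestamp instead of paying for System.nanoTime()
    // on every timeout check.
    static long nanoTime()
    {
        return nowNanos;
    }
}
```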
Fixes regression in jvector 3.0.4 when compacting PQVectors larger than 2GB
SimpleClientPerfTest has been failing in CI since the changes from CNDB-10759. This change updates the anonymous `Message.Codec<QueryMessage>` class in `SimpleClientPerfTest` to override the correct method, `public CompletableFuture<Response> maybeExecuteAsync` from `QueryMessage`, whose signature was changed as part of CNDB-10759.
…ing for async batchlog removal (#1485) The test asserts that the batchlog is removed immediately after the write completes, but removal of the batchlog is async and can be delayed, particularly in resource-limited environments like CI.
The test generates partition index accesses by reusing the same key, and if the key cache is enabled, the test will fail for bigtable profiles because the key will be in the key cache.
…by filtering queries (#1484) Queries creating fake index contexts each create their own context, which can then race on metric registration (as the metrics have the same patterns). This can cause a query to fail. These metrics are superfluous, so we can skip creating them entirely.
… index format version 'dx', 'cx', or older.
…ate.accumulatedDataSize; it only worked to fix SensorsWriteTest.testMultipleRowsMutationWithClusteringKey for SkipListMemtable and may not be necessary if the default memtable becomes TrieMemtable. Revisit SensorsWriteTest later if necessary.
Move static class TrieMemtable.Factory to TrieMemtableFactory class; Use suggested TriePartitionUpdate.unsharedHeapSize implementation; Use InMemoryTrie.shortLived in TrieToDotTest and TrieToMermaidTest; Add specific versions aa, ca, and da to RowIndexTest;
Add addMemoryUsageTo in SkipListMemtable and TrieMemtable; add TrieMemtable.switchOut.
### What is the issue
The `main-5.0` branch uses `TrieMemtable` by default and the `test/conf/cassandra.yaml` configures `ByteOrderedPartitioner`. In this environment, the `CQLSSTableWriterDaemonTest` tests `testSkipBuildingIndexesWithSAI` and `testWriteWithSAI` are failing.
### What does this PR fix and why was it fixed
This fix moves the SAI specific tests from `CQLSSTableWriterTest` to a new `CQLSSTableWriterSAITest` so that `Murmur3Partitioner` can be explicitly set for the SAI specific tests.
…ngWIthAggregateTest to use paging in bytes for CC. (#1702)
### What is the issue
With the rebase of CC on C* 5.0.2, there is a new test CoordinatorReadLatencyMetricTest and CoordinatorReadLatencyMetricTest.internalPagingWithAggregateTest is failing.
### What does this PR fix and why was it fixed
Uses paging in bytes instead of paging in rows in internalPagingWithAggregateTest.
Removes the unnecessary tiebreaker column, since the word counts give different scores for text with the same term count. One value is improved to remove the possibility of flakiness in the order. Adds comments with the term and word counts to help verify the expected results. A helper function to count them is added. Also adds a small test for the analyzer with lowercase.
(#1691) Executing a query with BM25 search and a condition on a partial SSTable results in an empty-iterator access error, and there was no test that stored data in segments. This PR implements BM25 search tests that split data into two tables. This reproduced the bug CNDB-13696 and demonstrates the current confusion about the BM25 ordering result, to be fixed by CNDB-13553. The PR adds a check for an empty iterator created for a PK belonging to another segment. This fixes the bug of trying to get the first element of an empty iterator.
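The guard amounts to something like the generic helper below (illustration only, not the SAI iterator code): check `hasNext()` before taking the first element, so an iterator that is empty because its PK belongs to another segment no longer triggers a `NoSuchElementException`.

```java
import java.util.Iterator;

// Generic illustration of the guard; not the SAI segment iterator code.
final class FirstElementGuard
{
    // Returns the first element, or null when the iterator is empty (for
    // example, because the PK belongs to another segment), instead of
    // calling next() blindly and hitting NoSuchElementException.
    static <T> T firstOrNull(Iterator<T> it)
    {
        return it.hasNext() ? it.next() : null;
    }
}
```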
…nTest (#1664)
### What is the issue
`PendingAntiCompactionTest.testRetriesTimeout` is flaky on `main`, failing with a timeout reported in the junit test task. riptano/cndb#13524
### What does this PR fix and why was it fixed
`PendingAntiCompactionTest.testRetriesTimeout`, and a couple of other tests, call `Future.get` without any timeout. This change adds a 30-second timeout to prevent an indefinite wait for the future, and the failure is logged in the failing test rather than in the junit test task.
### What is the issue
This is a follow-up test for riptano/cndb#13696.
### What does this PR fix and why was it fixed
The fix was already merged, but this confirms that without the fix, collections are affected by the NoSuchElementException bug.
Relates to riptano/cndb#13766 Brings in several improvements. CNDB PR: riptano/cndb#13789
### What is the issue
Didn't create an issue as this is just a new test.
### What does this PR fix and why was it fixed
We implicitly rely on this and are exposing this as a feature to our end users, so we want to confirm that the stop words are the expected stop words.
Fixes: riptano/cndb#13718 We had adopted this version of Lucene, datastax/lucene@5ea8bb4, in order to support our custom modifications of HNSW on top of Lucene. We now use https://github.com/datastax/jvector for vector search and no longer need a custom build. I propose we use 9.8.0 since that is the closest release to the one we have been using. CNDB test PR: riptano/cndb#13757
Because it's involved in writing vints. If you are interested, take a look at CASSANDRA-15215 where this was introduced.
Thank you for the reference @blambov. It looks like we don't even use the |
Interesting, ok I will take a look |
### What is the issue
Fixes: https://github.com/riptano/cndb/issues/13745
### What does this PR fix and why was it fixed
In the `IncrementalChecksumSequentialWriter`, we call `writeMostSignificantBytes`, and that has a minor "bug" in the implementation in the `BufferedDataOutputStreamPlus` class: when we call `super.writeMostSignificantBytes`, we accidentally double count the next bytes in the checksum. This is in part due to the hybrid solution. Instead, I propose that we remove the naive solution (which is likely slower) and instead just do the bit math that calls the appropriate `writeBytes`, `writeShort`, etc. methods.
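A minimal sketch of that bit-math approach, assuming the incremental-checksum writer intercepts the primitive write methods exactly once per byte; the class below is illustrative and delegates to a plain `DataOutput`, it is not the actual `BufferedDataOutputStreamPlus` code.

```java
import java.io.DataOutput;
import java.io.IOException;

// Illustrative sketch only; structure and names are assumptions, not the
// actual Cassandra classes. Dispatch on the byte count and emit the most
// significant bytes of the register through primitive write calls, so a
// checksumming subclass that overrides writeByte/writeShort/writeInt/
// writeLong counts every byte exactly once.
final class MostSignificantBytesWriter
{
    private final DataOutput out;

    MostSignificantBytesWriter(DataOutput out)
    {
        this.out = out;
    }

    // Writes the `bytes` most significant bytes of `register`, high byte first.
    void writeMostSignificantBytes(long register, int bytes) throws IOException
    {
        switch (bytes)
        {
            case 0: break;
            case 1: out.writeByte((int) (register >>> 56)); break;
            case 2: out.writeShort((int) (register >>> 48)); break;
            case 3: out.writeShort((int) (register >>> 48));
                    out.writeByte((int) (register >>> 40)); break;
            case 4: out.writeInt((int) (register >>> 32)); break;
            case 5: out.writeInt((int) (register >>> 32));
                    out.writeByte((int) (register >>> 24)); break;
            case 6: out.writeInt((int) (register >>> 32));
                    out.writeShort((int) (register >>> 16)); break;
            case 7: out.writeInt((int) (register >>> 32));
                    out.writeShort((int) (register >>> 16));
                    out.writeByte((int) (register >>> 8)); break;
            case 8: out.writeLong(register); break;
            default: throw new IllegalArgumentException("bytes must be in [0, 8]: " + bytes);
        }
    }
}
```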