CNDB-14077: Reduce compaction thread pool size to match num of physical cores #1736
### What is the issue
Fixes: riptano/cndb#14160
### What does this PR fix and why was it fixed
The loop is supposed to loop until the deadline, not after the deadline. The test fails without the change.
(cherry picked from commit cd11ec8)
/** for parallelism within a single compaction
 * see comments to JVector PhysicalCoreExecutor -- HT tends to cause contention for the SIMD units
 */
public static final ExecutorService compactionExecutor = new DebuggableThreadPoolExecutor(Runtime.getRuntime().availableProcessors() / 2,
What about making this configurable, in case we need to roll back?
fwiw, I copied this logic from:
cassandra/src/java/org/apache/cassandra/index/sai/disk/vector/CompactionGraph.java
Lines 104 to 108 in 61f57a6
// see comments to JVector PhysicalCoreExecutor -- HT tends to cause contention for the SIMD units
private static final ForkJoinPool compactionSimdPool = new ForkJoinPool(Runtime.getRuntime().availableProcessors() / 2,
                                                                        new LowPriorityThreadFactory(),
                                                                        null,
                                                                        false);
Are we looking to make all of these configurable?
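For reference, one way the configurability question could be handled is sketched below. This is only an illustration under stated assumptions, not what the patch does: the class `CompactionPoolSize` and the property name `example.compaction_executor_threads` are hypothetical, and the floor of one thread is an addition of the sketch. The default mirrors the `availableProcessors() / 2` heuristic quoted above.

```java
/** Hypothetical helper, not part of the patch under review. */
final class CompactionPoolSize
{
    /** Hypothetical property name, invented for this sketch. */
    static final String PROPERTY = "example.compaction_executor_threads";

    /** Half of the logical processors, as in the patch; the floor of 1 is added by this sketch. */
    static int defaultSize(int logicalProcessors)
    {
        return Math.max(1, logicalProcessors / 2);
    }

    /** Returns the override if it parses as a positive integer, otherwise the default. */
    static int resolve(String override, int logicalProcessors)
    {
        if (override != null)
        {
            try
            {
                int parsed = Integer.parseInt(override.trim());
                if (parsed > 0)
                    return parsed;
            }
            catch (NumberFormatException e)
            {
                // Unparseable override: fall through to the default.
            }
        }
        return defaultSize(logicalProcessors);
    }

    /** Reads the hypothetical system property and resolves the pool size for this JVM. */
    static int fromSystemProperty()
    {
        return resolve(System.getProperty(PROPERTY), Runtime.getRuntime().availableProcessors());
    }
}
```

With something like this, the value of `CompactionPoolSize.fromSystemProperty()` would take the place of the hard-coded `Runtime.getRuntime().availableProcessors() / 2` argument, so a rollback to the old sizing would only need a JVM flag rather than a new build.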
patch LGTM
but I think that the branch contains an additional commit from other patches
Accidentally cherry picked to this branch. This reverts commit a15ceca.
Companion CNDB test PR: https://github.com/riptano/cndb/pull/14297
LGTM
❌ Build ds-cassandra-pr-gate/PR-1736 rejected by Butler
1 new test failure(s) in 4 builds
Found 1 new test failures
Found 6 known test failures
Would like to see some perf data for ~10M scale datasets, with and without SIMD. When an architecture like AVX-512 is enabled, it makes sense to thread based on the number of physical (not virtual / hyperthreaded) CPUs. But what about when SIMD is not enabled or available? Then use hyperthreading? Perhaps the threadpool size should depend on SIMD enabling.
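A rough sketch of what SIMD-dependent sizing might look like follows. It is an assumption-laden illustration, not a proposal backed by measurements: `SimdAwarePoolSize` is a hypothetical class, the `simdEnabled` flag is passed in because how to detect SIMD support is left open here, and `logicalProcessors / 2` only approximates the physical core count on machines with two hardware threads per core.

```java
/** Hypothetical helper illustrating the suggestion; not part of the patch. */
final class SimdAwarePoolSize
{
    /**
     * With SIMD enabled, use an estimate of the physical core count
     * (logical processors / 2, assuming two hardware threads per core).
     * Without SIMD, use every logical processor.
     */
    static int threads(int logicalProcessors, boolean simdEnabled)
    {
        int logical = Math.max(1, logicalProcessors);
        int physicalEstimate = Math.max(1, logical / 2);
        return simdEnabled ? physicalEstimate : logical;
    }
}
```

For example, `SimdAwarePoolSize.threads(Runtime.getRuntime().availableProcessors(), simdEnabled)` would yield 8 threads on a 16-thread machine when SIMD is on and 16 when it is off. Whether the non-SIMD case actually benefits from hyperthreading is exactly what the requested perf data would need to show.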
What is the issue
Relates to https://github.com/riptano/cndb/issues/14077, but doesn't necessarily solve it.
What does this PR fix and why was it fixed
When we parallelize vector graph insertions, we want the number of threads to match the number of physical cores, not the virtual ones. This should improve efficiency, and in my limited testing it reduces the number of concurrent updates to the ConcurrentNeighborMap during a build of the sift 1M dataset.
My data:
For normal graph construction before this change, I saw 19412 retries in the `insertDiverse` method.
For normal graph construction with this change, I saw 9371 retries in the `insertDiverse` method.
For normal graph + hierarchy before this change, I saw 22500 retries in the `insertDiverse` method.
For graph `{'similarity_function' : 'euclidean', 'enable_hierarchy': 'true', 'construction_beam_width': '200', 'maximum_node_connections': '32'}`, without this change I saw 58773 retries and with the change I saw 27775.
The graph construction times fluctuated with the change, but I'm not sure time matters significantly since my mac does not have simd: