Use binary search in ShardBoundaries#getShardForToken #1792

Open
wants to merge 1 commit into base: main

Conversation

michaeljmarshall
Member

What is the issue

There is not an issue yet. I am submitting this PR to run tests and to start a discussion about the implementation.

What does this PR fix and why was it fixed

We currently scan shards linearly in the getShardForToken method. Given that we have a sorted array and that this method is called multiple times for writes and reads, it seems worth considering a switch from an O(n) operation to an O(log n) operation. This would likely need further performance testing. I am just submitting to start a discussion.
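To make the proposal concrete, here is a minimal sketch of the two lookups side by side. It is not the code in this PR: it assumes boundaries are plain `long` tokens held in a sorted array of distinct values, and that a token equal to a boundary belongs to the shard that starts at that boundary. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for the shard lookup; tokens are modelled as longs.
final class ShardLookupSketch {
    // O(n): return the index of the first boundary that is greater than the token.
    static int linearScanShard(long[] boundaries, long token) {
        for (int i = 0; i < boundaries.length; i++) {
            if (token < boundaries[i]) {
                return i;
            }
        }
        return boundaries.length;
    }

    // O(log n): same result, assuming boundaries are sorted and distinct.
    static int binarySearchShard(long[] boundaries, long token) {
        int pos = Arrays.binarySearch(boundaries, token);
        // Found (pos >= 0): the token equals boundaries[pos], so it starts shard pos + 1.
        // Not found: pos == -(insertionPoint) - 1, and the insertion point is the shard index.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

With boundaries `{10, 20, 30}` both methods map token 5 to shard 0, token 10 to shard 1, and token 35 to shard 3. The binary search only matches the scan if the array has no duplicate boundaries.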

michaeljmarshall requested a review from blambov June 4, 2025 20:23
michaeljmarshall self-assigned this Jun 4, 2025
github-actions bot commented Jun 4, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

sonarqubecloud bot commented Jun 4, 2025

@cassci-bot

✔️ Build ds-cassandra-pr-gate/PR-1792 approved by Butler



@pkolaczk
pkolaczk commented Jun 5, 2025

Overall I like this change, and binary search should in general be better.

However, binary search might not be faster when n is very small or when tokens are not spaced very uniformly (I guess that's not the problem here). I agree we might need a microbenchmark here. But if this method doesn't appear heavily in the profile, I'd say let's not spend too much time on that.

@michaeljmarshall
Member Author

@blambov suggested we could branch based on number of shards, but that would only really make a difference if we performance tested it to find a meaningful inflection point.
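For reference, the branching idea could look roughly like the sketch below. It makes the same assumptions as before (tokens as `long`, sorted distinct boundaries, hypothetical names), and the cutoff `LINEAR_SCAN_THRESHOLD` is a placeholder, not a measured inflection point.

```java
import java.util.Arrays;

// Hypothetical hybrid lookup: scan small arrays, binary-search larger ones.
final class HybridShardLookupSketch {
    // Placeholder value only; a real cutoff would have to come from a benchmark.
    static final int LINEAR_SCAN_THRESHOLD = 8;

    static int shardForToken(long[] boundaries, long token) {
        if (boundaries.length <= LINEAR_SCAN_THRESHOLD) {
            // Few shards: a simple forward scan over the boundaries.
            for (int i = 0; i < boundaries.length; i++) {
                if (token < boundaries[i]) {
                    return i;
                }
            }
            return boundaries.length;
        }
        // Many shards: binary search, decoding the "not found" insertion point.
        int pos = Arrays.binarySearch(boundaries, token);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

Both branches return the same shard index for any token, so the threshold affects only speed, which is why it would need a benchmark to be worth having.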


3 participants