Skip to content

CNDB-13483: Fix loading PQ file when disabled_reads is true #1713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

michaeljmarshall
Copy link
Member

What is the issue

Fixes: https://github.com/riptano/cndb/issues/13483

Initial Commits

  • CNDB-13483: Fix loading PQ file when disabled_reads is true
  • Refactor code to pull PQ fetching logic into single class
  • Add comment asking about reducing memory utilization when loading PQ
  • Fix the broken allRowsHaveVectorsInWrittenSegments method

What does this PR fix and why was it fixed

The cassandra.sai.disabled_reads prevents us from reading previous PQ objects during compaction, which leads to less efficient graph construction. This fixes that by adding a new searchable index that is capable of discovering/downloading/cleaning up the right index files to get that information.

I also have a tangential bug fix for code that was also impacted by the disabled_reads setting. However, there is a follow up question to ask if it is a valid implementation.

This fixes the method so that it actually
reads the postings list structure.
However, there is a larger question of
whether the implementation was actually
correct. It all comes down to what is in
the indexContext's view on a compactor.
We really only want to get the info
from the source components that are being
compacted. Further investigation needed
to determine if this is the right behavior.
@michaeljmarshall michaeljmarshall self-assigned this Apr 25, 2025
Copy link

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaeljmarshall
Copy link
Member Author

@jkni @eolivelli - I will add tests for this PR on Monday. The initial implementation is ready for review. Manual testing suggests that it works. I wanted to push it up to run the rest of CI to get that feedback by Monday

Comment on lines +403 to +406
// TODO should we load all of these for this op? It seems very unlikely that we want to consider the whole
// table when checking if all rows have vectors. A single empty vector will be enough to make this false,
// but that should really only impact that table.
var view = indexContext.getReferencedView(TimeUnit.SECONDS.toNanos(5));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am moving this TODO out of scope for this PR. Here is a follow up ticket: https://github.com/riptano/cndb/issues/14028

Copy link

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

please put your tick on each line of the checklist: #1713 (comment)

{
// May result in downloading file, but this metadata is valuable. We use a stream to avoid loading all the
// structures at once.
return metadatas.stream()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the idea of using a Stream in order to not eagerly load stuff.
I think that Streams internally do some batching, if you want to not fall into that hidden behavior you could use a simple Iterator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a reference where I can learn more? The javadoc for the Stream interface indicates:

 * Streams are lazy; computation on the source data is only performed when the
 * terminal operation is initiated, and source elements are consumed only
 * as needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not (I have only spent time in debugging problems in the past, but I don't have references)

we can keep this in the current form


// first sstable has one-to-one
for (int i = 0; i < MIN_PQ_ROWS; i++)
execute("INSERT INTO %s (pk, v) VALUES (?, ?)", i, randomVectorBoxed(2));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side comment (not asking for a change):
make randomVectorBoxed with only 2 dimensions could return duplicate vectors and make the test flaky

Copy link

@jkni jkni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks good to me in general. I left some nits/questions inline. Is there a CNDB PR/have we exercised this code in CNDB yet?

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
77.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1713 rejected by Butler


2 new test failure(s) in 7 builds
See build details here


Found 2 new test failures

Test Explanation Branch history Upstream history
...ctorCompactionTest.testDelayedIndexCreation[ca] regression 🔴🔵🔵🔵
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴🔵🔵🔵🔴🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 7 known test failures

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1713 rejected by Butler


1 new test failure(s) in 6 builds
See build details here


Found 1 new test failures

Test Explanation Branch history Upstream history
...ctorCompactionTest.testDelayedIndexCreation[dc] regression 🔴🔴🔵

Found 7 known test failures

@VisibleForTesting
public boolean areSegmentsLoaded()
{
return searchableIndex instanceof V1SearchableIndex;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instanceof smells here, cannot we push this method into SearchableIndex ?

{
// May result in downloading file, but this metadata is valuable. We use a stream to avoid loading all the
// structures at once.
return metadatas.stream()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not (I have only spent time in debugging problems in the past, but I don't have references)

we can keep this in the current form

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants