Resolves #1076: Incorrect continuation when scanning by EndpointType.PREFIX_STRING #3523
Conversation
@alecgrieser can you take a look? Thanks
Force-pushed "…th EndpointType.PREFIX_STRING" from 6abae78 to 36a3f4a
Thanks for taking a look at this! Definitely would be good to get this issue resolved
```java
fdb.database().run(tr -> {
    for (int i = 0; i < 5; i++) {
        tr.set(subspace.pack(Tuple.from("apple", i)), Tuple.from("apple", i).pack());
    }
    return null;
});
```
This loop should probably be brought outside the `run` lambda. At present, this will run the block in a nested loop, which doesn't really mean anything semantically. Alternatively, we could run the body within the outer loop, which would set the data within the transaction, which would then be read by the scans. (FDB transactions can read their own writes.)
```diff
@@ -246,6 +246,11 @@ protected void prepare() {
     // left (inclusive) and another on the right (exclusive).
     prefixLength = calculatePrefixLength();
 
+    // When both endpoints are prefix strings, we should strip off the last byte \x00 from Tuple
+    if (lowEndpoint == EndpointType.PREFIX_STRING && highEndpoint == EndpointType.PREFIX_STRING) {
+        prefixLength--;
```
Thanks! This looks right to me, in principle. We could probably use a bit more testing to make sure it handles some edge cases properly. I have a few suggestions on additional tests in the other file.
There's a second concern, and it has to do with cross-version compatibility. (This is, for what it's worth, what ended up being the problem with the previous attempt to fix this, #1106.) We expect the user to be able to take the continuation that is generated from one run, hand it to another instance of the Record Layer, and resume from where it left off. We expect this to work even if the software versions of the two Record Layer instances are different. (This is because we expect people to deploy fleets of different instances of the Record Layer that all access the same data, and then load balance requests across them. When they perform a rolling upgrade of the fleet, they might pick up a continuation generated from an older instance and hand it to a newer one, or vice versa. Once the upgrade is completed, this shouldn't be an issue.)
The concern here is that we're implicitly changing the format. In particular, if you had such a scan over a `PREFIX_STRING` range, then when the continuation is transferred between versions, it will incorrectly interpret the prefix length.
The counterargument to caring about this (in this case) is that the prefix was already being wrongly handled, and so it was already an error to try to continue a prefix string search. So if the worst that can happen is that we now fail in a different way (or even the same way), maybe that's fine.
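To make concrete what the `prefixLength--` buys, here is a minimal self-contained sketch. `packString` and `isPrefixOf` are illustrative stand-ins, not Record Layer code, and assume the tuple layer's documented string encoding: a `0x02` type byte, the UTF-8 bytes with `0x00` escaped as `0x00 0xFF`, then a trailing `0x00` terminator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixStripDemo {
    // Illustrative re-implementation of the tuple layer's string encoding:
    // 0x02, then UTF-8 bytes (0x00 escaped as 0x00 0xFF), then a trailing 0x00.
    static byte[] packString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x02);
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.write(b);
            if (b == 0x00) out.write(0xFF);
        }
        out.write(0x00);
        return out.toByteArray();
    }

    static boolean isPrefixOf(byte[] prefix, byte[] key) {
        if (prefix.length > key.length) return false;
        return Arrays.equals(prefix, Arrays.copyOf(key, prefix.length));
    }

    public static void main(String[] args) {
        byte[] endpoint = packString("a");   // 0x02 'a' 0x00
        byte[] apple = packString("apple");  // 0x02 'a' 'p' 'p' 'l' 'e' 0x00

        // With the terminator included, "a" is not a byte prefix of "apple":
        System.out.println(isPrefixOf(endpoint, apple));  // false
        // After stripping the trailing 0x00 (the prefixLength-- in the diff), it is:
        byte[] stripped = Arrays.copyOf(endpoint, endpoint.length - 1);
        System.out.println(isPrefixOf(stripped, apple));  // true
    }
}
```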
We do have tests that run multiple versions of the Record Layer and then validate the behavior across versions, including handling continuations. They use the YAML tests framework, which are set up to run a series of SQL queries against a schema template definition, and then it validates the expected results. It also runs each query in a mode where it stops after each row and resumes the query with a continuation, and in multi-server mode, it alternates between different server versions. I've added some tests of almost that behavior in this PR: #3536. The problem is that the SQL planner isn't currently smart enough to translate expressions like `abc%` into `[PREFIX_STRING 'abc']` scan comparisons, so it doesn't actually exercise this codepath.
We have a few options here:
- Convince ourselves this is fine without a test. I'm not thrilled with this option, but I could almost buy that because the continuations were already broken, we don't have any problem here. I suppose it's notable that the continuations themselves haven't been modified, it's just the way the continuations get interpreted. Maybe that's enough to convince us that this is fine?
- Update SQL so that we can turn those scans into `PREFIX_STRING` ranges. It would be nice if we had that anyway, as it would make the queries more efficient. It would be a more involved change though as the SQL stuff has a sharper learning curve, and we'd also have to worry about how this interacted with the "plan cache". (In order to avoid costly re-planning, we have a cache we maintain of the plan associated with a given query. We also strip literals from the plan before caching it so that we can re-use plans for queries that are the same but with different constants. However, you wouldn't want to reuse a plan for `SELECT * FROM T WHERE s LIKE 'abc%'`, which can use a `PREFIX_STRING`, with one for `SELECT * FROM T WHERE s LIKE 'abc%d'`, which cannot without additional compensation. So whatever solution we come up with would need to take that into account.)
- Do mixed mode testing in some other way. We currently don't have any other ways of doing mixed mode testing, but we could theoretically come up with something. That's probably a larger discussion though
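On the plan-cache point, the difference between the two `LIKE` patterns comes down to whether the pattern's only wildcard is a single trailing `%`. A hypothetical check (this is not an actual Record Layer API, just a sketch of the distinction) might look like:

```java
public class LikePrefixCheck {
    // Hypothetical helper: a LIKE pattern can become a PREFIX_STRING scan
    // only if its single wildcard is a trailing '%', with no other '%' or '_'.
    static boolean canUsePrefixScan(String likePattern) {
        if (likePattern.isEmpty() || !likePattern.endsWith("%")) {
            return false;
        }
        String body = likePattern.substring(0, likePattern.length() - 1);
        // Any other wildcard anywhere else rules out a pure prefix scan.
        return !body.contains("%") && !body.contains("_");
    }

    public static void main(String[] args) {
        System.out.println(canUsePrefixScan("abc%"));   // true
        System.out.println(canUsePrefixScan("abc%d"));  // false
        System.out.println(canUsePrefixScan("a_c%"));   // false
    }
}
```

This is exactly why stripping literals before caching is dangerous here: `'abc%'` and `'abc%d'` have the same stripped shape but land on different sides of this check.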
```java
Tuple.from("a"),
Tuple.from("a"),
EndpointType.PREFIX_STRING,
EndpointType.PREFIX_STRING
```
It would probably be good to have some more tests. I can think of a couple like:
- An empty string as the prefix string
- A scan where there are substrings involved (e.g., both `app` and `apple` are in the list, and we search for a range that covers one or both of them)
- A special case of the above where there are `null` characters involved. For example, `app` and `app\0` and `app\0le` would be an interesting case because of the way strings are encoded: we insert a final `\0` byte at the end. The exact encoding for the first is `\x02app\x00`, the second is `\x02app\x00\xff\x00`, and the third is `\x02app\x00\xffle\x00`
- Scans in reverse, validating that the scans still stitch up
You might need to restructure this to be a parameterized test that takes a prefix string and an expected set of results, and maybe page size and reverse-ness, and then it executes the scan across continuations and validates the results.
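One rough shape for such a parameterized test, sketched against an in-memory sorted set rather than a real `KeyValueCursor` (the names here, such as `scanAcrossContinuations`, are illustrative and not Record Layer API):

```java
import java.util.*;

public class PrefixScanTestSketch {
    // In-memory stand-in for the keyspace; real tests would run against FDB.
    static final NavigableSet<String> KEYS =
            new TreeSet<>(Arrays.asList("app", "app\0", "app\0le", "apple"));

    // Scan KEYS for strings starting with prefix, page by page, resuming from
    // a "continuation" (the last key seen), optionally in reverse.
    static List<String> scanAcrossContinuations(String prefix, int pageSize, boolean reverse) {
        List<String> results = new ArrayList<>();
        String continuation = null;
        while (true) {
            NavigableSet<String> view = reverse ? KEYS.descendingSet() : KEYS;
            Iterator<String> it = (continuation == null
                    ? view
                    : view.tailSet(continuation, false)).iterator();
            int read = 0;
            String last = null;
            while (it.hasNext() && read < pageSize) {
                String k = it.next();
                last = k;
                if (k.startsWith(prefix)) results.add(k);
                read++;
            }
            if (read < pageSize) return results;  // keyspace exhausted
            continuation = last;                  // resume from here next page
        }
    }

    public static void main(String[] args) {
        // Parameterized cases: (prefix, page size, reverse).
        List<String> all = scanAcrossContinuations("app", 1, false);
        List<String> withNull = scanAcrossContinuations("app\0", 1, false);
        List<String> reversed = scanAcrossContinuations("", 2, true);
        System.out.println(all.size() + " " + withNull.size() + " " + reversed.size());
    }
}
```

A real version would swap the `TreeSet` for scans against the database and compare results across page sizes and directions.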
This adds an additional set of tests to `like.yamsql` that are designed to handle prefix scans where the prefix can be satisfied by an index. Our logic currently doesn't handle transforming `LIKE '${str}%'` into a tuple scan with a range like `[PREFIX_STRING ${str}]`, which means that we don't use the index as efficiently as we could. This is shown in the `explain`s here, where we only use the index in order to get a set of covering fields, but we don't add it to a scan range. Note that this is relevant to #3523 (addressing #1076), as if we did turn those kinds of `LIKE` expressions into prefix string scans, we'd get mixed-mode coverage of the fix for the bug being addressed there. AFAIK, there isn't any other SQL comparison that we have that we could use to generate prefix string scans in some other way.
`KeyValueCursor` is including the null terminator `\x00` when both endpoints are PREFIX_STRING, which causes the continuation to be missing one byte from the last key. In the `prefixString` unit test I populated 5 records `("apple", 0)` -> `("apple", 4)` and scanned with the "a" prefix; the continuation after scanning the first two records is `ple\x00...` instead of `pple\x00`, and the calculated `continuationBytes` in `KeyValueCursorBase` is `...a\x00ple\x00...` instead of `...apple\x00...`.
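A small self-contained sketch of this off-by-one, using hand-written key bytes that assume the tuple encoding of `("apple", 0)` (string terminator `0x00`, then `0x14` for the integer 0); `tail` is an illustrative helper, not Record Layer code:

```java
public class ContinuationDemo {
    public static void main(String[] args) {
        // Packed key for ("apple", 0): 0x02 "apple" 0x00, then 0x14 for the integer 0.
        byte[] key = {0x02, 'a', 'p', 'p', 'l', 'e', 0x00, 0x14};
        // Packed prefix for the string "a" is 0x02 'a' 0x00: length 3 with the terminator.
        int buggyPrefixLength = 3;  // terminator wrongly counted in the prefix
        int fixedPrefixLength = 2;  // after the prefixLength-- fix

        System.out.println(tail(key, buggyPrefixLength));  // prints ple\x00\x14 (one byte short)
        System.out.println(tail(key, fixedPrefixLength));  // prints pple\x00\x14
    }

    // Renders the continuation bytes left after stripping prefixLength bytes.
    static String tail(byte[] key, int prefixLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = prefixLength; i < key.length; i++) {
            byte b = key[i];
            sb.append(b >= 0x20 && b < 0x7F
                    ? String.valueOf((char) b)
                    : String.format("\\x%02x", b));
        }
        return sb.toString();
    }
}
```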