
Resolves #1076: Incorrect continuation when scanning by EndpointType.PREFIX_STRING #3523


Open
hoangmaihuy wants to merge 1 commit into main

Conversation

hoangmaihuy commented

KeyValueCursor includes the null terminator \x00 in the computed prefix when both endpoints are PREFIX_STRING, which causes the continuation to be missing one byte from the last key.

In the prefixString unit test, I populated 5 records ("apple", 0) through ("apple", 4) and scanned with the "a" prefix. The continuation after scanning the first two records is ple\x00... instead of pple\x00, and the calculated continuationBytes in KeyValueCursorBase is ...a\x00ple\x00... instead of ...apple\x00....
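For concreteness, here is a minimal sketch (not part of the PR) of the byte-level arithmetic behind that off-by-one, using the FDB Java Tuple API; the class name and printout are illustrative only:

    import com.apple.foundationdb.tuple.ByteArrayUtil;
    import com.apple.foundationdb.tuple.Tuple;

    public class PrefixLengthSketch {
        public static void main(String[] args) {
            // A string tuple element packs as \x02 + UTF-8 bytes + \x00 terminator.
            byte[] key = Tuple.from("apple", 0).pack();  // \x02apple\x00\x14
            byte[] prefix = Tuple.from("a").pack();      // \x02a\x00

            // For PREFIX_STRING endpoints, the trailing \x00 must not count toward
            // the shared prefix: the correct prefix is \x02a (2 bytes), not
            // \x02a\x00 (3 bytes).
            int wrongPrefixLength = prefix.length;       // 3 -> remainder "ple\x00..."
            int rightPrefixLength = prefix.length - 1;   // 2 -> remainder "pple\x00..."

            System.out.println(ByteArrayUtil.printable(
                    java.util.Arrays.copyOfRange(key, wrongPrefixLength, key.length)));
            System.out.println(ByteArrayUtil.printable(
                    java.util.Arrays.copyOfRange(key, rightPrefixLength, key.length)));
        }
    }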

hoangmaihuy changed the title from "Resolves FoundationDB#1076: Incorrect continuation when scanning by EndpointType.PREFIX_STRING" to "Resolves #1076: Incorrect continuation when scanning by EndpointType.PREFIX_STRING" Aug 2, 2025
ngbinh commented Aug 5, 2025

@alecgrieser can you take a look? Thanks

alecgrieser (Collaborator) left a comment

Thanks for taking a look at this! Definitely would be good to get this issue resolved.

Comment on lines +204 to +209
fdb.database().run(tr -> {
    for (int i = 0; i < 5; i++) {
        tr.set(subspace.pack(Tuple.from("apple", i)), Tuple.from("apple", i).pack());
    }
    return null;
});

This loop should probably be brought outside the run lambda. At present, this will run the block in a nested loop, which doesn't really mean anything semantically. Alternatively, we could run the body within the outer loop, which would set the data within the transaction, which would then be read by the scans. (FDB transactions can read their own writes.)
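For illustration, a sketch of both options; the outer loop mentioned above isn't visible in the excerpt, so the surrounding structure here is assumed:

    // Option 1: populate once, before the (assumed) outer loop that exercises
    // the scans, in a single committed transaction.
    fdb.database().run(tr -> {
        for (int i = 0; i < 5; i++) {
            tr.set(subspace.pack(Tuple.from("apple", i)), Tuple.from("apple", i).pack());
        }
        return null;
    });

    // Option 2: set the data inside the outer loop's transaction context `tr`.
    // FDB transactions can read their own writes, so scans issued later in the
    // same transaction will observe these keys without an intervening commit.
    for (int i = 0; i < 5; i++) {
        tr.set(subspace.pack(Tuple.from("apple", i)), Tuple.from("apple", i).pack());
    }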

@@ -246,6 +246,11 @@ protected void prepare() {
        // left (inclusive) and another on the right (exclusive).
        prefixLength = calculatePrefixLength();

        // When both endpoints are prefix strings, we should strip off the last byte \x00 from Tuple
        if (lowEndpoint == EndpointType.PREFIX_STRING && highEndpoint == EndpointType.PREFIX_STRING) {
            prefixLength--;
        }

Thanks! This looks right to me, in principle. We could probably use a bit more testing to make sure it handles some edge cases properly. I have a few suggestions on additional tests in the other file.

There's a second concern, and it has to do with cross-version compatibility. (This is, for what it's worth, what ended up being the problem with the previous attempt to fix this, #1106.) We expect the user to be able to take the continuation generated from one run and hand it to another instance of the Record Layer, which resumes from where the first left off. We expect this to work even if the software versions of the two Record Layer instances are different. (This is because we expect people to deploy fleets of different instances of the Record Layer that all access the same data, and then load balance requests across them. When they then perform a rolling upgrade of the fleet, they might pick up a continuation generated from an older instance and hand it to a newer one, or vice versa. Once the upgrade is completed, this shouldn't be an issue.)

The concern is that we're implicitly changing the format here. In particular, if you had such a scan over a PREFIX_STRING range, then when the continuation is transferred between versions, the resuming instance will incorrectly interpret the prefix length.

The counterargument to caring about this (in this case) is that the prefix was already being handled incorrectly, so it was already an error to try to continue a prefix string scan. If the worst that can happen is that we now fail in a different way (or even the same way), maybe that's fine.

We do have tests that run multiple versions of the Record Layer and validate the behavior across versions, including handling continuations. They use the YAML tests framework, which is set up to run a series of SQL queries against a schema template definition and validate the expected results. It also runs each query in a mode where it stops after each row and resumes the query with a continuation, and in multi-server mode, it alternates between different server versions. I've added some tests of almost that behavior in this PR: #3536. The problem is that the SQL planner isn't currently smart enough to translate expressions like abc% into [PREFIX_STRING 'abc'] scan comparisons, so it doesn't actually exercise this codepath.

We have a few options here:

  1. Convince ourselves this is fine without a test. I'm not thrilled with this option, but I could almost buy that, because the continuations were already broken, we don't have any problem here. It's notable that the continuations themselves haven't been modified; it's just the way the continuations get interpreted that has changed. Maybe that's enough to convince us that this is fine?
  2. Update SQL so that we can turn those scans into PREFIX_STRING ranges. It would be nice to have that anyway, as it would make the queries more efficient. It would be a more involved change, though, as the SQL stuff has a sharper learning curve, and we'd also have to worry about how this interacted with the "plan cache". (In order to avoid costly re-planning, we maintain a cache of the plan associated with a given query. We also strip literals from the plan before caching it so that we can re-use plans for queries that are the same but with different constants. However, you wouldn't want to reuse a plan for SELECT * FROM T WHERE s LIKE 'abc%', which can use a PREFIX_STRING, with one for SELECT * FROM T WHERE s LIKE 'abc%d', which cannot without additional compensation. So whatever solution we come up with would need to take that into account.)
  3. Do mixed-mode testing in some other way. We currently don't have any other way of doing mixed-mode testing, but we could theoretically come up with something. That's probably a larger discussion, though.

Comment on lines +211 to +214
        Tuple.from("a"),
        Tuple.from("a"),
        EndpointType.PREFIX_STRING,
        EndpointType.PREFIX_STRING

It would probably be good to have some more tests. I can think of a few:

  1. An empty string as the prefix string
  2. A scan where there are substrings involved (e.g., both app and apple are in the list, and we search for a range that covers one or both of them)
  3. A special case of the above where there are null characters involved. For example, app and app\0 and app\0le would be an interesting case because of the way strings are encoded: we insert a final \0 byte at the end, so the exact encoding for the first is \x02app\x00, the second is \x02app\x00\xff\x00, and the third is \x02app\x00\xffle\x00. (See the sketch after this list.)
  4. Scans in reverse, validating that the scans still stitch up
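
As a quick, illustrative check of those encodings with the Tuple API (not part of the PR; the comments restate the byte sequences from item 3):

    import com.apple.foundationdb.tuple.ByteArrayUtil;
    import com.apple.foundationdb.tuple.Tuple;

    public class NullByteEncodings {
        public static void main(String[] args) {
            // Embedded \x00 bytes in a string element are escaped as \x00\xff,
            // and every string element ends with a final \x00 terminator.
            System.out.println(ByteArrayUtil.printable(Tuple.from("app").pack()));          // \x02app\x00
            System.out.println(ByteArrayUtil.printable(Tuple.from("app\u0000").pack()));    // \x02app\x00\xff\x00
            System.out.println(ByteArrayUtil.printable(Tuple.from("app\u0000le").pack())); // \x02app\x00\xffle\x00
        }
    }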

You might need to restructure this to be a parameterized test that takes a prefix string and an expected set of results, and maybe page size and reverse-ness, and then it executes the scan across continuations and validates the results.
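
One possible shape for that parameterized test, as a sketch: scanAcrossContinuations is a hypothetical helper (not an existing method), and the test cases are illustrative.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Stream;

    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.Arguments;
    import org.junit.jupiter.params.provider.MethodSource;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    class PrefixStringScanTest {
        static Stream<Arguments> prefixScanCases() {
            // prefix, expected keys in forward byte order, page size, reverse?
            return Stream.of(
                    Arguments.of("app", List.of("app", "app\u0000", "app\u0000le", "apple"), 2, false),
                    Arguments.of("app", List.of("app", "app\u0000", "app\u0000le", "apple"), 1, true),
                    Arguments.of("", List.of("app", "app\u0000", "app\u0000le", "apple"), 3, false));
        }

        @ParameterizedTest
        @MethodSource("prefixScanCases")
        void prefixStringScanStitchesAcrossContinuations(String prefix, List<String> expected,
                                                         int pageSize, boolean reverse) {
            List<String> actual = scanAcrossContinuations(prefix, pageSize, reverse);
            List<String> want = new ArrayList<>(expected);
            if (reverse) {
                Collections.reverse(want);
            }
            assertEquals(want, actual);
        }

        // Hypothetical helper: runs a PREFIX_STRING scan with the given page size,
        // resuming from each continuation until exhaustion, and collects the
        // string keys seen (in scan order).
        private List<String> scanAcrossContinuations(String prefix, int pageSize, boolean reverse) {
            throw new UnsupportedOperationException("sketch only");
        }
    }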

alecgrieser added the bug fix label Aug 8, 2025
alecgrieser added a commit that referenced this pull request Aug 11, 2025
This adds an additional set of tests to `like.yamsql` that are designed
to handle prefix scans where the prefix can be satisfied by an index.
Our logic currently doesn't handle transforming `LIKE '${str}%'` into a
tuple scan with a range like `[PREFIX_STRING ${str}]`, which means that
we don't use the index as efficiently as we could. This is shown in the
`explain`s here, where we only use the index in order to get a set of
covering fields, but we don't add it to a scan range.

Note that this is relevant to #3523 (addressing #1076), as if we did
turn those kinds of `LIKE` expressions into prefix string scans, we'd
get mixed-mode coverage of the fix for the bug being addressed there.
AFAIK, there isn't any other SQL comparison that we have that could
be used to generate prefix string scans.