[SPARK-55494][SS] Introduce iterator/prefixScan with multi-values in StateStore API#54278
[SPARK-55494][SS] Introduce iterator/prefixScan with multi-values in StateStore API#54278HeartSaVioR wants to merge 3 commits intoapache:masterfrom
Conversation
|
The last commit is only relevant to this PR. Once #53911 is merged, I will rebase this PR to remove commits except this one. |
4e3a915 to
0c5b92d
Compare
…e included in other PR" This reverts commit 314acd4.
0c5b92d to
a19f13b
Compare
|
cc. @anishshri-db Please take a look, thanks! |
| * | ||
| * It is expected to throw exception if Spark calls this method without proper key encoding spec. | ||
| * It is also expected to throw exception if Spark calls this method without setting | ||
| * multipleValuesPerKey as true for the column family. |
There was a problem hiding this comment.
Where do we verify the multipleValuesPerKey condition ?
There was a problem hiding this comment.
Each implementation should do it - RocksDBStateStoreProvider.prefixScanWithMultiValues checks it.
verify(
kvEncoder._2.supportsMultipleValuesPerKey,
"Multi-value iterator operation requires an encoder" +
" which supports multiple values for a single key")
| } | ||
| } | ||
|
|
||
| test(s"Event time as prefix: iterator with multiple values (encoding = $encoding)") { |
There was a problem hiding this comment.
Does Avro work here or will be added later ? if so, can we add a TODO ?
There was a problem hiding this comment.
The TODO comment for Avro is added in L226 - it's a broader foreach covering multiple test cases.
| // Put multiple values at different event times | ||
| // Insert in non-sorted order to verify ordering by event time | ||
| val values2 = Array(valueToRow(200), valueToRow(201)) | ||
| store.putList(keyAndTimestampToRow("key1", 1, 1000L), values2) |
There was a problem hiding this comment.
Lets add some different values here also ? similar to the encoder tests ?
There was a problem hiding this comment.
The test we added in encoder is extended to cover multi-values API as well. Would it be sufficient?
| // For prefix encoder, we use iterator | ||
| case "prefix" => | ||
| store.iterator() | ||
| if (useMultipleValuesPerKey) store.iteratorWithMultiValues() |
There was a problem hiding this comment.
Could we use braces ?
| // For postfix encoder, we use prefix scan with ("key1", 1) as the prefix key | ||
| case "postfix" => | ||
| store.prefixScan(keyToRow("key1", 1)) | ||
| if (useMultipleValuesPerKey) store.prefixScanWithMultiValues(keyToRow("key1", 1)) |
|
https://github.com/HeartSaVioR/spark/runs/63873409895 CI failure only happened in docker integration test which is unrelated. |
|
Thanks! Merging to master. |
What changes were proposed in this pull request?
This PR proposes to introduce iterator/prefixScan with multi-values in StateStore API.
Why are the changes needed?
The functionality is missing on StateStore API so when the caller sets multi-values for specific CF, that CF doesn't support scanning through the data. The new functionality will be used in new state format version in stream-stream join, specifically SPARK-55144 (#53930).
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UTs.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: claude-4.5-sonnet
The above is used for creating a new test suite. All other parts aren't generated by LLM.