Support Shredded Lists/Array in `variant_get` by sdf-jkl · Pull Request #8354 · apache/arrow-rs

sdf-jkl · 2025-09-16T03:27:40Z

Which issue does this PR close?

closes [Variant] Support VariantPathElement::Index for Variant Arrays for variant_get #9443.

Rationale for this change

variant_get should be able to follow index path elements through shredded VariantArrays.

What changes are included in this PR?

Are these changes tested?

Yes, unit tested.

Are there any user-facing changes?

scovich

Left a couple comments that are hopefully helpful.

Also, we should (eventually) support nesting -- arrays and structs inside arrays.
Let's get simple lists of primitives working first, tho!

parquet-variant-compute/src/variant_get.rs

scovich

I'm not sure I understand how these unit tests will translate to variant_get?

sdf-jkl · 2025-09-19T15:45:45Z

I'm not sure I understand how these unit tests will translate to variant_get?

Could you elaborate please?

I am currently trying to build just the Shredded List VariantArray test case and, while doing so, learning how we could build them in shred_variant later. Once I have a good way of building a simple Shredded List VariantArray, it will be easy to work on the rest of the unit tests for variant_get.

scovich · 2025-09-19T21:18:02Z

I'm not sure I understand how these unit tests will translate to variant_get?

Could you elaborate please?

I am currently trying to build just the Shredded List VariantArray test case and, while doing so, learning how we could build them in shred_variant later. Once I have a good way of building a simple Shredded List VariantArray, it will be easy to work on the rest of the unit tests for variant_get.

No worries -- the current iteration does look like it produces a correct shredded variant containing a list, so I should probably just be patient and let you finish!

sdf-jkl · 2025-09-23T21:53:42Z

Hey @scovich, I see that in your current implementation of follow_shredded_path_element, when following a VariantPathElement::Field through the shredded path succeeds, it returns a ShreddedPathStep::Success(field.shredding_state()). That holds a ShreddingState::Typed, which in turn holds a reference to the typed_value array (which we later use for the next steps).

My question is: does ShreddedPathStep::Success() have to take its ShreddingState by reference?

The reason I am asking is that since we use the output of follow_shredded_path_element to get the values from the shredded VariantArray, shouldn't we be free to drop the outer array once we extract the relevant typed_value?

The only way I have found so far to work with list arrays is to build new arrays with arrow_select::take, combining the path index with the GenericListArray offsets.
But that creates new arrays inside the function, so ShreddedPathStep::Success can't hold a reference to them.
(I just pushed a commit with a non-working implementation of the idea.)

~~Should we instead look for another way to represent a resulting array consisting of slices instead?~~

I just saw the #8392
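To make the ownership question concrete, here is a minimal sketch using toy stand-ins rather than the arrow-rs types. `ArrayRef`, `PathStep`, `follow_field`, and `follow_index` below are all hypothetical names chosen for illustration. The sketch assumes array handles are reference counted, so a step result that owns its handle can carry either an existing child array (a cheap `Arc` clone) or an array that was gathered inside the function.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a reference-counted array handle.
// This is NOT the arrow-rs `ArrayRef` type.
type ArrayRef = Arc<Vec<Option<i64>>>;

/// A step result that owns its array handle instead of borrowing it.
enum PathStep {
    Success(ArrayRef),
    Missing,
}

/// Field access: the child array already exists, so cloning the handle
/// only bumps a reference count and copies no data.
fn follow_field(fields: &[(&str, ArrayRef)], name: &str) -> PathStep {
    match fields.iter().find(|(k, _)| *k == name) {
        Some((_, array)) => PathStep::Success(Arc::clone(array)),
        None => PathStep::Missing,
    }
}

/// Index access: the result is gathered into a new array that did not
/// exist before the call, so it has to be returned by value.
fn follow_index(rows: &[Vec<i64>], index: usize) -> PathStep {
    let taken: Vec<Option<i64>> = rows
        .iter()
        .map(|row| row.get(index).copied()) // None when the row is too short
        .collect();
    PathStep::Success(Arc::new(taken))
}
```

With an owning result, both kinds of path step can share one return type; a borrowing result would only work for the field case.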

sdf-jkl · 2025-09-23T22:03:45Z

parquet-variant-compute/src/variant_get.rs

+            // Build the list of indices to take
+            let mut take_indices = Vec::with_capacity(list_len);
+            for i in 0..list_len {
+                let start = offsets[i] as usize;
+                let end = offsets[i + 1] as usize;
+                let len = end - start;
+
+                if *index < len {
+                    take_indices.push(Some((start + index) as u32));
+                } else {
+                    take_indices.push(None);
+                }
+            }
+
+            let index_array = UInt32Array::from(take_indices);
+
+            // Use Arrow compute kernel to gather elements
+            let taken = take(field_array, &index_array, None)?;


You can see the basic idea here
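The same idea can be tried without any Arrow dependency. The sketch below is a simplified illustration, not the PR's code: `take_indices` and `gather` are hypothetical helpers, plain slices stand in for the offsets buffer and the child array, and `gather` only mimics how a take kernel turns a null index into a null output.

```rust
/// For each list row, compute the flat child position of element `index`,
/// or `None` when the row has fewer than `index + 1` elements.
/// `offsets` follows the Arrow list layout: row i spans offsets[i]..offsets[i + 1].
fn take_indices(offsets: &[i32], index: usize) -> Vec<Option<u32>> {
    offsets
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0] as usize, w[1] as usize);
            if index < end - start {
                Some((start + index) as u32)
            } else {
                None
            }
        })
        .collect()
}

/// Gather child values at the given positions; a `None` position yields a
/// null (`None`) output slot.
fn gather(values: &[i64], indices: &[Option<u32>]) -> Vec<Option<i64>> {
    indices
        .iter()
        .map(|pos| pos.map(|p| values[p as usize]))
        .collect()
}
```

For example, with offsets `[0, 3, 3, 5]` (rows of length 3, 0, and 2) and index 1, the positions are `[Some(1), None, Some(4)]`: the empty row has no element at index 1, so it becomes a null slot.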

sdf-jkl · 2025-09-25T21:38:58Z

Hey @scovich, I made it work for one of the simple tests, but the second one doesn't go through because Variant to Arrow does not support utf8 yet.

Do we have an issue tracking variant_to_arrow types support? If not, I can make one.

scovich · 2025-09-26T13:27:02Z

I made it work for one of the simple tests, but the second one doesn't go through because Variant to Arrow does not support utf8 yet.

Do we have an issue tracking variant_to_arrow types support? If not, I can make one.

I'm not sure we have a tracking issue for utf8 support in variant_to_arrow, but I've also noticed that it's an annoying gap for unit testing (we all seem to reach for string values...)

sdf-jkl · 2026-02-19T23:04:13Z

I want to continue working here on #9443.

@klion26 I'll start addressing your old comments first.

sdf-jkl · 2026-02-23T19:06:04Z

@klion26 @scovich please review when available. thanks!

scovich · 2026-02-23T19:53:23Z

Looking for an early pass? Or is this Ready for review now?

sdf-jkl · 2026-02-23T19:56:59Z

@scovich an early pass to check the test coverage should do.

Once the test coverage is complete and the code passes, it will be Ready for review

scovich

Thanks for taking a stab at this. Several comments.

scovich · 2026-02-23T20:07:18Z

parquet-variant-compute/src/variant_get.rs

+            return Err(ArrowError::CastError(format!(
+                "Cannot access index '{}' for row {} with list length {}",
+                index, row, len
+            )));


We need to decide what the out of bounds semantics should be. For example, spark just returns NULL.

By way of comparison, spark and arrow-rs both return NULL for non-existent struct fields, which could be argued as analogous. Or maybe it's considered different and we want the error. Or maybe missing struct fields are also handled wrong?

(I'm comfortable with following spark semantics, but would love to hear others' thoughts)

I support following the spark semantics too

Do we need to add the behavior to the document or somewhere else?

Added to docs here 91589ad

The latest behavior is to error out on OOB access unless safe casting is enabled.
Spark semantics would just return NULL regardless of that flag.
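As a rough illustration of the behavior described above, the sketch below models the per-row decision with a hypothetical helper `resolve_index` whose `safe` flag stands in for the safe-casting option. It is an assumption-laden toy, not the code in variant_get.rs; only the error message text is copied from the diff above.

```rust
/// Decide what an index path step does for one row.
/// - in bounds: return the element position
/// - out of bounds with `safe == true`: return `None` (a NULL result)
/// - out of bounds with `safe == false`: return an error
fn resolve_index(
    len: usize,
    index: usize,
    row: usize,
    safe: bool,
) -> Result<Option<usize>, String> {
    if index < len {
        Ok(Some(index))
    } else if safe {
        Ok(None)
    } else {
        Err(format!(
            "Cannot access index '{}' for row {} with list length {}",
            index, row, len
        ))
    }
}
```

Under Spark-like semantics the `safe` flag would not be consulted at all: the out-of-bounds branch would always return `Ok(None)`.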

Actually tho -- I think spark is just implementing jsonpath semantics:

A syntactically valid segment MUST NOT produce errors when executing the query. This means that some operations that might be considered erroneous, such as using an index lying outside the range of an array, simply result in fewer nodes being selected.

Here, "syntactically valid" is referring to the previous section (2.1):

A JSONPath implementation MUST raise an error for any query that is not well-formed and valid. The well-formedness and the validity of JSONPath queries are independent of the JSON value the query is applied to. No further errors relating to the well-formedness and the validity of a JSONPath query can be raised during application of the query to a value. This clearly separates well-formedness/validity errors in the query from mismatches that may actually stem from flaws in the data.

Note: Integer overflow in an index is well-formed but not valid, so it's allowed to produce an error.

scovich · 2026-02-23T20:09:24Z

parquet-variant-compute/src/variant_get.rs

+
+    // Gather both typed and fallback values at the requested element index.
+    let taken_value = value_array
+        .map(|value| take(value, &index_array, None))


aside: TIL about take. Very helpful here.

klion26 · 2026-02-25T14:01:45Z

parquet-variant-compute/src/variant_get.rs

+            return Err(ArrowError::CastError(format!(
+                "Cannot access index '{}' for row {} with list length {}",
+                index, row, len
+            )));


Do we need to add the behavior to the document or somewhere else?

scovich · 2026-03-02T20:47:23Z

Everything looks good, code-wise -- nice and clean.

But there's still an open question of whether we intend to follow the jsonpath spec in our path step logic, as e.g. spark does?
#8354 (comment)

The jsonpath spec requires foo[100] to return NULL if foo is not an array, and also requires returning NULL if foo has fewer than 101 elements. Similarly, foo.bar should return NULL if foo is not a struct and should also return NULL if foo has no field named bar. Safe casting would only influence actual casting decisions, e.g. a variant_get call that specifically requests a string and the requested path points to a struct.

In contrast, our struct handling code currently returns an error if safe casting is disabled and:

a Field path step encounters a "wrong" type (L169)
an Index path step encounters a "wrong" type (L224)
an Index path step is out of bounds (L99)
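For comparison, the jsonpath-style behavior can be sketched on a toy value type. Everything here (`Value`, `Step`, `follow`) is hypothetical and unrelated to the arrow-rs types. The point is only that every mismatch between a well-formed path and the data yields "no result" (`None`, standing in for NULL) rather than an error.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
    Object(Vec<(String, Value)>),
}

enum Step<'a> {
    Field(&'a str),
    Index(usize),
}

/// Follow `path` through `value`. A missing field, an out-of-bounds index,
/// or a step applied to the wrong kind of value all return `None`.
fn follow<'v>(value: &'v Value, path: &[Step<'_>]) -> Option<&'v Value> {
    let mut current = value;
    for step in path {
        current = match (current, step) {
            (Value::Object(fields), Step::Field(name)) => {
                &fields.iter().find(|(k, _)| k.as_str() == *name)?.1
            }
            (Value::List(items), Step::Index(i)) => items.get(*i)?,
            // Field step on a non-object, or index step on a non-list.
            _ => return None,
        };
    }
    Some(current)
}
```

With this shape, `foo[100]` on a two-element list and `foo.bar` on a list both produce `None`, and a safe-casting option would only matter later, when the selected value is converted to the requested type.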

scovich · 2026-03-02T20:48:28Z

@alamb -- any opinions about supporting jsonpath semantics or not? Or ideas on who we should seek input from?

sdf-jkl and others added 3 commits September 15, 2025 13:49

Add test shredded variant list array

9c25cc4

Add basic tests

ed961a4

Merge branch 'apache:main' into shredded_list_support

03ecb95

github-actions bot added the parquet-variant parquet-variant* crates label Sep 16, 2025

sdf-jkl mentioned this pull request Sep 16, 2025

[Variant] Support Shredded Lists/Array in variant_get #8082

Closed

scovich reviewed Sep 16, 2025

sdf-jkl and others added 2 commits September 16, 2025 16:21

Merge branch 'apache:main' into shredded_list_support

158d6d7

Redo test shredded array

d53c831

sdf-jkl commented Sep 17, 2025 (5 comments)

parquet-variant-compute/src/variant_get.rs (outdated)

scovich mentioned this pull request Sep 18, 2025

[Variant] Define new shred_variant function #8366

Merged

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

174e429

…ed_list_support

scovich reviewed Sep 19, 2025

Rebuild the shredded list array

69de7d7

sdf-jkl added 2 commits September 23, 2025 17:54

Use select::take to build the output array

cc6d787

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

8f6ad1b

…ed_list_support

sdf-jkl commented Sep 23, 2025

sdf-jkl added 3 commits September 25, 2025 11:46

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

bc8abd9

…ed_list_support

Pass one test

c0d2065

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

85aaa3f

…ed_list_support

Get typed values directly

40b6311

Added support for utf8, largeUtf8, utf8view

f6e88ef

sdf-jkl deleted the shredded_list_support branch January 3, 2026 00:14

sdf-jkl restored the shredded_list_support branch February 19, 2026 22:16

sdf-jkl added 2 commits February 19, 2026 17:23

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

0c32647

…ed_list_support

Merge branch 'shredded_list_support' of https://github.com/sdf-jkl/ar…

5b899d8

…row-rs into shredded_list_support

sdf-jkl reopened this Feb 19, 2026

sdf-jkl marked this pull request as draft February 19, 2026 23:04

sdf-jkl added 3 commits February 20, 2026 18:58

Simplify tests using shred_variant

9cd01d2

Add tests suggested by @klion26

cecd39f

Fix typed and untyped values logic

a776982

sdf-jkl mentioned this pull request Feb 21, 2026

Enable LargeList / ListView / LargeListView for VariantArray::try_new #9455

Closed

sdf-jkl added 2 commits February 21, 2026 13:40

Add support for LargeListArray + OBB err when safe_cast

cfe7c00

Use ShreddingState instead of BorrowedShreddingState in ShreddedPathS…

fc99bf0

…tep::Success

scovich reviewed Feb 23, 2026

Reuse ShreddingState methods

ccbf59b

sdf-jkl changed the title ~~[WIP] Support Shredded Lists/Array in variant_get~~ Support Shredded Lists/Array in variant_get Feb 24, 2026

klion26 reviewed Feb 25, 2026

sdf-jkl added 5 commits February 25, 2026 11:21

nit fix

cbfa058

use else if chain

cf94d43

add cast_options.safe docs

91589ad

Merge branch 'main' of https://github.com/apache/arrow-rs into shredd…

e8e7fb1

…ed_list_support

support list-like arrays

28ec53c

sdf-jkl marked this pull request as ready for review February 26, 2026 04:10

liamzwbao reviewed Mar 2, 2026

parquet-variant-compute/src/variant_get.rs (outdated)

match typed value instead of donwcast attempts

279b634
