GH-48478: [Ruby] Fix Ruby list inference for nested non-negative integer arrays#48584
GH-48478: [Ruby] Fix Ruby list inference for nested non-negative integer arrays#48584kou merged 5 commits intoapache:mainfrom
Conversation
|
|
|
Could you add tests for newly supported cases? |
|
I have added test cases for nested array type inference in |
ruby/red-arrow/test/test-table.rb
Outdated
|
|
||
| test("{Symbol: nested non-negative integer Array}") do | ||
| table = Arrow::Table.new(numbers: [[0, 1, 2], [3, 4]]) | ||
| assert_equal("list<item: uint8>", | ||
| table.schema["numbers"].data_type.to_s) | ||
| end | ||
|
|
||
| test("{Symbol: nested signed integer Array}") do | ||
| table = Arrow::Table.new(numbers: [[0, -1, 2], [3, 4]]) | ||
| assert_equal("list<item: int8>", | ||
| table.schema["numbers"].data_type.to_s) | ||
| end |
There was a problem hiding this comment.
How about using
arrow/ruby/red-arrow/test/test-array-builder.rb
Lines 133 to 148 in 649a372
| { | ||
| builder: ListArrayBuilder.new(ListDataType.new(field)), | ||
| detected: true, | ||
| detected: !!sub_builder_info[:detected], |
There was a problem hiding this comment.
Is this !! for converting nil to false?
If so, we don't need it because nil is also a false value.
|
Thanks for the review! I've removed the redundant boolean conversion. |
| array = Arrow::ArrayBuilder.build([[0, 1, 2], [3, 4]]) | ||
| assert_equal("list<item: uint8>", array.value_data_type.to_s) |
There was a problem hiding this comment.
Could you also check values like
arrow/ruby/red-arrow/test/test-array-builder.rb
Lines 97 to 108 in 1fac131
There was a problem hiding this comment.
Thanks, I've updated the tests.
…e integer arrays (apache#48584) ### Rationale for this change When building an `Arrow::Table` from a Ruby Hash passed to `Arrow::Table.new`, nested `Integer` arrays are incorrectly inferred as `string` (utf8) if all values are non-negative. This behavior is unexpected; nested integer arrays should be consistently represented as a list type (e.g., `list<item: uint*>` or `list<item: int*>`) rather than falling back to UTF-8 strings. ### What changes are included in this PR? This PR modifies the logic in `detect_builder_info()`, specifically the `when ::Array` block, to correctly identify nested non-negative integer arrays as list arrays. The change ensures that if `sub_builder_info` contains a valid `:builder`, it will be used even if `sub_builder_info` does not yet indicate that the type has been "detected." ### Are these changes tested? Yes. (`ruby ruby/red-arrow/test/run-test.rb`) ### Are there any user-facing changes? Yes. GitHub Issue: Closes apache#48478 * GitHub Issue: apache#48478 Authored-by: hypsakata <46911464+hypsakata@users.noreply.github.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d89c14b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
When building an
Arrow::Tablefrom a Ruby Hash passed toArrow::Table.new, nestedIntegerarrays are incorrectly inferred asstring(utf8) if all values are non-negative. This behavior is unexpected; nested integer arrays should be consistently represented as a list type (e.g.,list<item: uint*>orlist<item: int*>) rather than falling back to UTF-8 strings.What changes are included in this PR?
This PR modifies the logic in
detect_builder_info(), specifically thewhen ::Arrayblock, to correctly identify nested non-negative integer arrays as list arrays.The change ensures that if
sub_builder_infocontains a valid:builder, it will be used even ifsub_builder_infodoes not yet indicate that the type has been "detected."Are these changes tested?
Yes. (
ruby ruby/red-arrow/test/run-test.rb)Are there any user-facing changes?
Yes.
GitHub Issue: Closes #48478