GH-48478: [Ruby] Fix Ruby list inference for nested non-negative integer arrays #48584

Merged
kou merged 5 commits into apache:main from hypsakata:gh-48478 on Dec 20, 2025

Conversation

hypsakata (Contributor) commented Dec 18, 2025

Rationale for this change

When building an `Arrow::Table` from a Ruby Hash passed to `Arrow::Table.new`, nested `Integer` arrays are incorrectly inferred as `string` (utf8) if all values are non-negative. This behavior is unexpected; nested integer arrays should be consistently represented as a list type (e.g., `list<item: uint*>` or `list<item: int*>`) rather than falling back to UTF-8 strings.

What changes are included in this PR?

This PR modifies the logic in `detect_builder_info()`, specifically the `when ::Array` block, to correctly identify nested non-negative integer arrays as list arrays.

The change ensures that if `sub_builder_info` contains a valid `:builder`, it will be used even if `sub_builder_info` does not yet indicate that the type has been "detected."
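
The effect of that change can be illustrated with a small pure-Ruby model. This is only a sketch under stated assumptions, not the red-arrow source: `detect_info`, `infer_type`, and the `require_detected:` switch are hypothetical names, and types are plain strings rather than builder objects. The model assumes that a non-negative integer yields a tentative unsigned type without setting `:detected` (a later negative value could still change the answer), and that an undetermined top-level result falls back to string.

```ruby
# Simplified, hypothetical model of the inference loop. It is NOT the
# red-arrow source: types are plain strings instead of builder objects.
def detect_info(value, info, require_detected: false)
  case value
  when Integer
    if value < 0
      { type: "int", detected: true }
    else
      # Tentatively unsigned; :detected stays unset because a later
      # negative value may still switch the type to signed.
      { type: "uint" }
    end
  when String
    { type: "string", detected: true }
  when Array
    sub_info = nil
    value.each do |sub_value|
      sub_info = detect_info(sub_value, sub_info,
                             require_detected: require_detected)
      break if sub_info && sub_info[:detected]
    end
    # require_detected: true models the behavior described as buggy:
    # a tentative element type is discarded. With false, any element
    # type is used and the :detected flag is passed through as-is.
    usable = sub_info &&
             (require_detected ? sub_info[:detected] : sub_info[:type])
    if usable
      { type: "list<#{sub_info[:type]}>", detected: sub_info[:detected] }
    else
      info
    end
  else
    # nil and unknown values leave the current guess unchanged.
    info
  end
end

def infer_type(values, require_detected: false)
  info = nil
  values.each do |value|
    info = detect_info(value, info, require_detected: require_detected)
    break if info && info[:detected]
  end
  # Nothing inferred at all: fall back to string, as in the bug report.
  info ? info[:type] : "string"
end

infer_type([[0, 1, 2], [3, 4]], require_detected: true) # => "string"
infer_type([[0, 1, 2], [3, 4]])                         # => "list<uint>"
infer_type([[0, 1, 2], [3, -4]])                        # => "list<int>"
```

In this model, passing the `:detected` flag through instead of forcing it to `true` lets the scan continue past an all-non-negative sublist, so a negative value in a later sublist can still select the signed type.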

Are these changes tested?

Yes. (`ruby ruby/red-arrow/test/run-test.rb`)

Are there any user-facing changes?

Yes.

GitHub Issue: Closes #48478

@hypsakata hypsakata requested a review from kou as a code owner December 18, 2025 09:46
@github-actions

⚠️ GitHub issue #48478 has been automatically assigned in GitHub to PR creator.

kou (Member) commented Dec 19, 2025

Could you add tests for newly supported cases?

hypsakata (Contributor, Author) commented

I have added test cases for nested array type inference in test/unit/apache-arrow/test-table.rb.
Since I couldn't find existing tests specifically covering this input pattern (Hash with nested arrays), I added them to the .new test block.

Comment on lines +75 to +86

    test("{Symbol: nested non-negative integer Array}") do
      table = Arrow::Table.new(numbers: [[0, 1, 2], [3, 4]])
      assert_equal("list<item: uint8>",
                   table.schema["numbers"].data_type.to_s)
    end

    test("{Symbol: nested signed integer Array}") do
      table = Arrow::Table.new(numbers: [[0, -1, 2], [3, 4]])
      assert_equal("list<item: int8>",
                   table.schema["numbers"].data_type.to_s)
    end
Member

How about using

    test("list<boolean>s") do
      assert_build(Arrow::ArrayBuilder,
                   [
                     [nil, true, false],
                     nil,
                     [false],
                   ])
    end

    test("list<string>s") do
      assert_build(Arrow::ArrayBuilder,
                   [
                     ["Hello", "World"],
                     ["Apache Arrow"],
                   ])
    end
instead of this file?

  {
    builder: ListArrayBuilder.new(ListDataType.new(field)),
-   detected: true,
+   detected: !!sub_builder_info[:detected],
Member

Is this !! for converting nil to false?

If so, we don't need it because nil is also a false value.
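
The point about `nil` can be checked in plain Ruby. The snippet below is a minimal sketch; the `info` hash and its `:uint_builder` placeholder are hypothetical stand-ins for a builder-info hash that has no `:detected` key.

```ruby
# Hypothetical builder-info hash without a :detected key.
info = { builder: :uint_builder }

raw        = info[:detected]    # nil, because the key is missing
normalized = !!info[:detected]  # false, after double negation

# In a conditional, nil and false take the same branch, so the
# explicit conversion does not change control flow.
stop_raw        = raw ? :stop : :continue
stop_normalized = normalized ? :stop : :continue
```

Both `stop_raw` and `stop_normalized` end up as `:continue`, which is why a check such as `break if info and info[:detected]` behaves the same with or without `!!`.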

hypsakata (Contributor, Author) commented

Thanks for the review!

I've removed the redundant boolean conversion.
Also, I've moved the tests to test/test-array-builder.rb. I should have checked the appropriate location more carefully, so thanks for pointing it out.

kou (Member) left a comment

+1

Comment on lines +151 to +152
array = Arrow::ArrayBuilder.build([[0, 1, 2], [3, 4]])
assert_equal("list<item: uint8>", array.value_data_type.to_s)
Member

Could you also check values like

    assert_equal({
                   data_type: data_type,
                   values: [
                     BigDecimal("10.1"),
                     BigDecimal("1.11"),
                     BigDecimal("1"),
                   ],
                 },
                 {
                   data_type: array.value_data_type,
                   values: array.to_a,
                 })
?

Contributor, Author

Thanks, I've updated the tests.

github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Dec 20, 2025
kou merged commit d89c14b into apache:main Dec 20, 2025
10 checks passed
kou removed the "awaiting committer review" label Dec 20, 2025
Jonahkel pushed a commit to Jonahkel/arrow that referenced this pull request Dec 22, 2025
…e integer arrays (apache#48584)

Authored-by: hypsakata <46911464+hypsakata@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d89c14b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

Development

Successfully merging this pull request may close these issues.

[Ruby] Arrow::Table.new infers nested integer arrays as utf8 when all values are non-negative
