Skip to content

Commit 6ee7f7e

Browse files
authored
GH-48481: [Ruby] Correctly infer types for nested integer arrays (#48699)
### Rationale for this change When building an `Arrow::Table` from a Ruby Hash passed to `Arrow::Table.new`, nested Integer arrays are incorrectly inferred as `list<uint8>` or `list<int8>` regardless of the actual values contained. Nested integer arrays should be correctly inferred as the appropriate list type (e.g., `list<int64>`, `list<uint64>`) based on their values, similar to how flat arrays are handled, unless they contain values out of range for any integer type. ### What changes are included in this PR? This PR modifies the logic in `detect_builder_info()` to fix the inference issue. Specifically: - **Persist `sub_builder_info` across sub-array elements**: Previously, `sub_builder_info` was recreated for each sub-array element in the Array. The logic has been updated to accumulate and carry over the builder information across elements to ensure correct type inference for the entire list. - **Refactor Integer builder logic**: Following the pattern used for `BigDecimal`, the logic for determining the Integer builder has been moved to `create_builder()`. `detect_builder_info()` now calls this function. **Note:** - As a side effect of this refactoring, nested lists of `BigDecimal` (which were previously inferred as `string`) may now have their types inferred. However, comprehensive testing and verification for nested `BigDecimal` support will be addressed in a separate issue to keep this PR focused. - We stopped using `IntArrayBuilder` for inference logic to ensure correctness. This results in a performance overhead (array building is approximately 2x slower) as we can no longer rely on the specialized builder's detection. ```text user system total real array_builder int32 100000 0.085867 0.000194 0.086061 ( 0.086369) int_array_builder int32 100000 0.042163 0.001033 0.043196 ( 0.043268) array_builder int64 100000 0.086799 0.000015 0.086814 ( 0.086828) int_array_builder int64 100000 0.044493 0.000973 0.045466 ( 0.045469) array_builder uint32 100000 0.085748 0.000009 0.085757 ( 0.085768) int_array_builder uint32 100000 0.044463 0.001034 0.045497 ( 0.045498) array_builder uint64 100000 0.084548 0.000987 0.085535 ( 0.085537) int_array_builder uint64 100000 0.044206 0.000017 0.044223 ( 0.044225) ``` ### Are these changes tested? Yes. `ruby ruby/red-arrow/test/run-test.rb` ### Are there any user-facing changes? Yes. * GitHub Issue: #48481 Authored-by: hypsakata <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
1 parent 582d99c commit 6ee7f7e

File tree

2 files changed

+443
-44
lines changed

2 files changed

+443
-44
lines changed

ruby/red-arrow/lib/arrow/array-builder.rb

Lines changed: 46 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -74,14 +74,23 @@ def detect_builder_info(value, builder_info)
7474
detected: true,
7575
}
7676
when Integer
77-
if value < 0
77+
builder_info ||= {}
78+
min = builder_info[:min] || value
79+
max = builder_info[:max] || value
80+
min = value if value < min
81+
max = value if value > max
82+
83+
if builder_info[:builder_type] == :int || value < 0
7884
{
79-
builder: IntArrayBuilder.new,
80-
detected: true,
85+
builder_type: :int,
86+
min: min,
87+
max: max,
8188
}
8289
else
8390
{
84-
builder: UIntArrayBuilder.new,
91+
builder_type: :uint,
92+
min: min,
93+
max: max,
8594
}
8695
end
8796
when Time
@@ -150,18 +159,19 @@ def detect_builder_info(value, builder_info)
150159
end
151160
end
152161
when ::Array
153-
sub_builder_info = nil
162+
sub_builder_info = builder_info && builder_info[:value_builder_info]
154163
value.each do |sub_value|
155164
sub_builder_info = detect_builder_info(sub_value, sub_builder_info)
156165
break if sub_builder_info and sub_builder_info[:detected]
157166
end
158167
if sub_builder_info
159-
sub_builder = sub_builder_info[:builder]
160-
return builder_info unless sub_builder
168+
sub_builder = sub_builder_info[:builder] || create_builder(sub_builder_info)
169+
return sub_builder_info unless sub_builder
161170
sub_value_data_type = sub_builder.value_data_type
162171
field = Field.new("item", sub_value_data_type)
163172
{
164173
builder: ListArrayBuilder.new(ListDataType.new(field)),
174+
value_builder_info: sub_builder_info,
165175
detected: sub_builder_info[:detected],
166176
}
167177
else
@@ -186,6 +196,35 @@ def create_builder(builder_info)
186196
data_type = Decimal256DataType.new(builder_info[:precision],
187197
builder_info[:scale])
188198
Decimal256ArrayBuilder.new(data_type)
199+
when :int
200+
min = builder_info[:min]
201+
max = builder_info[:max]
202+
203+
if GLib::MININT8 <= min && max <= GLib::MAXINT8
204+
Int8ArrayBuilder.new
205+
elsif GLib::MININT16 <= min && max <= GLib::MAXINT16
206+
Int16ArrayBuilder.new
207+
elsif GLib::MININT32 <= min && max <= GLib::MAXINT32
208+
Int32ArrayBuilder.new
209+
elsif GLib::MININT64 <= min && max <= GLib::MAXINT64
210+
Int64ArrayBuilder.new
211+
else
212+
StringArrayBuilder.new
213+
end
214+
when :uint
215+
max = builder_info[:max]
216+
217+
if max <= GLib::MAXUINT8
218+
UInt8ArrayBuilder.new
219+
elsif max <= GLib::MAXUINT16
220+
UInt16ArrayBuilder.new
221+
elsif max <= GLib::MAXUINT32
222+
UInt32ArrayBuilder.new
223+
elsif max <= GLib::MAXUINT64
224+
UInt64ArrayBuilder.new
225+
else
226+
StringArrayBuilder.new
227+
end
189228
else
190229
nil
191230
end

0 commit comments

Comments
 (0)