Commit 6ee7f7e
### Rationale for this change
When an `Arrow::Table` is built from a Ruby Hash passed to `Arrow::Table.new`, nested Integer arrays are always inferred as `list<uint8>` or `list<int8>`, regardless of the values they contain. They should instead be inferred as the list type that fits their values (e.g., `list<int64>`, `list<uint64>`), the same way flat arrays are handled, unless they contain values that are out of range for every integer type.
### What changes are included in this PR?
This PR modifies the logic in `detect_builder_info()` to fix the inference issue. Specifically:
- **Persist `sub_builder_info` across sub-array elements**: Previously, `sub_builder_info` was recreated for each sub-array element in the Array. The logic has been updated to accumulate and carry over the builder information across elements to ensure correct type inference for the entire list.
- **Refactor Integer builder logic**: Following the pattern used for `BigDecimal`, the logic for determining the Integer builder has been moved to `create_builder()`. `detect_builder_info()` now calls this function.
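
The idea of carrying inference state across sub-arrays can be illustrated with a small, library-free sketch. This is not the code in `detect_builder_info()` or `create_builder()`: the helper names (`merge_range`, `infer_integer_type`, `infer_list_type`) are hypothetical, and the rule of picking the narrowest type that covers the observed range is an assumption made for illustration only.

```ruby
# Hypothetical sketch, not red-arrow's implementation.
# Track the smallest and largest Integer seen so far.
def merge_range(info, value)
  return {min: value, max: value} if info.nil?
  {min: [info[:min], value].min, max: [info[:max], value].max}
end

# Assumption: choose the narrowest integer type covering [min, max].
# Returns nil when no 8- to 64-bit integer type can hold the range.
def infer_integer_type(min, max)
  if min < 0
    [8, 16, 32, 64].each do |bits|
      return "int#{bits}" if min >= -(2**(bits - 1)) && max < 2**(bits - 1)
    end
  else
    [8, 16, 32, 64].each do |bits|
      return "uint#{bits}" if max < 2**bits
    end
  end
  nil
end

# The range information is shared by all sub-arrays instead of being
# recreated for each one, so every element influences the final type.
def infer_list_type(nested)
  info = nil
  nested.each do |sub_array|
    next if sub_array.nil?
    sub_array.each do |value|
      next if value.nil?
      info = merge_range(info, value)
    end
  end
  return nil if info.nil?
  type = infer_integer_type(info[:min], info[:max])
  type && "list<#{type}>"
end

puts infer_list_type([[1, 2], [300, -1]])  # prints "list<int16>"
puts infer_list_type([[1], [2**40]])       # prints "list<uint64>"
p infer_list_type([[2**64]])               # prints nil (out of range)
```

If `info` were reset inside the outer loop, only one sub-array would decide the result, which is the kind of failure the first bullet describes.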
**Note:**
- As a side effect of this refactoring, nested lists of `BigDecimal` (which were previously inferred as `string`) may now have their types inferred. However, comprehensive testing and verification for nested `BigDecimal` support will be addressed in a separate issue to keep this PR focused.
- We stopped using `IntArrayBuilder` for the inference logic to ensure correctness. Because we can no longer rely on the specialized builder's detection, this adds a performance overhead: array building is approximately 2x slower (about 0.086 s versus 0.043 s in the benchmark below).
```text
                                     user    system     total       real
array_builder int32 100000       0.085867  0.000194  0.086061  (0.086369)
int_array_builder int32 100000   0.042163  0.001033  0.043196  (0.043268)
array_builder int64 100000       0.086799  0.000015  0.086814  (0.086828)
int_array_builder int64 100000   0.044493  0.000973  0.045466  (0.045469)
array_builder uint32 100000      0.085748  0.000009  0.085757  (0.085768)
int_array_builder uint32 100000  0.044463  0.001034  0.045497  (0.045498)
array_builder uint64 100000      0.084548  0.000987  0.085535  (0.085537)
int_array_builder uint64 100000  0.044206  0.000017  0.044223  (0.044225)
```
### Are these changes tested?
Yes. Run the test suite with `ruby ruby/red-arrow/test/run-test.rb`.
### Are there any user-facing changes?
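Yes. Nested Integer arrays passed to `Arrow::Table.new` are now inferred as a list type that matches their values instead of always `list<uint8>` or `list<int8>`.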
* GitHub Issue: #48481
Authored-by: hypsakata <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
1 parent 582d99c commit 6ee7f7e
2 files changed (+443, -44 lines) under `ruby/red-arrow`: `lib/arrow` and `test`