Commit cbbe407
Tracking Issue: #4699
Closes: #5053
This PR introduces a performance optimization for `ListViewArray` by
adding an `is_zero_copy_to_list` flag that tracks whether the
`ListViewArray` can be (near) constant-time converted to a `ListArray`
without copying data.
When this flag is true (indicating sorted offsets with no gaps and no
overlaps), conversions can bypass the very expensive rebuild process
(which just calls `append_scalar` in a loop).
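The "sorted offsets with no gaps and no overlaps" condition can be illustrated with a small standalone sketch. It works on plain slices rather than the real `ListViewArray`, and the helper name `offsets_are_contiguous` is hypothetical, not the function used in this PR. It assumes the condition means that each list begins exactly where the previous one ends.

```rust
/// Hypothetical helper (not part of the Vortex API): returns true when every
/// view starts exactly where the previous view ends, i.e. the offsets are
/// sorted and the views have no gaps and no overlaps between them.
fn offsets_are_contiguous(offsets: &[u64], sizes: &[u64]) -> bool {
    if offsets.len() != sizes.len() {
        return false;
    }
    // For each adjacent pair of offsets, the end of list `i`
    // (offset + size) must equal the start of list `i + 1`.
    offsets
        .windows(2)
        .zip(sizes)
        .all(|(pair, &size)| pair[0].checked_add(size) == Some(pair[1]))
}

fn main() {
    // Lists occupy [0, 2), [2, 5), [5, 6): contiguous.
    println!("{}", offsets_are_contiguous(&[0, 2, 5], &[2, 3, 1]));
    // A gap between the second and third list: not contiguous.
    println!("{}", offsets_are_contiguous(&[0, 2, 5], &[2, 2, 1]));
}
```

This check only compares neighbouring entries, so it is linear in the number of lists. Whether the real flag also constrains the first offset or the end of the elements buffer is not stated here, so the sketch leaves that out.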
Additionally, this PR refactors all `ListViewArray` constructor call
sites and compute operations throughout the codebase to properly
maintain this new invariant. The flag can only be set through the
`new_unchecked` constructor, which is `unsafe`, so the caller is
responsible for guaranteeing that the invariant actually holds.
Some things that are not included in this PR:
- Making conversion _actually_ zero-copy would require implementing
logic to always allocate an additional `offsets` slot for an `n+1`th
offset in `ListViewArray`. This is because `ListArray` has `n+1`
offsets, and from a plain `ListViewArray` we have to reallocate the
offsets to make space.
- `reset_offsets` right now is probably more expensive than it could be
since it uses the `sub_scalar` compute function. We know in advance that
`reset_offsets` should _never_ fail in the non-recursive case.
- Validation of the `is_zero_copy_to_list` (`is_zctl`) flag is **SUPER**
expensive, but it can be made a bit less expensive by using a
`BitBuffer` to track references + a flag for overlaps.
- It would probably be nice to have a `check_is_zero_copy_to_list`
function on `ListViewArray` that sets the flag if it detects it is
zero-copyable to a `ListArray`. This would be nice when we are reading
from disk or converting from an Arrow `GenericListView`.
- This PR **does not** change the on-disk format, which means we lose
this metadata when writing to disk. I think that this flag is somewhat
similar to the `is_sorted` statistic, but at the same time I feel like
it is not correct to store this as a statistic? Not sure.
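The first item above (the `n+1`th offset) can be sketched on plain vectors. This is an illustration under assumptions, not the conversion code in this PR: the function name `list_offsets_from_view` is hypothetical, and it assumes the input already satisfies the zero-copy condition, so only the final end position has to be appended.

```rust
/// Hypothetical helper: builds the `n + 1` offsets that a list layout needs
/// from the `n` offsets and `n` sizes of a list-view layout. It assumes the
/// views are already sorted with no gaps and no overlaps.
fn list_offsets_from_view(offsets: &[u64], sizes: &[u64]) -> Vec<u64> {
    assert_eq!(offsets.len(), sizes.len());
    // This allocation and copy is the reason the conversion is only
    // "near" constant-time: the existing buffer has no room for one more slot.
    let mut out = Vec::with_capacity(offsets.len() + 1);
    out.extend_from_slice(offsets);
    match (offsets.last(), sizes.last()) {
        // The extra offset is the end of the last list.
        (Some(&offset), Some(&size)) => out.push(offset + size),
        // An empty list layout still carries a single offset.
        _ => out.push(0),
    }
    out
}

fn main() {
    let offsets = list_offsets_from_view(&[0, 2, 5], &[2, 3, 1]);
    println!("{:?}", offsets);
}
```

If the offsets buffer were always allocated with a spare slot, as the item suggests, the copy in `extend_from_slice` could be replaced by writing one value in place.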
---------
Signed-off-by: Connor Tsui <[email protected]>
1 parent 7634f36 commit cbbe407
File tree
27 files changed (+1024, -619)
- encodings/sparse/src
- fuzz/src/array
- vortex-array/src
- arrays
- chunked/vtable
- listview
- compute
- tests
- vtable
- list
- arrow/compute/to_arrow
- builders
- compute
- vortex-btrblocks/src
- vortex-duckdb/src/exporter
- vortex-layout/src/layouts