-
Notifications
You must be signed in to change notification settings - Fork 107
Description
Copied and pasted from Slack:
We have this documentation on DecimalArray:
/// These are just the maximal ranges for each scalar type, but it is perfectly legal to store
/// values with precision that does not match this exactly. For example, a valid DecimalArray with
/// precision=39 may store its values in an `i8` if all of the actual values fit into it.
///
/// Similarly, a `DecimalArray` can be built that stores a set of precision=2 values in a
/// `Buffer<i256>`.
This presents a problem: If we try to patch values or use fill_null with a decimal value that is valid according to the logical DType, but is not compatible with the storage DType, then we just panic!
This is highly concerning! The physical type has leaked and now governs what we are allowed to do instead of the logical type.
I see 3 ways forward:
- Keep this behavior. I think this is a bad idea, as it has completely different behavior from how PrimitiveArray works with BitPackedArray, for example.
- Reallocate the entire array whenever we run into this so that the new storage type can fit the new values. I also think this is a bad idea in terms of performance.
- We instead require that the storage type must always be able to fit any value in the domain of the logical DecimalDType.
For 3, by requiring this constraint we no longer need to worry about checking every single value during validation, we can just do a single check in the constructor that the dtype fits the storage.
Related issues (note that the null error message is incorrect, they are cast errors):
- Fuzzing Crash: Sum operation returns null for decimal256 array #5590
- Fuzzing Crash: fill_null on DecimalArray fails when fill value has null decimal_value #5744
Quote @gatesn:
I also think we should lean towards a variant of (3). At least, the DVector that a decimal array executes into should be the smallest physical type that holds the P/S. But you do still need to verify each value fits within the P/S. We will also need to be careful with back-compat and decimal arrays written to disk with narrower physical types than the P/S require.
Edit: I am now realizing this will be a much bigger change than I thought, because quite a lot of our existing code may have to change because it assumes we can just switch out the physical type whenever we want (an example is the sum compute function).