Decimal logical vs physical types

Copied and pasted from Slack:

---

We have this documentation on DecimalArray:

```rust
/// These are just the maximal ranges for each scalar type, but it is perfectly legal to store
/// values with precision that does not match this exactly. For example, a valid DecimalArray with
/// precision=39 may store its values in an `i8` if all of the actual values fit into it.
///
/// Similarly, a `DecimalArray` can be built that stores a set of precision=2 values in a
/// `Buffer<i256>`.
```

This presents a problem: If we try to patch values or use fill_null with a decimal value that is valid according to the logical DType, but is not compatible with the storage DType, then we just panic!

This is highly concerning! The physical type has leaked and now governs what we are allowed to do instead of the logical type.

I see 3 ways forward:
1. Keep this behavior. I think this is a bad idea, as it has completely different behavior from how PrimitiveArray works with BitPackedArray, for example.
2. Reallocate the entire array whenever we run into this so that the new storage type can fit the new values. I also think this is a bad idea in terms of performance.
3. We instead require that the storage type must always be able to fit any value in the domain of the logical DecimalDType. 

For 3, by requiring this constraint we no longer need to worry about checking every single value during validation, we can just do a single check in the constructor that the dtype fits the storage.

---

Related issues (note that the null error message is incorrect, they are cast errors):

- https://github.com/vortex-data/vortex/issues/5590
- https://github.com/vortex-data/vortex/issues/5744

Quote @gatesn:

> I also think we should lean towards a variant of (3). At least, the DVector that a decimal array executes into should be the smallest physical type that holds the P/S. But you do still need to verify each value fits within the P/S. We will also need to be careful with back-compat and decimal arrays written to disk with narrower physical types than the P/S require.

Edit: I am now realizing this will be a much bigger change than I thought, because quite a lot of our existing code may have to change because it assumes we can just switch out the physical type whenever we want (an example is the `sum` compute function).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Decimal logical vs physical types #5780

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Decimal logical vs physical types #5780

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions