feat: teach ALPArray to store validity only in the encoded array #1951
Conversation
The patches are now always non-nullable. This required `PrimitiveArray::patch` to gracefully handle non-nullable patches when the array is nullable.

I modified the benchmarks to include patch manipulation time, but note that the test data has no patches, so the benchmarks measure only the overhead of `is_valid`. If we had test data where the invalid positions contained exceptional values, I would expect a modest improvement in both decompression and compression time.

As discussed [in slack](https://spiraldb.slack.com/archives/C07BV3GKAJ2/p1736894376100079), `is_valid` is expensive for two reasons: (a) creation of a Scalar and (b) conversion to an Arrow BooleanBuffer. @gatesn is working on a FilterMask change that should subsume all our boolean compression strategies into a new "BoolMask". With that in place, we can require Validity::Array to hold a "BoolMask" and thus check validity without either of the issues above.

In the tables below, the parameters are: (number_of_array_elements, fraction_valid).

This PR
-------

```
Timer precision: 41 ns
alp_compress                 fastest  │ slowest  │ median   │ mean     │ samples │ iters
├─ compress_alp                       │          │          │          │         │
│  ├─ f32                             │          │          │          │         │
│  │  ├─ (100000, 0.25)      170.2 µs │ 442.1 µs │ 170.7 µs │ 173.6 µs │ 100     │ 100
│  │  ├─ (100000, 0.95)      169.3 µs │ 202.2 µs │ 170.4 µs │ 171.7 µs │ 100     │ 100
│  │  ├─ (100000, 1.0)       136.3 µs │ 147.1 µs │ 136.9 µs │ 137.2 µs │ 100     │ 100
│  │  ├─ (10000000, 0.25)    13.6 ms  │ 16.46 ms │ 14.39 ms │ 14.23 ms │ 100     │ 100
│  │  ├─ (10000000, 0.95)    13.64 ms │ 17.7 ms  │ 14.44 ms │ 14.38 ms │ 100     │ 100
│  │  ╰─ (10000000, 1.0)     13.55 ms │ 14.97 ms │ 14.35 ms │ 14.23 ms │ 100     │ 100
│  ╰─ f64                             │          │          │          │         │
│     ├─ (100000, 0.25)      240.7 µs │ 385.7 µs │ 247.8 µs │ 249.1 µs │ 100     │ 100
│     ├─ (100000, 0.95)      240.2 µs │ 253.2 µs │ 243.5 µs │ 244.4 µs │ 100     │ 100
│     ├─ (100000, 1.0)       172.6 µs │ 184.4 µs │ 175.2 µs │ 175.5 µs │ 100     │ 100
│     ├─ (10000000, 0.25)    15.95 ms │ 24.24 ms │ 16.61 ms │ 17.25 ms │ 100     │ 100
│     ├─ (10000000, 0.95)    15.95 ms │ 21.34 ms │ 16.39 ms │ 16.85 ms │ 100     │ 100
│     ╰─ (10000000, 1.0)     15.92 ms │ 23.41 ms │ 16.46 ms │ 17.04 ms │ 100     │ 100
╰─ decompress_alp                     │          │          │          │         │
   ├─ f32                             │          │          │          │         │
   │  ├─ (100000, 0.25)      12.2 µs  │ 34.7 µs  │ 12.29 µs │ 12.52 µs │ 100     │ 100
   │  ├─ (100000, 0.95)      12.12 µs │ 12.74 µs │ 12.35 µs │ 12.37 µs │ 100     │ 100
   │  ├─ (100000, 1.0)       12.16 µs │ 12.95 µs │ 12.37 µs │ 12.4 µs  │ 100     │ 100
   │  ├─ (10000000, 0.25)    2.117 ms │ 4.544 ms │ 2.637 ms │ 2.674 ms │ 100     │ 100
   │  ├─ (10000000, 0.95)    2.085 ms │ 4.458 ms │ 2.362 ms │ 2.504 ms │ 100     │ 100
   │  ╰─ (10000000, 1.0)     2.097 ms │ 3.875 ms │ 2.229 ms │ 2.338 ms │ 100     │ 100
   ╰─ f64                             │          │          │          │         │
      ├─ (100000, 0.25)      23.41 µs │ 25.16 µs │ 23.66 µs │ 23.68 µs │ 100     │ 100
      ├─ (100000, 0.95)      23.2 µs  │ 24.7 µs  │ 24.06 µs │ 24.05 µs │ 100     │ 100
      ├─ (100000, 1.0)       22.79 µs │ 26.08 µs │ 22.91 µs │ 22.95 µs │ 100     │ 100
      ├─ (10000000, 0.25)    4.216 ms │ 6.862 ms │ 4.416 ms │ 4.568 ms │ 100     │ 100
      ├─ (10000000, 0.95)    4.242 ms │ 7.647 ms │ 4.59 ms  │ 4.827 ms │ 100     │ 100
      ╰─ (10000000, 1.0)     4.236 ms │ 8.129 ms │ 4.377 ms │ 4.507 ms │ 100     │ 100
```

Develop
-------

I patched develop with this PR's benchmark code:

```
Timer precision: 41 ns
alp_compress                 fastest  │ slowest  │ median   │ mean     │ samples │ iters
├─ compress_alp                       │          │          │          │         │
│  ├─ f32                             │          │          │          │         │
│  │  ├─ (100000, 0.25)      136 µs   │ 279.7 µs │ 136.8 µs │ 138.5 µs │ 100     │ 100
│  │  ├─ (100000, 0.95)      136.2 µs │ 149.7 µs │ 136.9 µs │ 137.6 µs │ 100     │ 100
│  │  ├─ (100000, 1.0)       136 µs   │ 148.5 µs │ 136.6 µs │ 137.1 µs │ 100     │ 100
│  │  ├─ (10000000, 0.25)    13.68 ms │ 15.06 ms │ 14.07 ms │ 14.14 ms │ 100     │ 100
│  │  ├─ (10000000, 0.95)    13.67 ms │ 19.4 ms  │ 14.05 ms │ 14.18 ms │ 100     │ 100
│  │  ╰─ (10000000, 1.0)     13.66 ms │ 14.73 ms │ 13.87 ms │ 14.04 ms │ 100     │ 100
│  ╰─ f64                             │          │          │          │         │
│     ├─ (100000, 0.25)      167.7 µs │ 301.4 µs │ 172.7 µs │ 173.2 µs │ 100     │ 100
│     ├─ (100000, 0.95)      167.7 µs │ 187.7 µs │ 170.7 µs │ 171.1 µs │ 100     │ 100
│     ├─ (100000, 1.0)       167.6 µs │ 183.7 µs │ 170.9 µs │ 171.1 µs │ 100     │ 100
│     ├─ (10000000, 0.25)    15.54 ms │ 21.9 ms  │ 15.92 ms │ 16.05 ms │ 100     │ 100
│     ├─ (10000000, 0.95)    15.61 ms │ 16.74 ms │ 15.97 ms │ 16.03 ms │ 100     │ 100
│     ╰─ (10000000, 1.0)     15.64 ms │ 18.85 ms │ 16.1 ms  │ 16.23 ms │ 100     │ 100
╰─ decompress_alp                     │          │          │          │         │
   ├─ f32                             │          │          │          │         │
   │  ├─ (100000, 0.25)      12.37 µs │ 85.49 µs │ 12.49 µs │ 13.22 µs │ 100     │ 100
   │  ├─ (100000, 0.95)      12.29 µs │ 12.74 µs │ 12.43 µs │ 12.44 µs │ 100     │ 100
   │  ├─ (100000, 1.0)       11.7 µs  │ 11.95 µs │ 11.83 µs │ 11.81 µs │ 100     │ 100
   │  ├─ (10000000, 0.25)    2.081 ms │ 3.003 ms │ 2.175 ms │ 2.263 ms │ 100     │ 100
   │  ├─ (10000000, 0.95)    2.082 ms │ 2.228 ms │ 2.109 ms │ 2.124 ms │ 100     │ 100
   │  ╰─ (10000000, 1.0)     2.082 ms │ 2.904 ms │ 2.13 ms  │ 2.202 ms │ 100     │ 100
   ╰─ f64                             │          │          │          │         │
      ├─ (100000, 0.25)      23.66 µs │ 25.66 µs │ 23.83 µs │ 23.86 µs │ 100     │ 100
      ├─ (100000, 0.95)      23.16 µs │ 24.62 µs │ 24.04 µs │ 23.89 µs │ 100     │ 100
      ├─ (100000, 1.0)       22.87 µs │ 23.49 µs │ 23.04 µs │ 23.03 µs │ 100     │ 100
      ├─ (10000000, 0.25)    4.221 ms │ 5.59 ms  │ 4.326 ms │ 4.469 ms │ 100     │ 100
      ├─ (10000000, 0.95)    4.242 ms │ 5.319 ms │ 4.545 ms │ 4.536 ms │ 100     │ 100
      ╰─ (10000000, 1.0)     4.228 ms │ 5.652 ms │ 4.342 ms │ 4.519 ms │ 100     │ 100
```
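As a rough illustration of the `is_valid` overhead described in the description above, here is a self-contained sketch using arrow-buffer's `BooleanBuffer` (the function names are mine, not Vortex's): a per-element validity loop versus the single bulk pass that a materialized "BoolMask" would enable.

```rust
use arrow_buffer::BooleanBuffer;

// Per-element path: one bit probe per call. The description notes the
// real cost today is larger still, because each `is_valid` may build a
// Scalar and convert the Validity to an Arrow BooleanBuffer.
fn count_valid_per_element(mask: &BooleanBuffer) -> usize {
    (0..mask.len()).filter(|&i| mask.value(i)).count()
}

// Bulk path: once validity is a materialized boolean mask, counting
// collapses into a single popcount-style pass over the buffer.
fn count_valid_bulk(mask: &BooleanBuffer) -> usize {
    mask.count_set_bits()
}

fn main() {
    let mask = BooleanBuffer::from_iter([true, false, true, true]);
    assert_eq!(count_valid_per_element(&mask), count_valid_bulk(&mask));
}
```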
encodings/alp/src/alp/array.rs (outdated):
```rust
let mut children = Vec::with_capacity(2);
children.push(encoded);
if let Some(patches) = &patches {
    if patches.dtype().is_nullable() {
```
I don't think this is the right way round. It's nice to have the patches dtype match the array dtype, and then assert that patches validity is AllValid.
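A toy illustration of the suggested invariant (hypothetical types, not the PR's code): the patches carry the array's dtype, nullability included, while their validity is required to be all-valid rather than branched on.

```rust
// Stand-ins for the Vortex types, just to show the shape of the check.
#[derive(Debug, PartialEq, Clone, Copy)]
struct DType {
    nullable: bool,
}

struct Patches {
    dtype: DType,
    all_valid: bool,
}

fn validate_patches(array_dtype: DType, patches: &Patches) {
    // The patches dtype matches the array dtype (including nullability)...
    assert_eq!(patches.dtype, array_dtype);
    // ...but every patch position must actually be valid.
    assert!(patches.all_valid);
}

fn main() {
    let dtype = DType { nullable: true };
    validate_patches(dtype, &Patches { dtype, all_valid: true });
}
```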
encodings/alp/src/alp/mod.rs (outdated):
```rust
    }
}

fn find_best_exponents(values: &[Self], validity: &Validity) -> VortexResult<Exponents> {
```
Why does this take Validity?
I think we should assume, like with other non-nullable encodings, that the caller has inserted reasonable placeholder values.
For example, for BitPacking the placeholder should be zero. For RunEnd, the placeholder should be LastNonNull.
For ALP, I think the placeholder should also be zero. That way a null will never be an exception.
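Why zero works falls out of the ALP roundtrip test: a value only becomes a patch when decoding its encoded integer fails to reproduce it exactly, and 0.0 always roundtrips. A minimal sketch (illustrative names and simplified exponent handling, not the crate's actual encoder):

```rust
// A value is an ALP exception iff encode-then-decode does not
// reproduce the original float exactly.
fn is_exception(value: f64, e: i32, f: i32) -> bool {
    let encoded = (value * 10f64.powi(e) * 10f64.powi(-f)).round();
    let decoded = encoded * 10f64.powi(f) * 10f64.powi(-e);
    decoded != value
}

fn main() {
    // 0.0 encodes to 0 and decodes back to 0.0 for any exponents,
    // so a null placeholder of zero can never become a patch.
    assert!(!is_exception(0.0, 14, 1));
    // Whereas, e.g., pi cannot be captured by two decimal digits
    // and would be patched.
    assert!(is_exception(std::f64::consts::PI, 2, 0));
}
```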
I see three paths forward:

1. ALPArray, at construction, zeros invalid values.
2. ALPArray accepts arrays with invalid positions containing exceptional values. As a consequence, either:
   i. ALPArray filters invalid positions from the patches, or
   ii. `PrimitiveArray::patch` has an option to ignore the patch validity.
This PR currently does 2.i, though it does the filtering in `encode_chunk_unchecked`, which is rather deep in the logic of this function. I could move the filtering higher up the call stack, at the cost of filtering a vec. I suspect your retort is that the user should set zeros if they want it to be fast.

Option 2.ii is not hard, but I don't love that it complicates the Patches API.

Option 1 is also fine. We maybe want an unsafe version of `encode` that assumes the user has already done this?
I pulled the filtering of the patches up to `alp_encode_components` in the latest commit d78ec6a. I agree the logic is easier to follow because the validity handling isn't mixed with the encoding work.

My instinct, though, is option 1: ALPArray gets two constructors, where the first zeros invalid values and the second just verifies that invalid values are zero.
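A toy sketch of that two-constructor shape (hypothetical names; a Vec plus a bool mask stands in for a Vortex PrimitiveArray with validity):

```rust
// Simplified stand-in for a primitive array with a validity mask.
struct Input {
    values: Vec<f64>,
    validity: Vec<bool>,
}

impl Input {
    /// Constructor 1: zero out invalid positions, then defer to the
    /// verifying constructor.
    fn new_zeroing(mut values: Vec<f64>, validity: Vec<bool>) -> Self {
        for (v, &ok) in values.iter_mut().zip(&validity) {
            if !ok {
                *v = 0.0;
            }
        }
        Self::new_verified(values, validity)
    }

    /// Constructor 2: trust the caller, but verify that every invalid
    /// position already holds zero.
    fn new_verified(values: Vec<f64>, validity: Vec<bool>) -> Self {
        assert!(values
            .iter()
            .zip(&validity)
            .all(|(v, &ok)| ok || *v == 0.0));
        Input { values, validity }
    }
}
```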
Yeah, one thing we don't have a good convention for is what logic lives in the compressor vs in some sort of "encode" function. The problem is, everyone has their own options for that function so it's hard to write.
In this case, I think yes, it's reasonable to have a version that zeros and a version that doesn't (basically just `fill_null(array, 0)`?).
Hmm. We don't implement `fill_null` for PrimitiveArray, and Arrow-rs doesn't have a `fill_null` AFAICT?
It's not hard to add it? I didn't include it originally since we didn't have a use case then.
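For what it's worth, a zero-filling `fill_null` can be emulated with existing arrow-rs pieces; a sketch (my function name, not an arrow-rs or Vortex API) that rebuilds the values with zeros at null slots:

```rust
use arrow_array::{Array, Float64Array};

// Emulated fill_null(array, 0.0): copy values, substituting 0.0 at null
// positions. This version drops the null buffer entirely; a Vortex
// variant would instead keep the validity alongside the zeroed values.
fn fill_null_zero(a: &Float64Array) -> Float64Array {
    Float64Array::from_iter_values(
        (0..a.len()).map(|i| if a.is_valid(i) { a.value(i) } else { 0.0 }),
    )
}

fn main() {
    let a = Float64Array::from(vec![Some(1.5), None, Some(2.5)]);
    let filled = fill_null_zero(&a);
    assert_eq!(filled.value(1), 0.0);
    assert!(filled.is_valid(1)); // no nulls remain in this emulation
}
```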
Benchmarks: TPC-H (Table of Results)

Benchmarks: Clickbench (Table of Results)

Benchmarks: datafusion (Table of Results)

Benchmarks: random_access

Benchmarks: compress (Table of Results)
Huh. I was expecting a speed penalty due to verification of the validity of the patches array, but actually, it made some things quite a bit faster. I filtered to rows whose ratio differs from 1 by more than 0.1.
Wow, wtf, Clickbench? Again, filtering to things outside of [0.9, 1.1].
You changed a lot of binary searches to a constant lookup?
Compress isn't changed much, but decompress is faster.
@robert3005 Hmm. It could be that RunEnd with bool values now reports its true count properly, and that makes a lot of things fast? The other big benefit is for ALP arrays that have lots of invalid positions whose values are exceptions. I'm having a hard time imagining how that happens, though.
RunEnd compare now canonicalizes, avoiding creating a bool run-end array.
The RunEnd true count and null count changes alone are not the cause of the Clickbench improvements: #1982 (comment)
```rust
        && array.logical_validity().all_valid(),
    )),
    Stat::TrueCount => match array.dtype() {
        DType::Bool(_) => {
```
you still need to subtract nulls?
Ah, good catch, fixed.
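The fix being described, roughly (a sketch using arrow-buffer, not the PR's exact code): a nullable bool array's true count must only count lanes that are both set and valid, which is an AND followed by a popcount.

```rust
use arrow_buffer::BooleanBuffer;

// True count over a nullable bool array: a set bit at an invalid
// position must not be counted, so AND the values with the validity
// before counting the set bits.
fn true_count(values: &BooleanBuffer, validity: &BooleanBuffer) -> usize {
    (values & validity).count_set_bits()
}

fn main() {
    let values = BooleanBuffer::from_iter([true, true, false, true]);
    let validity = BooleanBuffer::from_iter([true, false, true, true]);
    // The second `true` is null, so it doesn't count.
    assert_eq!(true_count(&values, &validity), 2);
}
```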
I'm gonna extract this to a separate PR.
Closing until #2007 merges.