Commit df74fdb

Add specification for double delta filter encoding. (#5137)
[SC-49776](https://app.shortcut.com/tiledb-inc/story/49776)

This PR adds the encoding of the double delta filter to the filter specification. The specification was based on the filter class' [documentation](https://github.com/TileDB-Inc/TileDB/blob/6a9918a6caf33967dc3201016ecb51eb62a45d3c/tiledb/sm/compressors/dd_compressor.h#L61-L102) and [implementation](https://github.com/TileDB-Inc/TileDB/blob/6a9918a6caf33967dc3201016ecb51eb62a45d3c/tiledb/sm/compressors/dd_compressor.cc#L209-L257).

---
TYPE: FORMAT
DESC: Added specification for the encoding of the double delta filter.
1 parent 2c859c9 commit df74fdb

1 file changed: +32 -0 lines changed

@@ -0,0 +1,32 @@
---
title: Double Delta Filter
---

## Double Delta Filter

The double delta filter compresses integer data by first computing the deltas between consecutive elements, then the deltas between those deltas, and finally bit-packing the result.
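
As a rough illustration of the two differencing steps, the following Python sketch computes the double deltas of a small list and the number of bits needed for the largest magnitude. The helper names `double_deltas` and `min_bitsize` are hypothetical and not part of TileDB, and returning `0` when every double delta is zero is an assumption of this sketch.

```python
def double_deltas(values):
    """Return the deltas of the deltas of a list of integers."""
    # First pass: differences between consecutive elements.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Second pass: differences between consecutive deltas.
    return [b - a for a, b in zip(deltas, deltas[1:])]


def min_bitsize(dds):
    """Bits needed for the largest absolute double delta (sign excluded)."""
    # Assumption of this sketch: an empty or all-zero input yields 0.
    return max((abs(dd).bit_length() for dd in dds), default=0)


values = [10, 12, 15, 19, 24]      # deltas are 2, 3, 4, 5
dds = double_deltas(values)
print(dds, min_bitsize(dds))       # [1, 1, 1] 1
```

Slowly varying sequences such as timestamps or coordinates tend to produce small double deltas, which is where the bit-packing step saves space.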

### Filter Enum Value

The filter enum value for the double delta filter is `6` (`TILEDB_FILTER_DOUBLE_DELTA` enum).

### Input and Output Layout

The input data is an array of integers (each denoted `in_{n}`, with `n` starting from 0). Their type (henceforth `input_t`) is inferred from the output type of the previous filter, or from the tile's datatype if this is the first filter in the pipeline, but can be overridden by the [_Reinterpret datatype_ field](../filter_pipeline.md#delta-compressor-options) in the filter options.

The output data layout consists of the following fields:

|Field|Type|Description|
|:---|:---|:---|
|`bitsize`|`uint8_t`|Minimum number of bits required to represent any `dd_n` value.|
|`n`|`uint64_t`|Number of values in the input data.|
|`in_0`|`input_t`|First input value.|
|`in_1`|`input_t`|Second input value.|
|`sign_2`|`bit`|Sign of `(in_2 - in_1) - (in_1 - in_0)`.|
|`dd_2`|`bit[bitsize]`|Absolute value of `(in_2 - in_1) - (in_1 - in_0)`.|
||||
|`sign_n`|`bit`|Sign of `(in_n - in_{n - 1}) - (in_{n - 1} - in_{n - 2})`.|
|`dd_n`|`bit[bitsize]`|Absolute value of `(in_n - in_{n - 1}) - (in_{n - 1} - in_{n - 2})`.|
|`pad`|`bit[(64 - ((n - 2) * (bitsize + 1)) % 64) % 64]`|Padding to the next 64-bit boundary.|

If the computed `bitsize` equals `sizeof(input_t) * 8 - 1` (i.e. double delta compression would yield no size savings), double delta compression is not applied and the input data is written to the output stream unchanged, after `bitsize` and `n`, which are always written.
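
The layout above can be sketched as a small Python encoder. This is only an illustration under stated assumptions, not the TileDB implementation: the function name `encode` and its `fmt` parameter are hypothetical, and the little-endian byte order, the most-significant-bit-first packing into 64-bit words, and the use of `1` as the sign bit for negative values are all assumptions that the table does not state.

```python
import struct


def encode(values, fmt="i"):
    """Encode integers following the field table (fmt "i" means int32)."""
    n = len(values)
    width_bits = struct.calcsize("<" + fmt) * 8
    deltas = [b - a for a, b in zip(values, values[1:])]
    dds = [b - a for a, b in zip(deltas, deltas[1:])]
    bitsize = max((abs(d).bit_length() for d in dds), default=0)

    # bitsize and n are always written (little-endian is an assumption).
    out = struct.pack("<BQ", bitsize, n)
    # The text says "equals"; ">=" is a defensive choice of this sketch,
    # since Python integers do not overflow like fixed-width types.
    if bitsize >= width_bits - 1:
        return out + struct.pack(f"<{n}{fmt}", *values)

    # in_0 and in_1 are stored verbatim.
    out += struct.pack(f"<{min(n, 2)}{fmt}", *values[:2])

    # Append one sign bit plus bitsize magnitude bits per double delta.
    bits, nbits = 0, 0
    for dd in dds:
        bits = (bits << 1) | (1 if dd < 0 else 0)  # assumed: 1 = negative
        bits = (bits << bitsize) | abs(dd)
        nbits += bitsize + 1

    # Zero-pad up to the next 64-bit boundary, then emit 64-bit words.
    pad = (64 - nbits % 64) % 64
    bits <<= pad
    for shift in range(nbits + pad - 64, -1, -64):
        out += struct.pack("<Q", (bits >> shift) & 0xFFFFFFFFFFFFFFFF)
    return out


encoded = encode([i * i for i in range(100)])  # every double delta is 2
print(len(encoded))  # 57
```

In this example the 100 `int32` values (400 bytes raw) take 57 bytes: 9 bytes for `bitsize` and `n`, 8 bytes for `in_0` and `in_1`, and 98 double deltas at 3 bits each (294 bits) padded to 320 bits, or 40 bytes.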
