Commit d3bb4e9
Add pack and unpack of a more compact serialization format (#68)
* Store the original size argument in struct binary_fuse{8,16}_s
... in preparation for a more compact serialization format: All other
parameters except for the Seed are derived from the size parameter.
The drawback is that this format is sensitive to changes of
binary_fuse8_allocate().
Due to alignment, this does not need any more space on 64-bit platforms.
(There were five 32-bit values in between two 64-bit values.)
Yet formally, this is a breaking change of the in-core format, which
should not be used to store information across versions. See follow up
commits for new compact serialization formats.
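The alignment argument above can be illustrated with a minimal sketch. The struct and field names below are hypothetical stand-ins, not the definitions from the header; they only mirror the described shape (one 64-bit seed, a run of 32-bit values, one pointer).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: five 32-bit values sit between the
 * 64-bit seed and the (64-bit on LP64) fingerprint pointer. */
struct layout_before {
  uint64_t seed;
  uint32_t derived[5];
  uint8_t *fingerprints;
};

/* Hypothetical "after" layout: the original size argument joins the
 * 32-bit group, filling what used to be padding on 64-bit ABIs. */
struct layout_after {
  uint64_t seed;
  uint32_t size;
  uint32_t derived[5];
  uint8_t *fingerprints;
};

/* Bytes the extra field costs on the ABI this is compiled for. */
static size_t layout_growth(void) {
  return sizeof(struct layout_after) - sizeof(struct layout_before);
}
```

On a typical 64-bit ABI the pointer must be 8-byte aligned, so the five 32-bit values were already followed by 4 bytes of padding and layout_growth() is expected to be zero. On 32-bit ABIs the struct may grow.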
* Add {xor,binary_fuse}{8,16}_{pack,unpack} serialization formats.
Rationale:
As mentioned in the previous commit, for binary_fuse filters, we do not
need to save values derived from the size, saving 5 x sizeof(uint32_t).
For both filter implementations, we add a bitmap to indicate non-zero
fingerprint values. This adds 1/{8,16} of the fingerprint array size,
but saves one or two bytes for each zero fingerprint.
The net result is a packed format which cannot be compressed further by
zlib for the bundled unit tests.
Note that this format is incompatible with the existing _serialize()
format and, in the case of binary_fuse, sensitive to changes of the
derived parameters in _allocate.
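The size trade-off described in the rationale can be sketched as follows. sketch_packed_bytes8() is a hypothetical helper written for this illustration, not one of the functions added by the commit, and it ignores the header fields (seed and size).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to store n 8-bit fingerprints as a
 * presence bitmap followed by only the non-zero fingerprint values.
 * Header fields (seed, size) are deliberately left out of this sketch. */
static size_t sketch_packed_bytes8(const uint8_t *fp, size_t n) {
  size_t bytes = (n + 7) / 8; /* one presence bit per fingerprint */
  for (size_t i = 0; i < n; i++) {
    if (fp[i] != 0) {
      bytes++; /* zero fingerprints cost only their bitmap bit */
    }
  }
  return bytes;
}
```

For ten fingerprints of which three are non-zero, this yields 2 bitmap bytes plus 3 value bytes, compared with 10 bytes for the raw array. When every fingerprint is non-zero, the bitmap is pure overhead of one eighth of the array size.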
Interface:
We add _pack_bytes() to match _serialization_bytes(). _pack() and
_unpack() match _serialize() and _deserialize().
The existing _{de,}serialize() interfaces take a buffer pointer only and
thus implicitly assume that the buffer will be of sufficient size. For
the new functions, we add a size_t parameter indicating the size of the
buffer and check its bounds in the implementation.
_pack returns the used size or zero for "does not fit", so when called
with a buffer of arbitrary size, the used space or error condition can
be determined without an additional call to _pack_bytes(), avoiding
duplicate work.
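The calling convention described above (buffer size passed in, used size or zero returned) can be sketched like this. sketch_pack8() and sketch_unpack8() are hypothetical names with assumed signatures, operating on a bare fingerprint array rather than on the filter structs; the real _pack()/_unpack() functions also handle the seed and size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pack: writes a presence bitmap followed by the non-zero
 * fingerprints. Returns the number of bytes used, or 0 if buf is too
 * small. (With n == 0 this sketch also returns 0; a real format with a
 * header would never produce an empty result.) */
static size_t sketch_pack8(const uint8_t *fp, size_t n, uint8_t *buf,
                           size_t buflen) {
  size_t bitmap_bytes = (n + 7) / 8;
  if (buflen < bitmap_bytes) {
    return 0;
  }
  memset(buf, 0, bitmap_bytes);
  size_t used = bitmap_bytes;
  for (size_t i = 0; i < n; i++) {
    if (fp[i] == 0) {
      continue;
    }
    if (used == buflen) {
      return 0; /* does not fit */
    }
    buf[i / 8] |= (uint8_t)(1u << (i % 8));
    buf[used++] = fp[i];
  }
  return used;
}

/* Hypothetical unpack: returns 1 on success, 0 if buf is truncated. */
static int sketch_unpack8(uint8_t *fp, size_t n, const uint8_t *buf,
                          size_t buflen) {
  size_t pos = (n + 7) / 8;
  if (buflen < pos) {
    return 0;
  }
  for (size_t i = 0; i < n; i++) {
    if (buf[i / 8] & (1u << (i % 8))) {
      if (pos == buflen) {
        return 0; /* bitmap promises more values than the buffer holds */
      }
      fp[i] = buf[pos++];
    } else {
      fp[i] = 0;
    }
  }
  return 1;
}
```

A caller can hand sketch_pack8() any buffer and inspect the return value, without first asking for the exact packed size.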
Implementation:
We add some XOR_bitf_* macros to address words and individual bits of
bitfields.
The XOR_ser and XOR_deser macros have the otherwise repeated code for
bounds checking and the actual serialization.
Because the implementations for the 8 and 16 bit words are equal except
for the data type, we add macros and create the actual functions by
expanding the macros with the possible data types.
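The macro technique described above can be sketched as follows. All SKETCH_* and sketch_* names are hypothetical and only written in the spirit of the XOR_bitf_* macros; the actual macro names, bitmap word size, and bodies in the header may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitfield helpers: index of the bitmap byte holding bit i,
 * and the mask selecting bit i within that byte. */
#define SKETCH_BITF_WORD(i) ((i) / 8)
#define SKETCH_BITF_MASK(i) ((uint8_t)(1u << ((i) % 8)))

/* Tests whether bit i is set in a byte-addressed bitmap. */
static int sketch_bitf_test(const uint8_t *bitmap, size_t i) {
  return (bitmap[SKETCH_BITF_WORD(i)] & SKETCH_BITF_MASK(i)) != 0;
}

/* One function template, expanded once per fingerprint width, so the
 * 8-bit and 16-bit variants share a single body. */
#define SKETCH_DEFINE_COUNT_NONZERO(bits)                             \
  static size_t sketch_count_nonzero##bits(const uint##bits##_t *fp,  \
                                           size_t n) {                \
    size_t count = 0;                                                 \
    for (size_t i = 0; i < n; i++) {                                  \
      count += (fp[i] != 0);                                          \
    }                                                                 \
    return count;                                                     \
  }

SKETCH_DEFINE_COUNT_NONZERO(8)  /* defines sketch_count_nonzero8()  */
SKETCH_DEFINE_COUNT_NONZERO(16) /* defines sketch_count_nonzero16() */
```

Generating both variants from one macro keeps the 8-bit and 16-bit code paths identical by construction, at the cost of harder debugging, because the function bodies only exist after preprocessing.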
Alternatives considered:
Compared to _{de,}serialize(), the new functions need to copy individual
fingerprint words rather than the whole array at once, which is less
efficient. Therefore, an implementation using Duff's Device with
branchless code was attempted but dismissed because avoiding
out-of-bounds access would require an over-allocated buffer.
* Adjust unit tests to new _{un,}pack() interface
To exercise the new code without too much of a change to the existing
unit test, we change the signature of _{un,}serialize_gen() to take an
additional (const) size_t argument, which we ignore for
_{un,}serialize().
We add to the reported metrics absolute and relative size information
for the "in-core" and "wire" format, the latter referring jointly to
_{un,}serialize() and _{un,}pack().
* Document the new _{un,}pack() interface
* tuning the wording and adding a spaceusage benchmark
* changing the wording.
* rewording.
* explicit casts
---------
Co-authored-by: Daniel Lemire <[email protected]>
1 parent 5539876 commit d3bb4e9
File tree: benchmarks, include, tests
7 files changed: +498 -18 lines changed