Add fuzzer for roaring_buffer_reader#42
Conversation
|
How about rename Makefile.fuzz to Makefile, while it's the only build file in the directory |
Makes sense — fixed in 8f5cf6c 🙂 |
|
As described in #45, the roaring_buffer_reader relies on offsets for deserialization. When the offsets in the input serialized data are incorrect, it cannot produce the same output as roaring_bitmap_portable_deserialize_safe(which might skip offset parsing). If it were also modified to not rely on offsets for deserialization, there would be a significant performance degradation, so this discrepancy can only be ignored. |
|
After including the fix from #45, as well as a call to Note that the issue around offsets also apply for cardinalities: I'll open a separate PR to add |
This PR adds a fuzzer based on libfuzzer for
roaring_buffer_reader. You can try it as follows (the fuzzer will run up to 60s):The intent of the fuzzer is to cover whether the custom deserialization and operations on roaring bitmaps implemented in
roaring_buffer_reader.cis safe and correct, in the sense that they compute the same results ascroaring.The fuzzer works as follows:
roaring_bitmap_portable_deserialize_safe. Stop if deserialization fails.roaring_buffer_create.croaring:roaring_buffer_get_cardinalityroaring_buffer_containsroaring_buffer_is_subsetroaring_buffer_androaring_buffer_andnotroaring_buffer_and_cardinalityroaring_buffer_or_cardinalityroaring_buffer_andnot_cardinalityroaring_buffer_xor_cardinalityroaring_buffer_jaccard_indexroaring_buffer_intersectroaring_buffer_is_emptyroaring_buffer_equalsroaring_buffer_rankroaring_buffer_minimumroaring_buffer_maximumIf you run the fuzzer, it pretty quickly finds that
roaring_buffer_trelies on the offset header from the serialized bitmap, which is not validated byroaring_bitmap_portable_deserialize_safe. This can cause the roaring buffer functions to return wrong results.Here's an example of that found by the fuzzer replicated in
psql:A proposed fix could be to not rely on the offset reader for untrusted input, but rather parse the containers instead (the
croaringlibrary always does this).If there's support for this change, as well as #40, then we can set up fuzzing as a Github Action (see https://google.github.io/clusterfuzzlite/running-clusterfuzzlite/github-actions/).