MemCS bloom aggregate


Since: 3.6

The MemCS primary index can now be populated with `bloom` aggregates. This
aggregate type enables data skipping based on bloom filters for requests
with equality filters. The aggregate has a tunable `fpr` parameter: the
false-positive rate of the underlying bloom filter. It must be in the
(0, 1) range. The higher the `fpr`, the lower the memory consumption. The
default value is 0.05 (5%).

Note that `bloom` aggregates support all fixed-size types and the `string`
type (`minmax` supports only fixed-size types).

Example:
```lua
local s = box.schema.create_space('test', {
    engine = 'memcs', field_count = 4,
    format = {{'a', 'uint64'}, {'b', 'uint64'}, {'c', 'uint64'},
              {'d', 'string'}},
})
s:create_index('pk', {aggregates = {
    {type = 'bloom', field = 2, name = 'bloom_2', fpr = 0.1},
    {type = 'bloom', field = 3, name = 'bloom_3', fpr = 0.01},
    {type = 'bloom', field = 4, name = 'bloom_4'},
}})
```

A filter with an equality condition then automatically uses `bloom`
aggregates, if there are any:
```c
/* Create arrow stream options. */
box_arrow_options_t *options = box_arrow_options_new();

/*
 * Set filter `[2] = 42` so that some rows with `[2] != 42` can be skipped.
 */
box_filter_t filter;
filter.type = FILTER_TYPE_EQ;
filter.field_no = 1; /* 0-based, i.e. field [2]. */

/* The filter value is MsgPack-encoded. */
char buf[16];
mp_encode_uint(buf, 42);
filter.value = buf;

box_arrow_options_set_filter(options, &filter);

/*
 * Create stream. space_id, index_id, field_count, fields, key and
 * key_size are assumed to be defined by the caller.
 */
struct ArrowArrayStream stream;
int rc = box_index_arrow_stream(space_id, index_id, field_count, fields,
				key, key + key_size, options, &stream);
```

Memory consumption is the same for all field types; only the `fpr`
parameter matters. Here are some memory consumption measurements:
* `fpr = 0.01` - 10368 bytes consumed for each block.
* `fpr = 0.05` (default value) - 7424 bytes consumed for each block.
* `fpr = 0.1` - 5952 bytes consumed for each block.
* `fpr = 0.5` - 1536 bytes consumed for each block.

Any `fpr` higher than `0.5` has the same effect as `fpr = 0.5`.

Requested by @drewdzzz in https://github.com/tarantool/tarantool-ee/commit/e6cb3bfe8bdd6576a5f94f7aaa5aff0f877346f1.

MemCS bloom aggregate #5500
