
Commit 2895f48

Author: Alexei Starovoitov

Merge branch 'Implement bloom filter map'
Joanne Koong says:

====================

This patchset adds a new kind of bpf map: the bloom filter map. Bloom filters
are a space-efficient probabilistic data structure used to quickly test whether
an element exists in a set. For a brief overview of how bloom filters work,
https://en.wikipedia.org/wiki/Bloom_filter may be helpful.

One example use-case is an application leveraging a bloom filter map to
determine whether a computationally expensive hashmap lookup can be avoided.
If the element was not found in the bloom filter map, the hashmap lookup can
be skipped.

This patchset includes benchmarks for testing the performance of the bloom
filter for different entry sizes and different numbers of hash functions used,
as well as comparisons for hashmap lookups with vs. without the bloom filter.

A high level overview of this patchset is as follows:
1/5 - kernel changes for adding the bloom filter map
2/5 - libbpf changes for adding map_extra flags
3/5 - tests for the bloom filter map
4/5 - benchmarks for bloom filter lookup/update throughput and false positive rate
5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom filter

v5 -> v6:
* in 1/5: remove "inline" from the hash function, add a check in the syscall
  to fail out in cases where map_extra is not 0 for non-bloom-filter maps, fix
  alignment matching issues, move the "map_extra flags" comments to inside the
  bpf_attr struct, add the bpf_map_info map_extra changes here, add the
  map_extra assignment in bpf_map_get_info_by_fd, change the hash value_size
  to u32 instead of u64
* in 2/5: remove the bpf_map_info map_extra changes, remove the TODO comment
  about extending BTF arrays to cover u64s, cast to unsigned long long for
  %llx when printing out the map_extra flags
* in 3/5: use __type(value, ...) instead of __uint(value_size, ...) for values
  and keys
* in 4/5: fix wrong bounds for the index when iterating through random values,
  update the commit message to include update+lookup benchmark results for
  8-byte and 64-byte value sizes, remove the explicit global bool
  initialization to false for the hashmap_use_bloom and count_false_hits
  variables

v4 -> v5:
* Change the "bitset map with bloom filter capabilities" to a bloom filter map
  with max_entries signifying the number of unique entries expected in the
  bloom filter, remove the bitset tests
* Reduce verbiage by changing "bloom_filter" to "bloom", and rename the progs
  to more concise names
* in 2/5: remove "map_extra" from struct definitions that are frozen, create a
  "bpf_create_map_params" struct to propagate map_extra to the kernel at map
  creation time, change map_extra to __u64
* in 4/5: check the pthread condition variable in a loop when generating
  initial map data, remove "err" checks where not pragmatic, generate random
  values for the hashmap in setup() instead of in the bpf program, add
  check_args() for checking that there aren't more requested entries than
  possible unique entries for the specified value size
* in 5/5: update the commit message with updated benchmark data

v3 -> v4:
* Generalize the bloom filter map to be a bitset map with bloom filter
  capabilities
* Add map_extra flags; pass in nr_hash_funcs through the lower 4 bits of
  map_extra for the bitset map
* Add tests for the bitset map (non-bloom-filter) functionality
* In the benchmarks, compute stats only as monotonic increases, and place the
  stats in a struct instead of in a percpu_array bpf map

v2 -> v3:
* Add libbpf changes for supporting nr_hash_funcs, instead of passing the
  number of hash functions through map_flags
* Separate the hashing logic in kernel/bpf/bloom_filter.c into a helper
  function

v1 -> v2:
* Remove the libbpf changes, and pass the number of hash functions through
  map_flags instead
* Default to using 5 hash functions if no number of hash functions is
  specified
* Use set_bit instead of spinlocks in the bloom filter bitmap. This improved
  the speed significantly: for example, using 5 hash functions with 100k
  entries, there was roughly a 35% speed increase
* Use jhash2 (instead of jhash) for u32-aligned value sizes. This increased
  the speed by roughly 5 to 15%. When using jhash2 on value sizes that are not
  u32-aligned (truncating any remainder bits), there was no noticeable
  difference
* Add a test for using the bloom filter as an inner map
* Rerun the benchmarks and update the commit messages to correspond to the new
  results

====================

Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
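[Editor's note: a minimal sketch of the "skip the expensive hashmap lookup"
pattern described in the commit message. The map names, value type, sizes,
number of hash functions, and attach point are all hypothetical, and the
__uint(map_extra, ...) map definition assumes the libbpf support added in
patch 2/5.]

/* Hypothetical BPF-side sketch: consult the bloom filter before the hashmap */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
	__type(value, __u64);
	__uint(max_entries, 1000);
	__uint(map_extra, 3);	/* lowest 4 bits: use 3 hash functions */
} bloom SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, __u64);
	__type(value, __u64);
	__uint(max_entries, 1000);
} hmap SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_getpgid")
int bloom_before_hash(void *ctx)
{
	__u64 key = 42;	/* hypothetical element to look up */
	__u64 *val;

	/* Nonzero (-ENOENT) means "definitely not in the set": skip the
	 * hashmap. Zero means "possibly in the set" (false positives are
	 * possible), so fall through to the authoritative lookup.
	 */
	if (bpf_map_peek_elem(&bloom, &key) != 0)
		return 0;

	val = bpf_map_lookup_elem(&hmap, &key);
	if (val) {
		/* element confirmed present; use *val here */
	}
	return 0;
}

char _license[] SEC("license") = "GPL";

Elements would be added either from the BPF side with
bpf_map_push_elem(&bloom, &key, BPF_ANY), the only other helper the verifier
change below permits for this map type, or from userspace via the syscall
changes below.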
2 parents: b066abb + f44bc54

25 files changed: +1429 −51 lines

include/linux/bpf.h

Lines changed: 1 addition & 0 deletions

@@ -169,6 +169,7 @@ struct bpf_map {
 	u32 value_size;
 	u32 max_entries;
 	u32 map_flags;
+	u64 map_extra; /* any per-map-type extra fields */
 	int spin_lock_off; /* >=0 valid offset, <0 error */
 	int timer_off; /* >=0 valid offset, <0 error */
 	u32 id;

include/linux/bpf_types.h

Lines changed: 1 addition & 0 deletions

@@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)

 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)

include/uapi/linux/bpf.h

Lines changed: 9 additions & 0 deletions

@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };

 /* Note that tracing related programs such as
@@ -1274,6 +1275,13 @@ union bpf_attr {
 					   * struct stored as the
 					   * map value
 					   */
+	/* Any per-map-type extra fields
+	 *
+	 * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
+	 * number of hash functions (if 0, the bloom filter will default
+	 * to using 5 hash functions).
+	 */
+	__u64	map_extra;
 	};

 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -5638,6 +5646,7 @@ struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u64 map_extra;
 } __attribute__((aligned(8)));

 struct bpf_btf_info {

kernel/bpf/Makefile

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)

 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o

kernel/bpf/bloom_filter.c

Lines changed: 195 additions & 0 deletions

@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bitmap.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/random.h>
+
+#define BLOOM_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
+
+struct bpf_bloom_filter {
+	struct bpf_map map;
+	u32 bitset_mask;
+	u32 hash_seed;
+	/* If the size of the values in the bloom filter is u32 aligned,
+	 * then it is more performant to use jhash2 as the underlying hash
+	 * function, else we use jhash. This tracks the number of u32s
+	 * in an u32-aligned value size. If the value size is not u32 aligned,
+	 * this will be 0.
+	 */
+	u32 aligned_u32_count;
+	u32 nr_hash_funcs;
+	unsigned long bitset[];
+};
+
+static u32 hash(struct bpf_bloom_filter *bloom, void *value,
+		u32 value_size, u32 index)
+{
+	u32 h;
+
+	if (bloom->aligned_u32_count)
+		h = jhash2(value, bloom->aligned_u32_count,
+			   bloom->hash_seed + index);
+	else
+		h = jhash(value, value_size, bloom->hash_seed + index);
+
+	return h & bloom->bitset_mask;
+}
+
+static int peek_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 i, h;
+
+	for (i = 0; i < bloom->nr_hash_funcs; i++) {
+		h = hash(bloom, value, map->value_size, i);
+		if (!test_bit(h, bloom->bitset))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
+static int push_elem(struct bpf_map *map, void *value, u64 flags)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 i, h;
+
+	if (flags != BPF_ANY)
+		return -EINVAL;
+
+	for (i = 0; i < bloom->nr_hash_funcs; i++) {
+		h = hash(bloom, value, map->value_size, i);
+		set_bit(h, bloom->bitset);
+	}
+
+	return 0;
+}
+
+static int pop_elem(struct bpf_map *map, void *value)
+{
+	return -EOPNOTSUPP;
+}
+
+static struct bpf_map *map_alloc(union bpf_attr *attr)
+{
+	u32 bitset_bytes, bitset_mask, nr_hash_funcs, nr_bits;
+	int numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_bloom_filter *bloom;
+
+	if (!bpf_capable())
+		return ERR_PTR(-EPERM);
+
+	if (attr->key_size != 0 || attr->value_size == 0 ||
+	    attr->max_entries == 0 ||
+	    attr->map_flags & ~BLOOM_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags) ||
+	    (attr->map_extra & ~0xF))
+		return ERR_PTR(-EINVAL);
+
+	/* The lower 4 bits of map_extra specify the number of hash functions */
+	nr_hash_funcs = attr->map_extra & 0xF;
+	if (nr_hash_funcs == 0)
+		/* Default to using 5 hash functions if unspecified */
+		nr_hash_funcs = 5;
+
+	/* For the bloom filter, the optimal bit array size that minimizes the
+	 * false positive probability is n * k / ln(2) where n is the number of
+	 * expected entries in the bloom filter and k is the number of hash
+	 * functions. We use 7 / 5 to approximate 1 / ln(2).
+	 *
+	 * We round this up to the nearest power of two to enable more efficient
+	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
+	 *
+	 * If this overflows a u32, the bit array size will have 2^32 (4
+	 * GB) bits.
+	 */
+	if (check_mul_overflow(attr->max_entries, nr_hash_funcs, &nr_bits) ||
+	    check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits) ||
+	    nr_bits > (1UL << 31)) {
+		/* The bit array size is 2^32 bits but to avoid overflowing the
+		 * u32, we use U32_MAX, which will round up to the equivalent
+		 * number of bytes
+		 */
+		bitset_bytes = BITS_TO_BYTES(U32_MAX);
+		bitset_mask = U32_MAX;
+	} else {
+		if (nr_bits <= BITS_PER_LONG)
+			nr_bits = BITS_PER_LONG;
+		else
+			nr_bits = roundup_pow_of_two(nr_bits);
+		bitset_bytes = BITS_TO_BYTES(nr_bits);
+		bitset_mask = nr_bits - 1;
+	}
+
+	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
+	bloom = bpf_map_area_alloc(sizeof(*bloom) + bitset_bytes, numa_node);
+
+	if (!bloom)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&bloom->map, attr);
+
+	bloom->nr_hash_funcs = nr_hash_funcs;
+	bloom->bitset_mask = bitset_mask;
+
+	/* Check whether the value size is u32-aligned */
+	if ((attr->value_size & (sizeof(u32) - 1)) == 0)
+		bloom->aligned_u32_count =
+			attr->value_size / sizeof(u32);
+
+	if (!(attr->map_flags & BPF_F_ZERO_SEED))
+		bloom->hash_seed = get_random_int();
+
+	return &bloom->map;
+}
+
+static void map_free(struct bpf_map *map)
+{
+	struct bpf_bloom_filter *bloom =
+		container_of(map, struct bpf_bloom_filter, map);
+
+	bpf_map_area_free(bloom);
+}
+
+static void *lookup_elem(struct bpf_map *map, void *key)
+{
+	/* The eBPF program should use map_peek_elem instead */
+	return ERR_PTR(-EINVAL);
+}
+
+static int update_elem(struct bpf_map *map, void *key,
+		       void *value, u64 flags)
+{
+	/* The eBPF program should use map_push_elem instead */
+	return -EINVAL;
+}
+
+static int check_btf(const struct bpf_map *map, const struct btf *btf,
+		     const struct btf_type *key_type,
+		     const struct btf_type *value_type)
+{
+	/* Bloom filter maps are keyless */
+	return btf_type_is_void(key_type) ? 0 : -EINVAL;
+}
+
+static int bpf_bloom_btf_id;
+const struct bpf_map_ops bloom_filter_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc = map_alloc,
+	.map_free = map_free,
+	.map_push_elem = push_elem,
+	.map_peek_elem = peek_elem,
+	.map_pop_elem = pop_elem,
+	.map_lookup_elem = lookup_elem,
+	.map_update_elem = update_elem,
+	.map_check_btf = check_btf,
+	.map_btf_name = "bpf_bloom_filter",
+	.map_btf_id = &bpf_bloom_btf_id,
+};
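[Editor's note: the sizing comment in map_alloc() above packs several steps
together. The short userspace C sketch below replays the non-overflow path
with hypothetical inputs (100k expected entries, 5 hash functions); it is an
illustration, not kernel code.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t max_entries = 100000;	/* n: expected unique entries */
	uint32_t nr_hash_funcs = 5;	/* k: map_extra & 0xF, or the default */
	uint32_t nr_bits, rounded = 1;

	/* Optimal size is n * k / ln(2); 7/5 approximates 1/ln(2),
	 * applied in the same order as map_alloc(): (n * k) / 5 * 7.
	 */
	nr_bits = max_entries * nr_hash_funcs / 5 * 7;	/* 700000 */

	/* roundup_pow_of_two(): lets hash() mask instead of taking a mod */
	while (rounded < nr_bits)
		rounded <<= 1;	/* 2^20 = 1048576 */

	/* prints: 1048576 bits = 128 KiB, bitset_mask = 0xfffff */
	printf("%u bits = %u KiB, bitset_mask = 0x%x\n",
	       rounded, rounded / 8 / 1024, rounded - 1);
	return 0;
}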

kernel/bpf/syscall.c

Lines changed: 21 additions & 3 deletions

@@ -199,7 +199,8 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 flags);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_push_elem(map, value, flags);
 	} else {
 		rcu_read_lock();
@@ -238,7 +239,8 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_peek_elem(map, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
 		/* struct_ops map requires directly updating "value" */
@@ -348,6 +350,7 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 	map->max_entries = attr->max_entries;
 	map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags);
 	map->numa_node = bpf_map_attr_numa_node(attr);
+	map->map_extra = attr->map_extra;
 }

 static int bpf_map_alloc_id(struct bpf_map *map)
@@ -553,6 +556,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
+		   "map_extra:\t%#llx\n"
 		   "memlock:\t%lu\n"
 		   "map_id:\t%u\n"
 		   "frozen:\t%u\n",
@@ -561,6 +565,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
+		   (unsigned long long)map->map_extra,
 		   bpf_map_memory_footprint(map),
 		   map->id,
 		   READ_ONCE(map->frozen));
@@ -810,7 +815,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }

-#define BPF_MAP_CREATE_LAST_FIELD btf_vmlinux_value_type_id
+#define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -831,6 +836,10 @@ static int map_create(union bpf_attr *attr)
 			return -EINVAL;
 	}

+	if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
+	    attr->map_extra != 0)
+		return -EINVAL;
+
 	f_flags = bpf_get_file_flag(attr->map_flags);
 	if (f_flags < 0)
 		return f_flags;
@@ -1080,6 +1089,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!value)
 		goto free_key;

+	if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = bpf_map_copy_value(map, key, value, attr->flags);
+		goto free_value;
+	}
+
 	err = bpf_map_copy_value(map, key, value, attr->flags);
 	if (err)
 		goto free_value;
@@ -3881,6 +3898,7 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	info.value_size = map->value_size;
 	info.max_entries = map->max_entries;
 	info.map_flags = map->map_flags;
+	info.map_extra = map->map_extra;
 	memcpy(info.name, map->name, sizeof(map->name));

 	if (map->btf) {
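[Editor's note: from userspace, the effect of these syscall changes is that
bloom filter maps are keyless: updates push the value through map_push_elem()
and lookups copy the candidate value *in*, reporting membership through the
return code. A minimal sketch follows; it assumes a libbpf new enough to
expose map_extra in bpf_map_create_opts (this patchset itself routes
map_extra through an internal bpf_create_map_params struct in libbpf), and
all names and sizes are hypothetical.]

#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
	/* lowest 4 bits of map_extra = number of hash functions */
	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_extra = 3);
	__u64 value = 42;	/* hypothetical element */
	int fd, err;

	/* key_size must be 0: bloom filter maps are keyless */
	fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, "demo_bloom",
			    0, sizeof(value), 100, &opts);
	if (fd < 0)
		return 1;

	/* add: bpf_map_update_value() routes this to push_elem() */
	err = bpf_map_update_elem(fd, NULL, &value, BPF_ANY);
	if (err)
		return 1;

	/* query: the value is copied in and peek_elem() answers;
	 * 0 = possibly present, failure (ENOENT) = definitely absent
	 */
	err = bpf_map_lookup_elem(fd, NULL, &value);
	printf("42 is %s\n", err ? "definitely absent" : "possibly present");
	return 0;
}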

kernel/bpf/verifier.c

Lines changed: 16 additions & 3 deletions

@@ -5002,7 +5002,10 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 		break;
-
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (meta->func_id == BPF_FUNC_map_peek_elem)
+			*arg_type = ARG_PTR_TO_MAP_VALUE;
+		break;
 	default:
 		break;
 	}
@@ -5577,6 +5580,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_task_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (func_id != BPF_FUNC_map_peek_elem &&
+		    func_id != BPF_FUNC_map_push_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -5644,13 +5652,18 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_SOCKHASH)
 			goto error;
 		break;
-	case BPF_FUNC_map_peek_elem:
 	case BPF_FUNC_map_pop_elem:
-	case BPF_FUNC_map_push_elem:
 		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
 		    map->map_type != BPF_MAP_TYPE_STACK)
 			goto error;
 		break;
+	case BPF_FUNC_map_peek_elem:
+	case BPF_FUNC_map_push_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK &&
+		    map->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+			goto error;
+		break;
 	case BPF_FUNC_sk_storage_get:
 	case BPF_FUNC_sk_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
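[Editor's note: a consequence of the compatibility checks above is that only
bpf_map_push_elem() and bpf_map_peek_elem() may be used with bloom filter maps
from a BPF program. The hypothetical fragment below would be rejected at load
time by check_map_func_compatibility(), since map_pop_elem remains compatible
only with queue and stack maps.]

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
	__type(value, __u64);
	__uint(max_entries, 100);
} bloom SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_getpgid")
int bad_bloom_pop(void *ctx)
{
	__u64 value;

	/* rejected by the verifier: bloom filter maps do not support pop */
	bpf_map_pop_elem(&bloom, &value);
	return 0;
}

char _license[] SEC("license") = "GPL";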
