Commit 98671a0

anakryiko authored and Alexei Starovoitov committed
bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic

For all BPF maps we ensure that VM_MAYWRITE is cleared when memory-mapping BPF map contents as an initially read-only VMA. This is because in some cases the BPF verifier relies on the underlying data not being modified afterwards by user space, so once something is mapped read-only, it shouldn't be re-mmap'ed as read-write.

As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for read-write mappings, and if VM_WRITE is not set, there is no way for user space to upgrade a read-only mapping to a read-write one.

This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within bpf_map_mmap(), which is the entry point for any BPF map mmap()-ing logic. We also drop the unnecessary sanitization of VM_MAYWRITE in BPF ringbuf's map_mmap() callback implementation, as it is already performed by common code in bpf_map_mmap().

Note, though, that in the bpf_map_mmap_{open,close}() callbacks we can't drop VM_MAYWRITE use, because it's possible (and is outside of the subsystem's control) to have an initially read-write memory mapping, which is subsequently dropped to read-only by user space through mprotect(). In such a case, from the BPF verifier's POV it's read-write data throughout the lifetime of the BPF map, and it is counted as an "active writer".

But its VMAs will start out as VM_WRITE|VM_MAYWRITE, then mprotect() can change them to just VM_MAYWRITE (and no VM_WRITE), so when the mapping is finally munmap()'ed and bpf_map_mmap_close() is called, vm_flags will be just VM_MAYWRITE, but we still need to decrement the active writer count with bpf_map_write_active_dec(), as it's still considered to be a read-write mapping by the rest of the BPF subsystem.

Similar reasoning applies to bpf_map_mmap_open(), which is called whenever mmap(), munmap(), and/or mprotect() forces the mm subsystem to split the original VMA into multiple discontiguous VMAs.

Memory-mapping handling is a bit tricky, yes.

Cc: Jann Horn <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

1 parent c7f2188 commit 98671a0
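
The equivalence claimed in the commit message (for BPF map VMAs, VM_WRITE and VM_MAYWRITE are either both set or both clear right after mmap()) can be sketched with a small user-space model. This is not kernel code: `struct model_vma`, `model_bpf_map_mmap()` and `model_mprotect()` are hypothetical names made up for illustration, and the model assumes the mm core granted VM_MAYWRITE before the BPF code ran. The flag values are believed to match `include/linux/mm.h`, but only their distinctness matters here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits for a user-space model (assumed values). */
#define VM_WRITE    0x00000002UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

/* Models the common sanitization done in bpf_map_mmap(). */
static void model_bpf_map_mmap(struct model_vma *vma, bool prot_write)
{
	/* Assumption: the mm core handed us a VMA that may be made writable. */
	vma->vm_flags = VM_MAYWRITE;
	if (prot_write)
		vma->vm_flags |= VM_WRITE;

	/* Read-only mapping: drop VM_MAYWRITE so it can never be upgraded. */
	if (!(vma->vm_flags & VM_WRITE))
		vma->vm_flags &= ~VM_MAYWRITE;
}

/* Models the mprotect() permission rule: returns 0 on success, -1 if denied. */
static int model_mprotect(struct model_vma *vma, bool prot_write)
{
	/* Write access can be requested only if VM_MAYWRITE is still set. */
	if (prot_write && !(vma->vm_flags & VM_MAYWRITE))
		return -1; /* the real syscall would fail with EACCES */

	if (prot_write)
		vma->vm_flags |= VM_WRITE;
	else
		vma->vm_flags &= ~VM_WRITE;
	return 0;
}
```

In this model a mapping created read-only ends up with neither flag and a later `model_mprotect(&vma, true)` is refused, while a mapping created read-write can be downgraded and upgraded freely because VM_MAYWRITE survives the downgrade.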

2 files changed, +8 -6 lines changed

kernel/bpf/ringbuf.c

Lines changed: 0 additions & 4 deletions
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
 		/* allow writable mapping for the consumer_pos only */
 		if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
 			 * position, and the ring buffer data itself.
 			 */
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
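
What remains in ringbuf_map_mmap_kern() after the `else` branch is removed can be approximated by the following user-space sketch. It assumes the visible check sits inside an `if (vma->vm_flags & VM_WRITE)` block (that line is outside the hunk shown), uses a hypothetical function name, and hard-codes a 4096-byte page size for illustration only.

```c
#include <assert.h>
#include <errno.h>

#define VM_WRITE        0x00000002UL /* illustrative flag bit */
#define MODEL_PAGE_SIZE 4096UL       /* assumed page size for this model */

/*
 * Hypothetical model of the kernel-producer ringbuf mmap check:
 * a writable mapping is accepted only for the single consumer_pos page.
 * Returns 0 if the mapping would be allowed, -EPERM otherwise.
 */
static int model_ringbuf_mmap_kern_check(unsigned long vm_flags,
					 unsigned long pgoff,
					 unsigned long len)
{
	assert(len > 0); /* a VMA always spans at least one byte */

	if (vm_flags & VM_WRITE) {
		/* allow writable mapping for the consumer_pos only */
		if (pgoff != 0 || len != MODEL_PAGE_SIZE)
			return -EPERM;
	}
	/*
	 * No else branch: clearing VM_MAYWRITE for read-only mappings
	 * is left to the common bpf_map_mmap() code.
	 */
	return 0;
}
```

Read-only requests pass through unchanged here, which mirrors the patch: the callback only rejects invalid writable mappings and no longer touches VM_MAYWRITE itself.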

kernel/bpf/syscall.c

Lines changed: 8 additions & 2 deletions
@@ -1065,15 +1065,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
 	vm_flags_clear(vma, VM_MAYEXEC);
+	/* If mapping is read-only, then disallow potentially re-mapping with
+	 * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
+	 * means that as far as BPF map's memory-mapped VMAs are concerned,
+	 * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
+	 * both should be set, so we can forget about VM_MAYWRITE and always
+	 * check just VM_WRITE
+	 */
 	if (!(vma->vm_flags & VM_WRITE))
-		/* disallow re-mapping with PROT_WRITE */
 		vm_flags_clear(vma, VM_MAYWRITE);

 	err = map->ops->map_mmap(map, vma);
 	if (err)
 		goto out;

-	if (vma->vm_flags & VM_MAYWRITE)
+	if (vma->vm_flags & VM_WRITE)
 		bpf_map_write_active_inc(map);
 out:
 	mutex_unlock(&map->freeze_mutex);
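
The commit message's point about bpf_map_mmap_{open,close}() can be illustrated with a toy writer counter. This is a user-space sketch under stated assumptions, not the kernel implementation: `struct model_map` and the `model_*` functions are hypothetical, and the counter merely stands in for what bpf_map_write_active_inc() and bpf_map_write_active_dec() track.

```c
#include <assert.h>

/* Illustrative flag bits for a user-space model (assumed values). */
#define VM_WRITE    0x00000002UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical stand-in for the map's active-writer bookkeeping. */
struct model_map {
	long writecnt;
};

/* mmap() path after this patch: VM_WRITE alone decides. */
static void model_mmap(struct model_map *map, unsigned long vm_flags)
{
	if (vm_flags & VM_WRITE)
		map->writecnt++;
}

/* VMA open path (for example after a split): still keyed on VM_MAYWRITE. */
static void model_vma_open(struct model_map *map, unsigned long vm_flags)
{
	if (vm_flags & VM_MAYWRITE)
		map->writecnt++;
}

/* VMA close path (munmap): still keyed on VM_MAYWRITE. */
static void model_vma_close(struct model_map *map, unsigned long vm_flags)
{
	if (vm_flags & VM_MAYWRITE) {
		assert(map->writecnt > 0); /* every dec must pair with an inc */
		map->writecnt--;
	}
}
```

Walking through the scenario from the commit message: a mapping starts as `VM_WRITE | VM_MAYWRITE` and is counted once; mprotect() then strips VM_WRITE, leaving only VM_MAYWRITE; at munmap() time `model_vma_close()` still sees VM_MAYWRITE and brings the count back to zero. If the close path tested VM_WRITE instead, the count in this model would stay at one.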