Skip to content

Conversation

bmastbergen
Copy link
Collaborator

jira VULN-6599
cve GHSA-gxg7-pxwf-9r28

The fix series for this CVE is here:
https://lore.kernel.org/all/[email protected]/

However due to the great deal of change in the bpf subsystem between this kernel and 6.3 (where the series landed), backporting this series would require a huge amount of commits to be backported just to apply.

If you look at the changelog for 5.14.0-284.48.1.el9_2 for changes to address GHSA-gxg7-pxwf-9r28 you will only see one entry:

- bpf: Fix partial dynptr stack slot reads/writes (Artem Savkov) [2227282 2227283] {CVE-2023-39191}

So it would appear RH found the prospect of backporting enough code to land the whole series a bit daunting as well. As such we have decided to backport this single commit as they did.

bpf: Fix partial dynptr stack slot reads/writes

jira VULN-6600
cve CVE-2023-39191
commit-author Kumar Kartikeya Dwivedi <[email protected]> 
commit ef8fc7a07c0e161841779d6fe3f6acd5a05c547c
upstream-diff The prototype for __mark_reg_not_init had to be moved
              before the new destroy_if_dynptr_stack_slot function.
              In newer kernels this prototype has already been
              moved earlier in the file.  s/__get_spi/get_spi/g as
              the __get_spi function hasn't been split out yet in
              this kernel version.  __get_spi in future kernels and
              get_spi in this kernel are identical.  The upstream
              commit tweaks some selftest failure messages, but
              those messages don't exist in this kernel.

Currently, while reads are disallowed for dynptr stack slots, writes are not. Reads don't work from both direct access and helpers, while writes do work in both cases, but have the effect of overwriting the slot_type.

While this is fine, handling for a few edge cases is missing. Firstly, a user can overwrite the stack slots of dynptr partially.

Consider the following layout:
spi: [d][d][?]
      2  1  0

First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.

This will essentially either write STACK_MISC for all slot_types or STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write of zeroes). The end result is that slot is scrubbed.

Now, the layout is:
spi: [d][m][?]
      2  1  0

Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
      2  1  0

But this time, both spi 2 and spi 1 have first_slot = true.

Now, when passing spi 2 to dynptr helper, it will consider it as initialized as it does not check whether second slot has first_slot == false. And spi 1 should already work as normal.

This effectively replaced size + offset of first dynptr, hence allowing invalid OOB reads and writes.

Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a STACK_DYNPTR type, mark both first and second slot (regardless of which slot we touch) as STACK_INVALID. Reads are already prevented.

Second, prevent writing	to stack memory from helpers if the range may contain any STACK_DYNPTR slots. Reads are already prevented.

For helpers, we cannot allow it to destroy dynptrs from the writes as depending on arguments, helper may take uninit_mem and dynptr both at the same time. This would mean that helper may write to uninit_mem before it reads the dynptr, which would be bad.

PTR_TO_MEM: [?????dd]

Depending on the code inside the helper, it may end up overwriting the dynptr contents first and then read those as the dynptr argument.

Verifier would only simulate destruction when it does byte by byte access simulation in check_helper_call for meta.access_size, and fail to catch this case, as it happens after argument checks.

The same would need to be done for any other non-trivial objects created on the stack in the future, such as bpf_list_head on stack, or bpf_rb_root on stack.

A common misunderstanding in the current code is that MEM_UNINIT means writes, but note that writes may also be performed even without MEM_UNINIT in case of helpers, in that case the code after handling meta && meta->raw_mode will complain when it sees STACK_DYNPTR. So that invalid read case also covers writes to potential STACK_DYNPTR slots. The only loophole was in case of meta->raw_mode which simulated writes through instructions which could overwrite them.

A future series sequenced after this will focus on the clean up of helper access checks and bugs around that.

Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
	Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit ef8fc7a07c0e161841779d6fe3f6acd5a05c547c)
	Signed-off-by: Brett Mastbergen <[email protected]>

Since this was a bpf verifier change, bpf selftest logs below:
bpf-selftest-before.log
bpf-selftest-after.log

brett@lycia ~/ciq/vuln-6599 % grep ^ok bpf-selftest-before.log| wc -l
32
brett@lycia ~/ciq/vuln-6599 % grep ^ok bpf-selftest-after.log| wc -l
32
brett@lycia ~/ciq/vuln-6599 %

Full selftests were also run:
selftests-before.log
selftests-after.log

brett@lycia ~/ciq/vuln-6599 % grep ^ok selftests-before.log| wc -l
308
brett@lycia ~/ciq/vuln-6599 % grep ^ok selftests-after.log| wc -l
307
brett@lycia ~/ciq/vuln-6599 %

Chalking up the one fewer pass to the usual flakiness

jira VULN-6599
cve CVE-2023-39191
commit-author Kumar Kartikeya Dwivedi <[email protected]>
commit ef8fc7a
upstream-diff The prototype for __mark_reg_not_init had to be moved
              before the new destroy_if_dynptr_stack_slot function.
              In newer kernels this prototype has already been
              moved earlier in the file.  s/__get_spi/get_spi/g as
              the __get_spi funtion hasn't been split out yet in
              this kernel version.  __get_spi in future kernels and
              get_spi in this kernel are identical.  The upstream
              commit tweaks some selftest failure messages, but
              those messages don't exist in this kernel.

Currently, while reads are disallowed for dynptr stack slots, writes are
not. Reads don't work from both direct access and helpers, while writes
do work in both cases, but have the effect of overwriting the slot_type.

While this is fine, handling for a few edge cases is missing. Firstly,
a user can overwrite the stack slots of dynptr partially.

Consider the following layout:
spi: [d][d][?]
      2  1  0

First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.

This will essentially either write STACK_MISC for all slot_types or
STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
of zeroes). The end result is that slot is scrubbed.

Now, the layout is:
spi: [d][m][?]
      2  1  0

Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
      2  1  0

But this time, both spi 2 and spi 1 have first_slot = true.

Now, when passing spi 2 to dynptr helper, it will consider it as
initialized as it does not check whether second slot has first_slot ==
false. And spi 1 should already work as normal.

This effectively replaced size + offset of first dynptr, hence allowing
invalid OOB reads and writes.

Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
STACK_DYNPTR type, mark both first and second slot (regardless of which
slot we touch) as STACK_INVALID. Reads are already prevented.

Second, prevent writing	to stack memory from helpers if the range may
contain any STACK_DYNPTR slots. Reads are already prevented.

For helpers, we cannot allow it to destroy dynptrs from the writes as
depending on arguments, helper may take uninit_mem and dynptr both at
the same time. This would mean that helper may write to uninit_mem
before it reads the dynptr, which would be bad.

PTR_TO_MEM: [?????dd]

Depending on the code inside the helper, it may end up overwriting the
dynptr contents first and then read those as the dynptr argument.

Verifier would only simulate destruction when it does byte by byte
access simulation in check_helper_call for meta.access_size, and
fail to catch this case, as it happens after argument checks.

The same would need to be done for any other non-trivial objects created
on the stack in the future, such as bpf_list_head on stack, or
bpf_rb_root on stack.

A common misunderstanding in the current code is that MEM_UNINIT means
writes, but note that writes may also be performed even without
MEM_UNINIT in case of helpers, in that case the code after handling meta
&& meta->raw_mode will complain when it sees STACK_DYNPTR. So that
invalid read case also covers writes to potential STACK_DYNPTR slots.
The only loophole was in case of meta->raw_mode which simulated writes
through instructions which could overwrite them.

A future series sequenced after this will focus on the clean up of
helper access checks and bugs around that.

Fixes: 97e03f5 ("bpf: Add verifier support for dynptrs")
	Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Alexei Starovoitov <[email protected]>
(cherry picked from commit ef8fc7a)
	Signed-off-by: Brett Mastbergen <[email protected]>
Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:
Same as other 9.2 based changes
#38
#40

Copy link

@gvrose8192 gvrose8192 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - as pointed out by Maple we're already carrying this elsewhere. Thanks!

@bmastbergen bmastbergen merged commit da2d199 into ciqlts9_2-rt Jan 8, 2025
4 checks passed
@bmastbergen bmastbergen deleted the bmastbergen_ciqlts9_2-rt/VULN-6599 branch January 8, 2025 02:58
pvts-mat pushed a commit to pvts-mat/kernel-src-tree that referenced this pull request Jan 14, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-rt-5.14.0-284.30.1.rt14.315.el9_2
commit-author Xiaolong Huang <[email protected]>
commit ff8376a

Some function calls are not implemented in rxrpc_no_security, there are
preparse_server_key, free_preparse_server_key and destroy_server_key.
When rxrpc security type is rxrpc_no_security, user can easily trigger a
null-ptr-deref bug via ioctl. So judgment should be added to prevent it

The crash log:
user@syzkaller:~$ ./rxrpc_preparse_s
[   37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   37.957645][T15626] #PF: supervisor instruction fetch in kernel mode
[   37.958229][T15626] #PF: error_code(0x0010) - not-present page
[   37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0
[   37.959321][T15626] Oops: 0010 [ctrliq#1] PREEMPT SMP
[   37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d ctrliq#43
[   37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   37.961474][T15626] RIP: 0010:0x0
[   37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[   37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286
[   37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000
[   37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48
[   37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002
[   37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680
[   37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80
[   37.966441][T15626] FS:  00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[   37.966979][T15626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0
[   37.967864][T15626] Call Trace:
[   37.968062][T15626]  <TASK>
[   37.968240][T15626]  rxrpc_preparse_s+0x59/0x90
[   37.968541][T15626]  key_create_or_update+0x174/0x510
[   37.968863][T15626]  __x64_sys_add_key+0x139/0x1d0
[   37.969165][T15626]  do_syscall_64+0x35/0xb0
[   37.969451][T15626]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   37.969824][T15626] RIP: 0033:0x43a1f9

	Signed-off-by: Xiaolong Huang <[email protected]>
	Tested-by: Xiaolong Huang <[email protected]>
	Signed-off-by: David Howells <[email protected]>
	Acked-by: Marc Dionne <[email protected]>
cc: [email protected]
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html
Fixes: 12da59f ("rxrpc: Hand server key parsing off to the security class")
Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk
	Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit ff8376a)
	Signed-off-by: Jonathan Maple <[email protected]>
github-actions bot pushed a commit that referenced this pull request Apr 10, 2025
[ Upstream commit f0c2427 ]

Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
unsigned long to support values larger than 31 and avoid overflow.

For example: If we try to register 4GB of memory that is contiguous in
physical memory, the driver will optimize the page_size and try to use
an mkey with 4GB entity size. The 'unsigned int' page_size variable will
overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().

WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G        W          6.14.0-rc4-dirty #43
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
FS:  00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x81/0x130
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 ? report_bug+0xfc/0x1e0
 ? handle_bug+0x55/0x90
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 create_real_mr+0x54/0x150 [mlx5_ib]
 ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
 ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
 ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
 ? walk_system_ram_range+0x79/0xd0
 ? ___pte_offset_map+0x1b/0x110
 ? __pte_offset_map_lock+0x80/0x100
 ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
 __x64_sys_ioctl+0x94/0xb0
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb7ecf0737b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
 </TASK>

Fixes: cef7dde ("net/mlx5: Expand mkey page size to support 6 bits")
Signed-off-by: Michael Guralnik <[email protected]>
Reviewed-by: Yishai Hadas <[email protected]>
Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Xiaolong Huang <[email protected]>
commit ff8376a

Some function calls are not implemented in rxrpc_no_security, there are
preparse_server_key, free_preparse_server_key and destroy_server_key.
When rxrpc security type is rxrpc_no_security, user can easily trigger a
null-ptr-deref bug via ioctl. So judgment should be added to prevent it

The crash log:
user@syzkaller:~$ ./rxrpc_preparse_s
[   37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   37.957645][T15626] #PF: supervisor instruction fetch in kernel mode
[   37.958229][T15626] #PF: error_code(0x0010) - not-present page
[   37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0
[   37.959321][T15626] Oops: 0010 [#1] PREEMPT SMP
[   37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d ctrliq#43
[   37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   37.961474][T15626] RIP: 0010:0x0
[   37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[   37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286
[   37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000
[   37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48
[   37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002
[   37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680
[   37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80
[   37.966441][T15626] FS:  00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[   37.966979][T15626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0
[   37.967864][T15626] Call Trace:
[   37.968062][T15626]  <TASK>
[   37.968240][T15626]  rxrpc_preparse_s+0x59/0x90
[   37.968541][T15626]  key_create_or_update+0x174/0x510
[   37.968863][T15626]  __x64_sys_add_key+0x139/0x1d0
[   37.969165][T15626]  do_syscall_64+0x35/0xb0
[   37.969451][T15626]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   37.969824][T15626] RIP: 0033:0x43a1f9

	Signed-off-by: Xiaolong Huang <[email protected]>
	Tested-by: Xiaolong Huang <[email protected]>
	Signed-off-by: David Howells <[email protected]>
	Acked-by: Marc Dionne <[email protected]>
cc: [email protected]
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html
Fixes: 12da59f ("rxrpc: Hand server key parsing off to the security class")
Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk
	Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit ff8376a)
	Signed-off-by: Jonathan Maple <[email protected]>
PlaidCat added a commit that referenced this pull request Jul 29, 2025
jira LE-3666
cve CVE-2025-22091
Rebuild_History Non-Buildable kernel-5.14.0-570.30.1.el9_6
commit-author Michael Guralnik <[email protected]>
commit f0c2427

Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
unsigned long to support values larger than 31 and avoid overflow.

For example: If we try to register 4GB of memory that is contiguous in
physical memory, the driver will optimize the page_size and try to use
an mkey with 4GB entity size. The 'unsigned int' page_size variable will
overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().

WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G        W          6.14.0-rc4-dirty #43
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
FS:  00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x81/0x130
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 ? report_bug+0xfc/0x1e0
 ? handle_bug+0x55/0x90
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 create_real_mr+0x54/0x150 [mlx5_ib]
 ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
 ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
 ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
 ? walk_system_ram_range+0x79/0xd0
 ? ___pte_offset_map+0x1b/0x110
 ? __pte_offset_map_lock+0x80/0x100
 ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
 __x64_sys_ioctl+0x94/0xb0
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb7ecf0737b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
 </TASK>

Fixes: cef7dde ("net/mlx5: Expand mkey page size to support 6 bits")
	Signed-off-by: Michael Guralnik <[email protected]>
	Reviewed-by: Yishai Hadas <[email protected]>
Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org
	Signed-off-by: Leon Romanovsky <[email protected]>
(cherry picked from commit f0c2427)
	Signed-off-by: Jonathan Maple <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 19, 2025
JIRA: https://issues.redhat.com/browse/RHEL-72226
JIRA: https://issues.redhat.com/browse/RHEL-73517
JIRA: https://issues.redhat.com/browse/RHEL-83386
JIRA: https://issues.redhat.com/browse/RHEL-99321
CVE: CVE-2025-22091
Upstream-status: v6.15-rc1

commit f0c2427
Author: Michael Guralnik <[email protected]>
Date:   Thu Mar 13 16:29:51 2025 +0200

    RDMA/mlx5: Fix page_size variable overflow

    Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
    unsigned long to support values larger than 31 and avoid overflow.

    For example: If we try to register 4GB of memory that is contiguous in
    physical memory, the driver will optimize the page_size and try to use
    an mkey with 4GB entity size. The 'unsigned int' page_size variable will
    overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().

    WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
    Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
    CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G        W          6.14.0-rc4-dirty #43
    Tainted: [W]=WARN
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
    Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
    RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
    RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
    RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
    RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
    R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
    R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
    FS:  00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? __warn+0x81/0x130
     ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
     ? report_bug+0xfc/0x1e0
     ? handle_bug+0x55/0x90
     ? exc_invalid_op+0x17/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
     create_real_mr+0x54/0x150 [mlx5_ib]
     ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
     ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
     ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
     ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
     ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
     ? walk_system_ram_range+0x79/0xd0
     ? ___pte_offset_map+0x1b/0x110
     ? __pte_offset_map_lock+0x80/0x100
     ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
     __x64_sys_ioctl+0x94/0xb0
     do_syscall_64+0x50/0x110
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7fb7ecf0737b
    Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
    RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
    RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
    R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
    R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
     </TASK>

    Fixes: cef7dde ("net/mlx5: Expand mkey page size to support 6 bits")
    Signed-off-by: Michael Guralnik <[email protected]>
    Reviewed-by: Yishai Hadas <[email protected]>
    Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <[email protected]>

Signed-off-by: Benjamin Poirier <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 19, 2025
JIRA: https://issues.redhat.com/browse/RHEL-72227
JIRA: https://issues.redhat.com/browse/RHEL-73520
JIRA: https://issues.redhat.com/browse/RHEL-99327
CVE: CVE-2025-22091
Upstream-status: v6.15-rc1

commit f0c2427
Author: Michael Guralnik <[email protected]>
Date:   Thu Mar 13 16:29:51 2025 +0200

    RDMA/mlx5: Fix page_size variable overflow

    Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
    unsigned long to support values larger than 31 and avoid overflow.

    For example: If we try to register 4GB of memory that is contiguous in
    physical memory, the driver will optimize the page_size and try to use
    an mkey with 4GB entity size. The 'unsigned int' page_size variable will
    overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().

    WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
    Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
    CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G        W          6.14.0-rc4-dirty #43
    Tainted: [W]=WARN
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
    Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
    RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
    RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
    RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
    RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
    R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
    R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
    FS:  00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
     <TASK>
     ? __warn+0x81/0x130
     ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
     ? report_bug+0xfc/0x1e0
     ? handle_bug+0x55/0x90
     ? exc_invalid_op+0x17/0x70
     ? asm_exc_invalid_op+0x1a/0x20
     ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
     create_real_mr+0x54/0x150 [mlx5_ib]
     ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
     ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
     ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
     ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
     ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
     ? walk_system_ram_range+0x79/0xd0
     ? ___pte_offset_map+0x1b/0x110
     ? __pte_offset_map_lock+0x80/0x100
     ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
     __x64_sys_ioctl+0x94/0xb0
     do_syscall_64+0x50/0x110
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7fb7ecf0737b
    Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
    RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
    RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
    R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
    R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
     </TASK>

    Fixes: cef7dde ("net/mlx5: Expand mkey page size to support 6 bits")
    Signed-off-by: Michael Guralnik <[email protected]>
    Reviewed-by: Yishai Hadas <[email protected]>
    Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <[email protected]>

Signed-off-by: Benjamin Poirier <[email protected]>
PlaidCat added a commit that referenced this pull request Aug 25, 2025
jira LE-3677
cve CVE-2025-22091
Rebuild_History Non-Buildable kernel-6.12.0-55.24.1.el10_0
commit-author Michael Guralnik <[email protected]>
commit f0c2427

Change all variables storing mlx5_umem_mkc_find_best_pgsz() result to
unsigned long to support values larger than 31 and avoid overflow.

For example: If we try to register 4GB of memory that is contiguous in
physical memory, the driver will optimize the page_size and try to use
an mkey with 4GB entity size. The 'unsigned int' page_size variable will
overflow to '0' and we'll hit the WARN_ON() in alloc_cacheable_mr().

WARNING: CPU: 2 PID: 1203 at drivers/infiniband/hw/mlx5/mr.c:1124 alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Modules linked in: mlx5_ib mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_rxe rdma_ucm ib_uverbs ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm fuse ib_core [last unloaded: mlx5_core]
CPU: 2 UID: 70878 PID: 1203 Comm: rdma_resource_l Tainted: G        W          6.14.0-rc4-dirty #43
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
Code: 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 41 52 53 48 83 ec 30 f6 46 28 04 4c 8b 77 08 75 21 <0f> 0b 49 c7 c2 ea ff ff ff 48 8d 65 d0 4c 89 d0 5b 41 5a 41 5c 41
RSP: 0018:ffffc900006ffac8 EFLAGS: 00010246
RAX: 0000000004c0d0d0 RBX: ffff888217a22000 RCX: 0000000000100001
RDX: 00007fb7ac480000 RSI: ffff8882037b1240 RDI: ffff8882046f0600
RBP: ffffc900006ffb28 R08: 0000000000000001 R09: 0000000000000000
R10: 00000000000007e0 R11: ffffea0008011d40 R12: ffff8882037b1240
R13: ffff8882046f0600 R14: ffff888217a22000 R15: ffffc900006ffe00
FS:  00007fb7ed013340(0000) GS:ffff88885fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb7ed1d8000 CR3: 00000001fd8f6006 CR4: 0000000000772eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x81/0x130
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 ? report_bug+0xfc/0x1e0
 ? handle_bug+0x55/0x90
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? alloc_cacheable_mr+0x22/0x580 [mlx5_ib]
 create_real_mr+0x54/0x150 [mlx5_ib]
 ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xca/0x140 [ib_uverbs]
 ib_uverbs_run_method+0x6d0/0x780 [ib_uverbs]
 ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x19b/0x360 [ib_uverbs]
 ? walk_system_ram_range+0x79/0xd0
 ? ___pte_offset_map+0x1b/0x110
 ? __pte_offset_map_lock+0x80/0x100
 ib_uverbs_ioctl+0xac/0x110 [ib_uverbs]
 __x64_sys_ioctl+0x94/0xb0
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb7ecf0737b
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdbe03ecc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffdbe03edb8 RCX: 00007fb7ecf0737b
RDX: 00007ffdbe03eda0 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffdbe03ed80 R08: 00007fb7ecc84010 R09: 00007ffdbe03eed4
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffdbe03eed4
R13: 000000000000000c R14: 000000000000000c R15: 00007fb7ecc84150
 </TASK>

Fixes: cef7dde ("net/mlx5: Expand mkey page size to support 6 bits")
	Signed-off-by: Michael Guralnik <[email protected]>
	Reviewed-by: Yishai Hadas <[email protected]>
Link: https://patch.msgid.link/2479a4a3f6fd9bd032e1b6d396274a89c4c5e22f.1741875692.git.leon@kernel.org
	Signed-off-by: Leon Romanovsky <[email protected]>
(cherry picked from commit f0c2427)
	Signed-off-by: Jonathan Maple <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
JIRA: https://issues.redhat.com/browse/RHEL-79008
CVE: CVE-2024-56633

commit ca70b8b
Author: Zijian Zhang <[email protected]>
Date:   Wed Oct 16 23:48:38 2024 +0000

    tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg

    The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
    tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.

    Potential problems with this strategy are as follows:

    - If the actual sent bytes are smaller than tosend, we need to charge some
      bytes back, as in line 487, which is okay but seems not clean.

    - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
      miss uncharging (msg->sg.size - apply_bytes) bytes.

    [...]
    415 tosend = msg->sg.size;
    416 if (psock->apply_bytes && psock->apply_bytes < tosend)
    417   tosend = psock->apply_bytes;
    [...]
    443 sk_msg_return(sk, msg, tosend);
    444 release_sock(sk);
    446 origsize = msg->sg.size;
    447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
    448                             msg, tosend, flags);
    449 sent = origsize - msg->sg.size;
    [...]
    454 lock_sock(sk);
    455 if (unlikely(ret < 0)) {
    456   int free = sk_msg_free_nocharge(sk, msg);
    458   if (!cork)
    459     *copied -= free;
    460 }
    [...]
    487 if (eval == __SK_REDIRECT)
    488   sk_mem_charge(sk, tosend - sent);
    [...]

    When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
    the following warning will be reported:

    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
    Modules linked in:
    CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    Workqueue: events sk_psock_destroy
    RIP: 0010:inet_sock_destruct+0x190/0x1a0
    RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
    RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
    RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
    RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
    R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
    R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
    FS:  0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    <TASK>
    ? __warn+0x89/0x130
    ? inet_sock_destruct+0x190/0x1a0
    ? report_bug+0xfc/0x1e0
    ? handle_bug+0x5c/0xa0
    ? exc_invalid_op+0x17/0x70
    ? asm_exc_invalid_op+0x1a/0x20
    ? inet_sock_destruct+0x190/0x1a0
    __sk_destruct+0x25/0x220
    sk_psock_destroy+0x2b2/0x310
    process_scheduled_works+0xa3/0x3e0
    worker_thread+0x117/0x240
    ? __pfx_worker_thread+0x10/0x10
    kthread+0xcf/0x100
    ? __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x40
    ? __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1a/0x30
    </TASK>
    ---[ end trace 0000000000000000 ]---

    In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
    bytes are finalized, and uncharge this value. When (ret < 0), we shall
    invoke sk_msg_free.

    Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
    we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
    warning will be reported in selftest.

    [...]
    468 case __SK_DROP:
    469 default:
    470 sk_msg_free_partial(sk, msg, tosend);
    471 sk_msg_apply_bytes(psock, tosend);
    472 *copied -= (tosend + delta);
    473 return -EACCES;
    [...]

    So instead of sk_msg_free_partial we can do sk_msg_free here.

    Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
    Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
    Signed-off-by: Zijian Zhang <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]

Signed-off-by: CKI Backport Bot <[email protected]>
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
…bpf_sendmsg

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/371

JIRA: https://issues.redhat.com/browse/RHEL-79008
CVE: CVE-2024-56633

```
commit ca70b8b
Author: Zijian Zhang <[email protected]>
Date:   Wed Oct 16 23:48:38 2024 +0000

    tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg

    The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
    tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.

    Potential problems with this strategy are as follows:

    - If the actual sent bytes are smaller than tosend, we need to charge some
      bytes back, as in line 487, which is okay but seems not clean.

    - When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
      miss uncharging (msg->sg.size - apply_bytes) bytes.

    [...]
    415 tosend = msg->sg.size;
    416 if (psock->apply_bytes && psock->apply_bytes < tosend)
    417   tosend = psock->apply_bytes;
    [...]
    443 sk_msg_return(sk, msg, tosend);
    444 release_sock(sk);
    446 origsize = msg->sg.size;
    447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
    448                             msg, tosend, flags);
    449 sent = origsize - msg->sg.size;
    [...]
    454 lock_sock(sk);
    455 if (unlikely(ret < 0)) {
    456   int free = sk_msg_free_nocharge(sk, msg);
    458   if (!cork)
    459     *copied -= free;
    460 }
    [...]
    487 if (eval == __SK_REDIRECT)
    488   sk_mem_charge(sk, tosend - sent);
    [...]

    When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
    the following warning will be reported:

    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
    Modules linked in:
    CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    Workqueue: events sk_psock_destroy
    RIP: 0010:inet_sock_destruct+0x190/0x1a0
    RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
    RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
    RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
    RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
    R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
    R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
    FS:  0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    <TASK>
    ? __warn+0x89/0x130
    ? inet_sock_destruct+0x190/0x1a0
    ? report_bug+0xfc/0x1e0
    ? handle_bug+0x5c/0xa0
    ? exc_invalid_op+0x17/0x70
    ? asm_exc_invalid_op+0x1a/0x20
    ? inet_sock_destruct+0x190/0x1a0
    __sk_destruct+0x25/0x220
    sk_psock_destroy+0x2b2/0x310
    process_scheduled_works+0xa3/0x3e0
    worker_thread+0x117/0x240
    ? __pfx_worker_thread+0x10/0x10
    kthread+0xcf/0x100
    ? __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x40
    ? __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1a/0x30
    </TASK>
    ---[ end trace 0000000000000000 ]---

    In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
    bytes are finalized, and uncharge this value. When (ret < 0), we shall
    invoke sk_msg_free.

    Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
    we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
    warning will be reported in selftest.

    [...]
    468 case __SK_DROP:
    469 default:
    470 sk_msg_free_partial(sk, msg, tosend);
    471 sk_msg_apply_bytes(psock, tosend);
    472 *copied -= (tosend + delta);
    473 return -EACCES;
    [...]

    So instead of sk_msg_free_partial we can do sk_msg_free here.

    Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
    Fixes: 8ec95b9 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
    Signed-off-by: Zijian Zhang <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Acked-by: John Fastabend <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]```

Signed-off-by: CKI Backport Bot <[email protected]>

---

<small>Created 2025-02-12 08:39 UTC by backporter - [KWF FAQ](https://red.ht/kernel_workflow_doc) - [Slack #team-kernel-workflow](https://redhat-internal.slack.com/archives/C04LRUPMJQ5) - [Source](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/webhook/utils/backporter.py) - [Documentation](https://gitlab.com/cki-project/kernel-workflow/-/blob/main/docs/README.backporter.md) - [Report an issue](https://gitlab.com/cki-project/kernel-workflow/-/issues/new?issue%5Btitle%5D=backporter%20webhook%20issue)</small>

Approved-by: Florian Westphal <[email protected]>
Approved-by: Guillaume Nault <[email protected]>
Approved-by: CKI KWF Bot <[email protected]>

Merged-by: CKI GitLab Kmaint Pipeline Bot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants