Skip to content

Conversation

pvts-mat
Copy link
Contributor

[LTS 8.6]
CVE-2025-21927
VULN-56023

Problem

https://www.cve.org/CVERecord?id=CVE-2025-21927

In the Linux kernel, the following vulnerability has been resolved: nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() nvme_tcp_recv_pdu() doesn't check the validity of the header length. When header digests are enabled, a target might send a packet with an invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst() to access memory outside the allocated area and cause memory corruptions by overwriting it with the calculated digest. Fix this by rejecting packets with an unexpected header length.

Analysis and Solution (same as for ciqlts8_8)

Context

NVME (Non-Volatile Memory Express) is a communication protocol designed for accessing high-speed storage media, particularly solid-state drives (SSDs). NVMe over Fabrics (NVMe-oF) is an extension of the NVMe protocol that allows NVMe commands to be sent over a network fabric, enabling remote access to NVMe storage devices.

The "target" mentioned in CVE description is the host providing access to the local NVME device (the server). The host importing the remote NVME device is called simply a "host", or "initiator" (the client). The module implementing NVMe-oF on target's side is nvmet-tcp, on the initiator's side it's nvme-tcp - the subject of this patch.

Applicability

All the key options related to NVMe-oF, specifically CONFIG_NVME_TCP enabling the nvme-tcp module, are enabled in ciqlts8_6. Per .config file created from configs/kernel-x86_64.config:

#
# NVME Support
#
CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=m
CONFIG_NVME_MULTIPATH=y
# CONFIG_NVME_HWMON is not set
CONFIG_NVME_FABRICS=m
CONFIG_NVME_RDMA=m
CONFIG_NVME_FC=m
CONFIG_NVME_TCP=m
CONFIG_NVME_TARGET=m
# CONFIG_NVME_TARGET_PASSTHRU is not set
CONFIG_NVME_TARGET_LOOP=m
CONFIG_NVME_TARGET_RDMA=m
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
CONFIG_NVME_TARGET_TCP=m

Solution

The solution in the mainline kernel is provided in the ad95bab commit. It was not backported to any stable kernel older than 6.12.

Naive cherry-picking results in conflicts with git's attempt to introduce additional functions (nvme_tcp_tls_configured, nvme_tcp_queue_tls) and code branches (nvme_tcp_c2h_term packet type check in the nvme_tcp_recv_pdu function) introduced in more recent versions of the module but not related to the bug fix.

A small change was made to the nvme_tcp_recv_pdu_supported function introduced in the official fix ad95bab for the sake of nvme_tcp_recv_pdu's behavior consistency between the scenarios of receiving a packet with a proper and an improper header - the removal of the nvme_tcp_c2h_term case.

Consider the behavior cases in case a packet with a proper header was received:

  Packet type: X ∈ {c2h_term} X ∈ {c2h_data, rsp, r2t} X ∉ {c2h_term, c2h_data, rsp, r2t}
a Mainline, after patch (ad95bab) nvme_tcp_handle_X nvme_tcp_handle_X "unsupported pdu type …", -EINVAL
b ciqlts8_6, after patch, c2h_term included "unsupported pdu type …", -EINVAL nvme_tcp_handle_X "unsupported pdu type …", -EINVAL
c ciqlts8_6, after patch, c2h_term excluded "unsupported pdu type …", -EINVAL nvme_tcp_handle_X "unsupported pdu type …", -EINVAL

Then in case a packet with an improper header was received:

  Packet type: X ∈ {c2h_term} X ∈ {c2h_data, rsp, r2t} X ∉ {c2h_term, c2h_data, rsp, r2t}
x Mainline, after patch (ad95bab) "pdu type %d has unexpected header length", -EPROTO "pdu type %d has unexpected header length", -EPROTO "unsupported pdu type …", -EINVAL
y ciqlts8_6, after patch, c2h_term included "pdu type %d has unexpected header length", -EPROTO "pdu type %d has unexpected header length", -EPROTO "unsupported pdu type …", -EINVAL
z ciqlts8_6, after patch, c2h_term excluded "unsupported pdu type …", -EINVAL "pdu type %d has unexpected header length", -EPROTO "unsupported pdu type …", -EINVAL

Solution a is to x not as b is to y, but as c is to z, thus the c, z pair was chosen.

kABI check: passed

DEBUG=1 RELAXED_DEPS=1 CVE=CVE-2025-21927 ./ninja.sh _kabi_checked__x86_64--test--ciqlts8_6-CVE-2025-21927

[0/1] Check ABI of kernel [ciqlts8_6-CVE-2025-21927]
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.6/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.6/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_6/build_files/kernel-src-tree-ciqlts8_6-CVE-2025-21927/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_6-CVE-2025-21927/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests

Methodology

The selftests were source-compiled from the recent ciqlts8_6 branch (commit 00366e2).

The tests were run using an explicit list which omitted certain tests known to give inconsistent results. Details in the src/run-kselftests.sh script of the rocky-patching project.

Coverage

android, bpf (except test_kmod.sh, test_progs, test_progs-no_alu32, test_sockmap, test_xsk.sh), breakpoints, capabilities, core, cpu-hotplug, cpufreq, efivarfs, exec, firmware, fpu, ftrace, futex, gpio, intel_pstate, ipc, kcmp, kexec, kvm, lib, livepatch, membarrier, memfd, memory-hotplug, mount, net (except gro.sh, ip_defrag.sh, reuseaddr_conflict, txtimestamp.sh, udpgso_bench.sh), net/forwarding (except ipip_hier_gre_keys.sh, sch_ets.sh, sch_tbf_ets.sh, sch_tbf_prio.sh, sch_tbf_root.sh, tc_actions.sh), net/mptcp, netfilter (except nft_trans_stress.sh), nsfs, proc, pstore, ptrace, rseq, sgx, sigaltstack, size, splice, static_keys, sync, sysctl, tc-testing, timens, timers (except raw_skew), tpm2, user, vm, x86, zram.

Reference

The results are provided in two parts, because the first batch run was ending prematurely and missing around 1/3 of tests.

kselftests–ciqlts8_6–part1–run1.log
kselftests–ciqlts8_6–part1–run2.log
kselftests–ciqlts8_6–part1–run3.log
kselftests–ciqlts8_6–part2–run1.log
kselftests–ciqlts8_6–part2–run2.log
kselftests–ciqlts8_6–part2–run3.log

Patch

The results are provided in two parts, because the first batch run was ending prematurely and missing around 1/3 of tests.

kselftests–ciqlts8_6-CVE-2025-21927–part1–run1.log
kselftests–ciqlts8_6-CVE-2025-21927–part2–run1.log

Comparison

The results are the same in the reference and patched kernel.

./ktests.xsh diff -d  kselftests*.log

Column    File
--------  -----------------------------------------------------
Status0   kselftests--ciqlts8_6--part1--run1.log
Status1   kselftests--ciqlts8_6--part1--run2.log
Status2   kselftests--ciqlts8_6--part1--run3.log
Status3   kselftests--ciqlts8_6--part2--run1.log
Status4   kselftests--ciqlts8_6--part2--run2.log
Status5   kselftests--ciqlts8_6--part2--run3.log
Status6   kselftests--ciqlts8_6-CVE-2025-21927--part1--run1.log
Status7   kselftests--ciqlts8_6-CVE-2025-21927--part2--run1.log

Specific tests: suspended

See the situation for #234.

jira VULN-56023
cve CVE-2025-21927
commit-author Maurizio Lombardi <[email protected]>
commit ad95bab
upstream-diff Removed `nvme_tcp_c2h_term' case from
              `nvme_tcp_recv_pdu_supported' for the sake of consistency of
              `nvme_tcp_recv_pdu''s behavior relative to the upstream
              version, between the cases of proper and improper
              header. (What could be considered as "`c2h_term' type support"
              started with 84e0090 commit,
              not included in `ciqlts8_6''s history, so
              `nvme_tcp_recv_pdu_supported' in `ciqlts8_6' shouldn't report
              the `nvme_tcp_c2h_term' type as supported.)

nvme_tcp_recv_pdu() doesn't check the validity of the header length.
When header digests are enabled, a target might send a packet with an
invalid header length (e.g. 255), causing nvme_tcp_verify_hdgst()
to access memory outside the allocated area and cause memory corruptions
by overwriting it with the calculated digest.

Fix this by rejecting packets with an unexpected header length.

Fixes: 3f2304f ("nvme-tcp: add NVMe over TCP host driver")
	Signed-off-by: Maurizio Lombardi <[email protected]>
	Reviewed-by: Sagi Grimberg <[email protected]>
	Signed-off-by: Keith Busch <[email protected]>
(cherry picked from commit ad95bab)
	Signed-off-by: Marcin Wcisło <[email protected]>
@PlaidCat PlaidCat changed the title nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() [LTS 8.6] nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() Apr 30, 2025
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@PlaidCat PlaidCat merged commit a62768f into ctrliq:ciqlts8_6 May 1, 2025
2 checks passed
github-actions bot pushed a commit that referenced this pull request May 7, 2025
[BUG]
When trying read-only scrub on a btrfs with rescue=idatacsums mount
option, it will crash with the following call trace:

  BUG: kernel NULL pointer dereference, address: 0000000000000208
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G           O        6.15.0-rc3-custom+ #236 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs]
  Call Trace:
   <TASK>
   scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs]
   scrub_simple_mirror+0x175/0x290 [btrfs]
   scrub_stripe+0x5f7/0x6f0 [btrfs]
   scrub_chunk+0x9a/0x150 [btrfs]
   scrub_enumerate_chunks+0x333/0x660 [btrfs]
   btrfs_scrub_dev+0x23e/0x600 [btrfs]
   btrfs_ioctl+0x1dcf/0x2f80 [btrfs]
   __x64_sys_ioctl+0x97/0xc0
   do_syscall_64+0x4f/0x120
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

[CAUSE]
Mount option "rescue=idatacsums" will completely skip loading the csum
tree, so that any data read will not find any data csum thus we will
ignore data checksum verification.

Normally call sites utilizing csum tree will check the fs state flag
NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all.

This results in scrub to call btrfs_search_slot() on a NULL pointer
and triggered above crash.

[FIX]
Check both extent and csum tree root before doing any tree search.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 30, 2025
[ Upstream commit f95d186 ]

[BUG]
When trying read-only scrub on a btrfs with rescue=idatacsums mount
option, it will crash with the following call trace:

  BUG: kernel NULL pointer dereference, address: 0000000000000208
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G           O        6.15.0-rc3-custom+ #236 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
  RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs]
  Call Trace:
   <TASK>
   scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs]
   scrub_simple_mirror+0x175/0x290 [btrfs]
   scrub_stripe+0x5f7/0x6f0 [btrfs]
   scrub_chunk+0x9a/0x150 [btrfs]
   scrub_enumerate_chunks+0x333/0x660 [btrfs]
   btrfs_scrub_dev+0x23e/0x600 [btrfs]
   btrfs_ioctl+0x1dcf/0x2f80 [btrfs]
   __x64_sys_ioctl+0x97/0xc0
   do_syscall_64+0x4f/0x120
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

[CAUSE]
Mount option "rescue=idatacsums" will completely skip loading the csum
tree, so that any data read will not find any data csum thus we will
ignore data checksum verification.

Normally call sites utilizing csum tree will check the fs state flag
NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all.

This results in scrub to call btrfs_search_slot() on a NULL pointer
and triggered above crash.

[FIX]
Check both extent and csum tree root before doing any tree search.

Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-427.18.1.el9_4
commit-author Daniel Borkmann <[email protected]>
commit ccd9a8b

Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").

  # ./test_progs -t tc
  [...]
  ctrliq#234     tc_links_after:OK
  ctrliq#235     tc_links_append:OK
  ctrliq#236     tc_links_basic:OK
  ctrliq#237     tc_links_before:OK
  ctrliq#238     tc_links_chain_classic:OK
  ctrliq#239     tc_links_chain_mixed:OK
  ctrliq#240     tc_links_dev_cleanup:OK
  ctrliq#241     tc_links_dev_mixed:OK
  ctrliq#242     tc_links_ingress:OK
  ctrliq#243     tc_links_invalid:OK
  ctrliq#244     tc_links_prepend:OK
  ctrliq#245     tc_links_replace:OK
  ctrliq#246     tc_links_revision:OK
  ctrliq#247     tc_opts_after:OK
  ctrliq#248     tc_opts_append:OK
  ctrliq#249     tc_opts_basic:OK
  ctrliq#250     tc_opts_before:OK
  ctrliq#251     tc_opts_chain_classic:OK
  ctrliq#252     tc_opts_chain_mixed:OK
  ctrliq#253     tc_opts_delete_empty:OK
  ctrliq#254     tc_opts_demixed:OK
  ctrliq#255     tc_opts_detach:OK
  ctrliq#256     tc_opts_detach_after:OK
  ctrliq#257     tc_opts_detach_before:OK
  ctrliq#258     tc_opts_dev_cleanup:OK
  ctrliq#259     tc_opts_invalid:OK
  ctrliq#260     tc_opts_mixed:OK
  ctrliq#261     tc_opts_prepend:OK
  ctrliq#262     tc_opts_replace:OK
  ctrliq#263     tc_opts_revision:OK
  [...]
  Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED

	Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
	Signed-off-by: Martin KaFai Lau <[email protected]>
(cherry picked from commit ccd9a8b)
	Signed-off-by: Jonathan Maple <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants