@PlaidCat PlaidCat commented Apr 8, 2025

General Process:

Contains the following: http://download.rockylinux.org/pub/rocky/8.10/BaseOS/source/tree/Packages/k/

Checking Rebuild Commits for potentially missing commits:

The only one that stood out was this one:
net: skb: exclude the single page frag cache for too small alloc
A search does not turn up anything upstream, and the 87.5% fuzzy string matching should have found it if it existed there. It will be included in the splat regardless; the important point is that there do not appear to be any other non-upstream commits from Red Hat.
https://github.com/search?q=repo%3Atorvalds%2Flinux+%22net%3A+skb%3A+exclude+the+single+page+frag+cache+for+too+small+alloc%22&type=commits
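
As a rough illustration of what an 87.5% fuzzy subject match means, the sketch below compares a changelog subject against a handful of upstream subjects. It is only a sketch: the rebuild tooling's actual matching algorithm is not shown in this PR, so `difflib.SequenceMatcher`, the case folding, and the helper name `best_match` are assumptions made for illustration.

```python
from difflib import SequenceMatcher

FUZZ_LIMIT = 0.875  # the "Fuzz Limit: 87.50%" quoted in the rebuild logs


def best_match(subject, upstream_subjects, limit=FUZZ_LIMIT):
    """Return (upstream_subject, ratio) for the closest subject, or None if below limit.

    Hypothetical helper: the real tool may normalise or score subjects differently.
    """
    best = None
    for candidate in upstream_subjects:
        ratio = SequenceMatcher(None, subject.lower(), candidate.lower()).ratio()
        if best is None or ratio > best[1]:
            best = (candidate, ratio)
    if best is not None and best[1] >= limit:
        return best
    return None


# A few upstream subjects taken from the rebuild details in this PR.
upstream = [
    "pps: Fix a use-after-free",
    "bpf: Use raw_spinlock_t in ringbuf",
    "scsi: st: Don't set pos_unknown just after device recognition",
]

# Differs from upstream only in letter case, so it matches.
print(best_match("bpf: use raw_spinlock_t in ringbuf", upstream))
# Has no close counterpart in the list, so nothing is returned.
print(best_match("net: skb: exclude the single page frag cache for too small alloc", upstream))
```

A subject that differs from upstream by only a few characters still clears the threshold, which is why a changelog entry with no hit at all is a reasonable signal that the change is not upstream.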

$ ls ciq/ciq_backports/kernel-4.18.0-553.4*/rebuild.details.txt | while read line; do echo $line; cat $line; echo ""; echo ""; done
ciq/ciq_backports/kernel-4.18.0-553.40.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 524209
Number of commits in rpm: 17
Number of commits matched with upstream: 11 (64.71%)
Number of commits in upstream but not in rpm: 524198
Number of commits NOT found in upstream: 6 (35.29%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.40.1.el8_10 for kernel-4.18.0-553.40.1.el8_10
Clean Cherry Picks: 5 (45.45%)
Empty Cherry Picks: 6 (54.55%)
_______________________________

__EMPTY COMMITS__________________________
0467cdde8c4320bbfdb31a8cff1277b202f677fc s390/pci: Sort PCI functions prior to creating virtual busses
126034faaac5f356822c4a9bebfa75664da11056 s390/pci: Use topology ID for multi-function devices
25f39d3dcb48bbc824a77d16b3d977f0f3713cfe s390/pci: Ignore RID for isolated VFs
48796104c864cf4dafa80bd8c2ce88f9c92a65ea s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails
5fd11b96b43708f2f6e3964412c301c1bd20ec0f s390/pci: Refactor arch_setup_msi_irqs()
ab42fcb511fd9d241bbab7cc3ca04e34e9fc0666 s390/pci: Allow allocation of more than 1 MSI interrupt

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 14
Number of commits matched with upstream: 7 (50.00%)
Number of commits in upstream but not in rpm: 538200
Number of commits NOT found in upstream: 7 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.42.1.el8_10 for kernel-4.18.0-553.42.1.el8_10
Clean Cherry Picks: 6 (85.71%)
Empty Cherry Picks: 1 (14.29%)
_______________________________

__EMPTY COMMITS__________________________
98b37881b7492ae9048ad48260cc8a6ee9eb39fd scsi: st: Don't set pos_unknown just after device recognition

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
net: skb: exclude the single page frag cache for too small alloc


ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 24
Number of commits matched with upstream: 18 (75.00%)
Number of commits in upstream but not in rpm: 538189
Number of commits NOT found in upstream: 6 (25.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.44.1.el8_10 for kernel-4.18.0-553.44.1.el8_10
Clean Cherry Picks: 14 (77.78%)
Empty Cherry Picks: 4 (22.22%)
_______________________________

__EMPTY COMMITS__________________________
72ed5d5624af384eaf74d84915810d54486a75e2 net/mlx5: Suspend auxiliary devices only in case of PCI device suspend
aab8e1a200b926147db51e3f82fd07bb9edf6a98 net/mlx5: Reload auxiliary devices in pci error handlers
c79a39dc8d060b9e64e8b0fa9d245d44befeefbe pps: Fix a use-after-free
415d832497098030241605c52ea83d4e2cfa7879 locking/atomic: Make test_and_*_bit() ordered on failure

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 19
Number of commits matched with upstream: 13 (68.42%)
Number of commits in upstream but not in rpm: 538194
Number of commits NOT found in upstream: 6 (31.58%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.45.1.el8_10 for kernel-4.18.0-553.45.1.el8_10
Clean Cherry Picks: 11 (84.62%)
Empty Cherry Picks: 2 (15.38%)
_______________________________

__EMPTY COMMITS__________________________
6cf9ff463317217d95732a6cce6fbdd12508921a net: smc: fix spurious error message from __sock_release()
ba0925c34e0fa6fe02d3d642bc02ab099ab312c7 gve: process XSK TX descriptors as part of RX NAPI

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.46.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 12
Number of commits matched with upstream: 6 (50.00%)
Number of commits in upstream but not in rpm: 538201
Number of commits NOT found in upstream: 6 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.46.1.el8_10 for kernel-4.18.0-553.46.1.el8_10
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 9
Number of commits matched with upstream: 3 (33.33%)
Number of commits in upstream but not in rpm: 538204
Number of commits NOT found in upstream: 6 (66.67%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.47.1.el8_10 for kernel-4.18.0-553.47.1.el8_10
Clean Cherry Picks: 1 (33.33%)
Empty Cherry Picks: 2 (66.67%)
_______________________________

__EMPTY COMMITS__________________________
8b62645b09f870d70c7910e7550289d444239a46 bpf: Use raw_spinlock_t in ringbuf
f32a213765739f2a1db319346799f130a3d08820 ethtool: runtime-resume netdev parent before ethtool ioctl ops

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
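
The counters in each rebuild.details.txt above appear to satisfy two simple identities: matched plus not-found equals the number of commits in the rpm, and clean plus empty cherry picks equals the matched count. The following sketch parses one header and checks both. The identities are inferred from the logs in this PR rather than from the tool's documentation, and the function names are hypothetical.

```python
import re

# Header lines copied from the kernel-4.18.0-553.40.1.el8_10 details above.
SAMPLE = """\
Number of commits in rpm: 17
Number of commits matched with upstream: 11 (64.71%)
Number of commits NOT found in upstream: 6 (35.29%)
Clean Cherry Picks: 5 (45.45%)
Empty Cherry Picks: 6 (54.55%)
"""

FIELDS = {
    "rpm": r"Number of commits in rpm: (\d+)",
    "matched": r"Number of commits matched with upstream: (\d+)",
    "not_found": r"Number of commits NOT found in upstream: (\d+)",
    "clean": r"Clean Cherry Picks: (\d+)",
    "empty": r"Empty Cherry Picks: (\d+)",
}


def parse_summary(text):
    """Extract the integer counters from a rebuild.details.txt header."""
    counts = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing field: {key}")
        counts[key] = int(match.group(1))
    return counts


def check_summary(counts):
    """Return a list of inconsistencies; an empty list means the counters add up."""
    problems = []
    if counts["matched"] + counts["not_found"] != counts["rpm"]:
        problems.append("matched + not found != commits in rpm")
    if counts["clean"] + counts["empty"] != counts["matched"]:
        problems.append("clean + empty != matched")
    return problems


counts = parse_summary(SAMPLE)
print(counts, check_summary(counts))
```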

BUILD

/mnt/code/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-4.18.0-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-01aef32f4a9b"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h

[SNIP]

  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 2123s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko

[SNIP]

  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-01aef32f4a9b+
[TIMER]{MODULES}: 16s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-01aef32f4a9b+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 22s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-01aef32f4a9b+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 2123s
[TIMER]{MODULES}: 16s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 2167s
Rebooting in 10 seconds

Boot

[maple@r8-sigcloud-builder code]$ uname -r
4.18.0-rocky8_10_rebuild-01aef32f4a9b+

Kselftest crash check

Just checking for crashes, since the last commit is a splat of the exploded Rocky Kernel.

[maple@r8-sigcloud-builder code]$ grep '^ok ' 4.18.0-rocky8_10_rebuild-01aef32f4a9b+.kself.log | wc -l
206
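
The grep above only counts passing lines. A small sketch that also counts failures is shown here, assuming the log uses result lines that begin with "ok " or "not ok " (which the grep pattern implies). The sample lines and test names are hypothetical, not taken from the real log.

```python
def tap_counts(lines):
    """Count result lines starting with 'ok ' and 'not ok ' respectively."""
    passed = failed = 0
    for line in lines:
        if line.startswith("ok "):
            passed += 1
        elif line.startswith("not ok "):
            failed += 1
    return passed, failed


# Hypothetical log excerpt; the real .kself.log contents are not shown in this PR.
sample = [
    "TAP version 13",
    "ok 1 selftests: example: first_test",
    "not ok 2 selftests: example: second_test",
    "ok 3 selftests: example: third_test # SKIP",
]
print(tap_counts(sample))
```

As with the grep, a skipped test reported as "ok ... # SKIP" is counted as passing, so the number is a crash check rather than a strict pass rate.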

PlaidCat added 30 commits April 8, 2025 17:00
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Mikulas Patocka <[email protected]>
commit 6e7132e

A lockup was reported when exiting a snapshot with many exceptions.
Fix this by adding "cond_resched" to the loop that frees the exceptions.

	Reported-by: John Pittman <[email protected]>
	Cc: [email protected]
	Signed-off-by: Mikulas Patocka <[email protected]>
	Signed-off-by: Mike Snitzer <[email protected]>
(cherry picked from commit 6e7132e)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Jason Wang <[email protected]>
commit d71ebe8

Commit a7766ef("virtio_net: disable cb aggressively") enables
virtqueue callback via the following statement:

	do {
		if (use_napi)
			virtqueue_disable_cb(sq->vq);

		free_old_xmit_skbs(sq, false);

	} while (use_napi && kick &&
		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

When NAPI is used and kick is false, the callback won't be enabled
here. And when the virtqueue is about to be full, the tx will be
disabled, but we still don't enable tx interrupt which will cause a TX
hang. This could be observed when using pktgen with burst enabled.

To be consistent with the logic that tries to disable cb only for
NAPI, fix this by trying to enable the delayed callback only when NAPI
is enabled and the queue is about to be full.

Fixes: a7766ef ("virtio_net: disable cb aggressively")
	Signed-off-by: Jason Wang <[email protected]>
	Tested-by: Laurent Vivier <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit d71ebe8)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Anumula Murali Mohan Reddy <[email protected]>
commit c659b40

ip_dev_find() always returns the real net_device address, whether traffic
is running on a vlan or a real device. If traffic is over a vlan, filling
the endpoint structure with the real ndev and attempting to send a connect
request results in an RDMA_CM_EVENT_UNREACHABLE error.  This patch fixes
the issue by using vlan_dev_real_dev().

Fixes: 830662f ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://patch.msgid.link/r/[email protected]
	Signed-off-by: Anumula Murali Mohan Reddy <[email protected]>
	Signed-off-by: Potnuri Bharat Teja <[email protected]>
	Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit c659b40)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Chen Zhongjin <[email protected]>
commit 672e426

ovl_dentry_revalidate_common() can be called in rcu-walk mode.  As the
documentation says, "in rcu-walk mode, d_parent and d_inode should not be
used without care".

Check inode here to protect access under rcu-walk mode.

Fixes: bccece1 ("ovl: allow remote upper")
Reported-and-tested-by: [email protected]
	Signed-off-by: Chen Zhongjin <[email protected]>
	Cc: <[email protected]> # v5.7
	Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 672e426)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Kai Mäkisara <[email protected]>
commit 98b3788
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/98b37881.failed

Commit 9604eea ("scsi: st: Add third party poweron reset handling") in
v6.6 added new code to handle the Power On/Reset Unit Attention (POR UA)
sense data. This was in addition to the existing method. When this Unit
Attention is received, the driver blocks attempts to read, write and some
other operations because the reset may have rewound the tape. Because of
the added code, the initial POR UA also resulted in blocking operations,
including those that are used to set the driver options after the device is
recognized. Reading and writing were also refused, whereas they succeeded
before this commit.

Add code to not set pos_unknown to block operations if the POR UA is
received from the first test_ready() call after the st device has been
created. This restores the behavior before v6.6.

	Signed-off-by: Kai Mäkisara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 9604eea ("scsi: st: Add third party poweron reset handling")
CC: [email protected]
Closes: https://lore.kernel.org/linux-scsi/[email protected]/
	Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit 98b3788)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/scsi/st.c
…le_direct_reclaim()

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Seiji Nishikawa <[email protected]>
commit 6aaced5

The task sometimes continues looping in throttle_direct_reclaim() because
allow_direct_reclaim(pgdat) keeps returning false.

 #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
 #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
 #2 [ffff80002cb6f990] schedule at ffff800008abc50c
 #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4

At this point, the pgdat contains the following two zones:

        NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
          SIZE: 20480  MIN/LOW/HIGH: 11/28/45
          VM_STAT:
                NR_FREE_PAGES: 359
        NR_ZONE_INACTIVE_ANON: 18813
          NR_ZONE_ACTIVE_ANON: 0
        NR_ZONE_INACTIVE_FILE: 50
          NR_ZONE_ACTIVE_FILE: 0
          NR_ZONE_UNEVICTABLE: 0
        NR_ZONE_WRITE_PENDING: 0
                     NR_MLOCK: 0
                    NR_BOUNCE: 0
                   NR_ZSPAGES: 0
            NR_FREE_CMA_PAGES: 0

        NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
          SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
          VM_STAT:
                NR_FREE_PAGES: 146
        NR_ZONE_INACTIVE_ANON: 94668
          NR_ZONE_ACTIVE_ANON: 3
        NR_ZONE_INACTIVE_FILE: 735
          NR_ZONE_ACTIVE_FILE: 78
          NR_ZONE_UNEVICTABLE: 0
        NR_ZONE_WRITE_PENDING: 0
                     NR_MLOCK: 0
                    NR_BOUNCE: 0
                   NR_ZSPAGES: 0
            NR_FREE_CMA_PAGES: 0

In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
inactive/active file-backed pages calculated in zone_reclaimable_pages()
based on the result of zone_page_state_snapshot() is zero.

Additionally, since this system lacks swap, the calculation of inactive/
active anonymous pages is skipped.

        crash> p nr_swap_pages
        nr_swap_pages = $1937 = {
          counter = 0
        }

As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
free pages significantly exceeding the high watermark.

The problem is that the pgdat->kswapd_failures hasn't been incremented.

        crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
        $1935 = 0x0

This is because the node is deemed balanced.  The node balancing logic in
balance_pgdat() evaluates all zones collectively.  If one or more zones
(e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
entire node is deemed balanced.  This causes balance_pgdat() to exit early
before incrementing the kswapd_failures, as it considers the overall
memory state acceptable, even though some zones (like ZONE_NORMAL) remain
under significant pressure.

The patch ensures that zone_reclaimable_pages() includes free pages
(NR_FREE_PAGES) in its calculation when no other reclaimable pages are
available (e.g., file-backed or anonymous pages).  This change prevents
zones like ZONE_DMA32, which have sufficient free pages, from being
mistakenly deemed unreclaimable.  By doing so, the patch ensures proper
node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
and prevents infinite loops in throttle_direct_reclaim() caused by
allow_direct_reclaim(pgdat) repeatedly returning false.

The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
by a node being incorrectly deemed balanced despite pressure in certain
zones, such as ZONE_NORMAL.  This issue arises from
zone_reclaimable_pages() returning 0 for zones without reclaimable file-
backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
free pages to be skipped.

The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
during reclaim, masking pressure in other zones.  Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
mechanisms in allow_direct_reclaim() from being triggered, leading to an
infinite loop in throttle_direct_reclaim().

This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist.  This ensures zones
with sufficient free pages are not skipped, enabling proper balancing and
reclaim behavior.
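
The behaviour described above can be sketched as a toy model. This is not the kernel code: the function signature, the boolean `swap_available` flag, and the way counters are passed in are simplifications assumed for illustration. The numbers in the usage lines come from the ZONE_DMA32 dump above, with the file-backed total taken as zero as the description states.

```python
def zone_reclaimable_pages(inactive_file, active_file, inactive_anon,
                           active_anon, free_pages, swap_available):
    """Toy model of the reclaimable-page count described in this commit message."""
    nr = inactive_file + active_file
    if swap_available:
        # Anonymous pages only count as reclaimable when swap exists.
        nr += inactive_anon + active_anon
    if nr == 0:
        # Behaviour added by the patch: fall back to the free-page count so a
        # zone that still has free memory is not treated as unreclaimable.
        nr = free_pages
    return nr


# ZONE_DMA32 from the dump above: no swap, file-backed snapshot sums to zero,
# 18813 inactive anonymous pages, 359 free pages.
print(zone_reclaimable_pages(0, 0, 18813, 0, 359, swap_available=False))
# With swap available the anonymous pages would be counted instead.
print(zone_reclaimable_pages(0, 0, 18813, 0, 359, swap_available=True))
```

Without the fallback branch the first call would return 0, which is the condition that caused ZONE_DMA32 to be skipped.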

[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5a1c84b ("mm: remove reclaim and compaction retry approximations")
	Signed-off-by: Seiji Nishikawa <[email protected]>
	Cc: Mel Gorman <[email protected]>
	Cc: <[email protected]>
	Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 6aaced5)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Chuck Lever <[email protected]>
commit 961b4b5

I noticed that once an NFSv4.1 callback operation gets a
NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost,
the callback client loops, resending it indefinitely.

The switch arm in nfsd4_cb_sequence_done() that handles
NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine
for the retransmit, but that path does not call the rpc_prepare_call
callback again. Thus cb_seq_status is set to -10008 by the first
NFS4ERR_DELAY result, but is never set back to 1 for the retransmits.

nfsd4_cb_sequence_done() thinks it's getting nothing but a
long series of CB_SEQUENCE NFS4ERR_DELAY replies.

Fixes: 7ba6cad ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
	Reviewed-by: Jeff Layton <[email protected]>
	Reviewed-by: Benjamin Coddington <[email protected]>
	Signed-off-by: Chuck Lever <[email protected]>
(cherry picked from commit 961b4b5)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 14
Number of commits matched with upstream: 7 (50.00%)
Number of commits in upstream but not in rpm: 538200
Number of commits NOT found in upstream: 7 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.42.1.el8_10 for kernel-4.18.0-553.42.1.el8_10
Clean Cherry Picks: 6 (85.71%)
Empty Cherry Picks: 1 (14.29%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
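
Given the naming rule above, the location of a failed cherry-pick record can be derived from the kernel version and the upstream SHA. A minimal sketch follows; the helper name is hypothetical, and the directory layout is taken from the paths quoted in the empty-commit messages in this PR.

```python
def failed_cherry_pick_path(kernel, upstream_sha):
    """Build the path of the .failed file for an empty commit.

    The file name is the first 8 characters of the upstream SHA.
    """
    return f"ciq/ciq_backports/{kernel}/{upstream_sha[:8]}.failed"


print(failed_cherry_pick_path(
    "kernel-4.18.0-553.42.1.el8_10",
    "98b37881b7492ae9048ad48260cc8a6ee9eb39fd",
))
```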
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Jiri Pirko <[email protected]>
commit 72ed5d5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/72ed5d56.failed

The original behavior introduced by commit c6acd62 ("net/mlx5e: Add
support for devlink-port in non-representors mode") correctly
re-instantiated uplink devlink port and related netdevice during devlink
reload. However with migration to auxiliary devices, this behaviour
changed.

Restore the original behaviour and tear down auxiliary devices
completely during devlink reload.

	Signed-off-by: Jiri Pirko <[email protected]>
	Signed-off-by: Saeed Mahameed <[email protected]>
(cherry picked from commit 72ed5d5)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
#	drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Moshe Shemesh <[email protected]>
commit aab8e1a
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/aab8e1a2.failed

Handling PCI errors should fully tear down and reload auxiliary
devices, as is done in the mlx5 health recovery flow.

Fixes: 72ed5d5 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
	Signed-off-by: Moshe Shemesh <[email protected]>
	Signed-off-by: Saeed Mahameed <[email protected]>
(cherry picked from commit aab8e1a)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/mellanox/mlx5/core/main.c
jira LE-2741
cve CVE-2024-57807
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Tomas Henzl <[email protected]>
commit 50740f4

This fixes a 'possible circular locking dependency detected' warning
      CPU0                    CPU1
      ----                    ----
 lock(&instance->reset_mutex);
                              lock(&shost->scan_mutex);
                              lock(&instance->reset_mutex);
 lock(&shost->scan_mutex);

Fix this by temporarily releasing the reset_mutex.
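
The warning above comes from two code paths taking the same pair of locks in opposite orders. The sketch below is a heavily simplified model of that idea, not the kernel's lockdep implementation: it records "held before acquired" edges per code path and reports whether the combined graph contains a cycle. The function name and the path representation are assumptions for illustration.

```python
from collections import defaultdict


def find_lock_cycle(paths):
    """Return True if the lock-ordering graph built from `paths` has a cycle.

    Each path is a sequence of lock names in the order one code path takes them
    while still holding the earlier ones.
    """
    graph = defaultdict(set)
    for path in paths:
        for i, held in enumerate(path):
            for later in path[i + 1:]:
                graph[held].add(later)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every lock starts WHITE (unvisited)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: ordering cycle found
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


# Orders from the diagram above: CPU0 takes reset_mutex then scan_mutex,
# CPU1 takes scan_mutex then reset_mutex.
before = [("reset_mutex", "scan_mutex"), ("scan_mutex", "reset_mutex")]
# After the fix, reset_mutex is released before scan_mutex is taken on CPU0.
after = [("reset_mutex",), ("scan_mutex",), ("scan_mutex", "reset_mutex")]

print(find_lock_cycle(before), find_lock_cycle(after))
```

Releasing reset_mutex before taking scan_mutex removes the reset_mutex to scan_mutex edge, so only one ordering remains and the cycle disappears.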

	Signed-off-by: Tomas Henzl <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Acked-by: Chandrakanth Patil <[email protected]>
	Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit 50740f4)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nina Schoetterl-Glausch <[email protected]>
commit 22fdd8b

In order for SIE to interpretively execute STFLE, it requires the real
or absolute address of a facility-list control block.
Before writing the location into the shadow SIE control block, convert
it from a virtual address.
We currently do not run into this bug because the lower 31 bits are the
same for virtual and physical addresses.

	Signed-off-by: Nina Schoetterl-Glausch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Janosch Frank <[email protected]>
Message-Id: <[email protected]>
	Signed-off-by: Alexander Gordeev <[email protected]>
(cherry picked from commit 22fdd8b)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nina Schoetterl-Glausch <[email protected]>
commit cc4edb9

The address of the crypto control block in the (shadow) SIE block is
absolute/physical.
Convert from virtual to physical when shadowing the guest's control
block during VSIE.

	Signed-off-by: Nina Schoetterl-Glausch <[email protected]>
	Reviewed-by: Christian Borntraeger <[email protected]>
	Acked-by: Alexander Gordeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Alexander Gordeev <[email protected]>
(cherry picked from commit cc4edb9)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Claudio Imbrenda <[email protected]>
commit cff59d8

The return value of uv_set_shared() and uv_remove_shared() (which are
wrappers around the share() function) is not always checked. The system
integrity of a protected guest depends on the Share and Unshare UVCs
being successful. This means that any caller that fails to check the
return value will compromise the security of the protected guest.

No code path that would lead to such violation of the security
guarantees is currently exercised, since all the areas that are shared
never get unshared during the lifetime of the system. This might
change and become an issue in the future.

The Share and Unshare UVCs can only fail in case of hypervisor
misbehaviour (either a bug or malicious behaviour). In such cases there
is no reasonable way forward, and the system needs to panic.

This patch replaces the return at the end of the share() function with
a panic, to guarantee system integrity.

Fixes: 5abb935 ("s390/uv: introduce guest side ultravisor code")
	Signed-off-by: Claudio Imbrenda <[email protected]>
	Reviewed-by: Christian Borntraeger <[email protected]>
	Reviewed-by: Steffen Eiden <[email protected]>
	Reviewed-by: Janosch Frank <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-ID: <[email protected]>
[[email protected]: Fixed up patch subject]
	Signed-off-by: Janosch Frank <[email protected]>
(cherry picked from commit cff59d8)
	Signed-off-by: Jonathan Maple <[email protected]>
…query

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Hariharan Mari <[email protected]>
commit 09c38ad

The __insn32_query() function incorrectly uses the RRF instruction format
for both the SORTL (RRE format) and DFLTCC (RRF format) instructions.
To fix this issue, add separate query functions for SORTL and DFLTCC that
use the appropriate instruction formats.

Additionally pass the query operand as a pointer to the entire array
of 32 elements to slightly optimize performance and readability.

Fixes: d668139 ("KVM: s390: provide query function for instructions returning 32 byte")
	Suggested-by: Heiko Carstens <[email protected]>
	Reviewed-by: Juergen Christ <[email protected]>
	Signed-off-by: Hariharan Mari <[email protected]>
	Signed-off-by: Janosch Frank <[email protected]>
(cherry picked from commit 09c38ad)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nico Boehr <[email protected]>
commit e8061f0

Previously, access_guest_page() did not check whether the given guest
address is inside of a memslot. This is not a problem, since
kvm_write_guest_page/kvm_read_guest_page return -EFAULT in this case.

However, -EFAULT is also returned when copy_to/from_user fails.

When emulating a guest instruction, the address being outside a memslot
usually means that an addressing exception should be injected into the
guest.

Failure in copy_to/from_user however indicates that something is wrong
in userspace and hence should be handled there.

To be able to distinguish these two cases, return PGM_ADDRESSING in
access_guest_page() when the guest address is outside guest memory. In
access_guest_real(), populate vcpu->arch.pgm.code such that
kvm_s390_inject_prog_cond() can be used in the caller for injecting into
the guest (if applicable).

Since this adds a new return value to access_guest_page(), we need to make
sure that other callers are not confused by the new positive return value.

There are the following users of access_guest_page():
- access_guest_with_key() does the checking itself (in
  guest_range_to_gpas()), so this case should never happen. Even if it
  does, the handling is set up properly.
- access_guest_real() just passes the return code to its callers, which
  are:
    - read_guest_real() - see below
    - write_guest_real() - see below

There are the following users of read_guest_real():
- ar_translation() in gaccess.c which already returns PGM_*
- setup_apcb10(), setup_apcb00(), setup_apcb11() in vsie.c which always
  return -EFAULT when read_guest_real() returns nonzero - no change
- shadow_crycb(), handle_stfle() always present this as validity, this
  could be handled better but doesn't change current behaviour - no change

There are the following users of write_guest_real():
- kvm_s390_store_status_unloaded() always returns -EFAULT on
  write_guest_real() failure.
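
The return-code convention this commit relies on can be modelled roughly as follows: zero for success, a positive program-interruption code for conditions that belong to the guest, and a negative errno for host-side failures. This is an illustrative model only; the memslot representation, the `user_copy_ok` flag, and the `classify` helper are assumptions and do not mirror the kernel's data structures.

```python
import errno

PGM_ADDRESSING = 0x05  # s390 addressing-exception program interruption code


def access_guest_page(gpa, memslots, user_copy_ok=True):
    """Toy model: 0 on success, >0 guest program code, <0 host errno.

    `memslots` is a list of (start, size) ranges standing in for guest memory.
    """
    if not any(start <= gpa < start + size for start, size in memslots):
        # Address is outside guest memory: a guest-visible condition.
        return PGM_ADDRESSING
    if not user_copy_ok:
        # Stand-in for a copy_to/from_user failure: a userspace problem.
        return -errno.EFAULT
    return 0


def classify(rc):
    """Describe how a caller could act on the return code."""
    if rc == 0:
        return "ok"
    if rc > 0:
        return "inject program interruption into guest"
    return "report error to userspace"


slots = [(0x0000, 0x1000)]
print(classify(access_guest_page(0x0800, slots)))
print(classify(access_guest_page(0x5000, slots)))
print(classify(access_guest_page(0x0800, slots, user_copy_ok=False)))
```

Callers that previously tested only for a nonzero result keep working under this convention, which is why the list above walks through each user of the function.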

Fixes: 2293897 ("KVM: s390: add architecture compliant guest access functions")
	Cc: [email protected]
	Signed-off-by: Nico Boehr <[email protected]>
	Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Acked-by: Janosch Frank <[email protected]>
	Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit e8061f0)
	Signed-off-by: Jonathan Maple <[email protected]>
…ndler

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Michael Mueller <[email protected]>
commit cad4b3d

The parameters for the diag 0x258 are real addresses, not virtual, but
KVM was using them as virtual addresses. This only happened to work, since
the Linux kernel as a guest used to have a 1:1 mapping for physical vs
virtual addresses.

Fix KVM so that it correctly uses the addresses as real addresses.

	Cc: [email protected]
Fixes: 8ae04b8 ("KVM: s390: Guest's memory access functions get access registers")
	Suggested-by: Vasily Gorbik <[email protected]>
	Signed-off-by: Michael Mueller <[email protected]>
	Signed-off-by: Nico Boehr <[email protected]>
	Reviewed-by: Christian Borntraeger <[email protected]>
	Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Acked-by: Janosch Frank <[email protected]>
	Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit cad4b3d)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Calvin Owens <[email protected]>
commit c79a39d
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/c79a39dc.failed

On a board running ntpd and gpsd, I'm seeing a consistent use-after-free
in sys_exit() from gpsd when rebooting:

    pps pps1: removed
    ------------[ cut here ]------------
    kobject: '(null)' (00000000db4bec24): is not initialized, yet kobject_put() is being called.
    WARNING: CPU: 2 PID: 440 at lib/kobject.c:734 kobject_put+0x120/0x150
    CPU: 2 UID: 299 PID: 440 Comm: gpsd Not tainted 6.11.0-rc6-00308-gb31c44928842 #1
    Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
    pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : kobject_put+0x120/0x150
    lr : kobject_put+0x120/0x150
    sp : ffffffc0803d3ae0
    x29: ffffffc0803d3ae0 x28: ffffff8042dc9738 x27: 0000000000000001
    x26: 0000000000000000 x25: ffffff8042dc9040 x24: ffffff8042dc9440
    x23: ffffff80402a4620 x22: ffffff8042ef4bd0 x21: ffffff80405cb600
    x20: 000000000008001b x19: ffffff8040b3b6e0 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000 x15: 696e6920746f6e20
    x14: 7369203a29343263 x13: 205d303434542020 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     kobject_put+0x120/0x150
     cdev_put+0x20/0x3c
     __fput+0x2c4/0x2d8
     ____fput+0x1c/0x38
     task_work_run+0x70/0xfc
     do_exit+0x2a0/0x924
     do_group_exit+0x34/0x90
     get_signal+0x7fc/0x8c0
     do_signal+0x128/0x13b4
     do_notify_resume+0xdc/0x160
     el0_svc+0xd4/0xf8
     el0t_64_sync_handler+0x140/0x14c
     el0t_64_sync+0x190/0x194
    ---[ end trace 0000000000000000 ]---

...followed by more symptoms of corruption, with similar stacks:

    refcount_t: underflow; use-after-free.
    kernel BUG at lib/list_debug.c:62!
    Kernel panic - not syncing: Oops - BUG: Fatal exception

This happens because pps_device_destruct() frees the pps_device with the
embedded cdev immediately after calling cdev_del(), but, as the comment
above cdev_del() notes, fops for previously opened cdevs are still
callable even after cdev_del() returns. I think this bug has always
been there: I can't explain why it suddenly started happening every time
I reboot this particular board.

In commit d953e0e ("pps: Fix a use-after free bug when
unregistering a source."), George Spelvin suggested removing the
embedded cdev. That seems like the simplest way to fix this, so I've
implemented his suggestion, using __register_chrdev() with pps_idr
becoming the source of truth for which minor corresponds to which
device.

But now that pps_idr defines userspace visibility instead of cdev_add(),
we need to be sure the pps->dev refcount can't reach zero while
userspace can still find it again. So, the idr_remove() call moves to
pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev.

    pps_core: source serial1 got cdev (251:1)
    <...>
    pps pps1: removed
    pps_core: unregistering pps1
    pps_core: deallocating pps1

Fixes: d953e0e ("pps: Fix a use-after free bug when unregistering a source.")
	Cc: [email protected]
	Signed-off-by: Calvin Owens <[email protected]>
	Reviewed-by: Michal Schmidt <[email protected]>
Link: https://lore.kernel.org/r/a17975fd5ae99385791929e563f72564edbcf28f.1731383727.git.calvin@wbinvd.org
	Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit c79a39d)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/pps/clients/pps-gpio.c
#	drivers/pps/clients/pps-ldisc.c
#	drivers/pps/pps.c
#	drivers/ptp/ptp_ocp.c
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Hector Martin <[email protected]>
commit 415d832
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/415d8324.failed

These operations are documented as always ordered in
include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer
type use cases where one side needs to ensure a flag is left pending
after some shared data was updated rely on this ordering, even in the
failure case.

This is the case with the workqueue code, which currently suffers from a
reproducible ordering violation on Apple M1 platforms (which are
notoriously out-of-order) that ends up causing the TTY layer to fail to
deliver data to userspace properly under the right conditions.  This
change fixes that bug.

Change the documentation to restrict the "no order on failure" story to
the _lock() variant (for which it makes sense), and remove the
early-exit from the generic implementation, which is what causes the
missing barrier semantics in that case.  Without this, the remaining
atomic op is fully ordered (including on ARM64 LSE, as of recent
versions of the architecture spec).
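
The difference between the two variants can be sketched in userspace with C11 atomics. This is only an illustration, not the kernel implementation: the function names are hypothetical, and a sequentially consistent `atomic_fetch_or()` stands in for the kernel's fully ordered atomic op.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical early-exit variant: if the bit is already set, only a
 * relaxed load is performed, so the failure path implies no ordering. */
bool test_and_set_bit_early_exit(unsigned int nr, atomic_ulong *word)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << nr;

    if (atomic_load_explicit(word, memory_order_relaxed) & mask)
        return true; /* no read-modify-write, hence no barrier */
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Hypothetical always-ordered variant: the read-modify-write is issued
 * unconditionally, so success and failure are ordered the same way. */
bool test_and_set_bit_ordered(unsigned int nr, atomic_ulong *word)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

Both functions return the same values; they differ only in the ordering guaranteed when the bit was already set, which is the case a producer-consumer "leave the flag pending" pattern depends on.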

	Suggested-by: Linus Torvalds <[email protected]>
	Cc: [email protected]
Fixes: e986a0d ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
Fixes: 61e0239 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
	Signed-off-by: Hector Martin <[email protected]>
	Acked-by: Will Deacon <[email protected]>
	Reviewed-by: Arnd Bergmann <[email protected]>
	Signed-off-by: Linus Torvalds <[email protected]>
(cherry picked from commit 415d832)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	include/asm-generic/bitops/atomic.h
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Peter Zijlstra <[email protected]>
commit be24226

Because of late module patching, a livepatch module needs to be able to
apply some of its relocations well after it has been loaded.  Instead of
playing games with module_{dis,en}able_ro(), use existing text poking
mechanisms to apply relocations after module loading.

So far only x86, s390 and Power have HAVE_LIVEPATCH but only the first
two also have STRICT_MODULE_RWX.

This will allow removal of the last module_disable_ro() usage in
livepatch.  The ultimate goal is to completely disallow making
executable mappings writable.

[ jpoimboe: Split up patches.  Use mod state to determine whether
	    memcpy() can be used.  Test and add fixes. ]

	Cc: [email protected]
	Cc: Heiko Carstens <[email protected]>
	Cc: Gerald Schaefer <[email protected]>
	Cc: Christian Borntraeger <[email protected]>
	Suggested-by: Josh Poimboeuf <[email protected]>
	Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
	Signed-off-by: Josh Poimboeuf <[email protected]>
	Acked-by: Peter Zijlstra (Intel) <[email protected]>
	Acked-by: Joe Lawrence <[email protected]>
	Acked-by: Miroslav Benes <[email protected]>
	Acked-by: Gerald Schaefer <[email protected]> # s390
	Signed-off-by: Jiri Kosina <[email protected]>
(cherry picked from commit be24226)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Ilya Leoshkevich <[email protected]>
commit f3b7e73

If the size of the PLT entries generated by apply_rela() exceeds
64KiB, the first ones can no longer reach __jump_r1 with brc. Fix by
using brcl. An alternative solution is to add a __jump_r1 copy after
every 64KiB, however, the space savings are quite small and do not
justify the additional complexity.

Fixes: f19fbd5 ("s390: introduce execute-trampolines for branches")
	Cc: [email protected]
	Reported-by: Andrea Righi <[email protected]>
	Signed-off-by: Ilya Leoshkevich <[email protected]>
	Reviewed-by: Heiko Carstens <[email protected]>
	Cc: Vasily Gorbik <[email protected]>
	Cc: Christian Borntraeger <[email protected]>
	Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit f3b7e73)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Manuel Barrio Linares <[email protected]>
commit 44f69dd

This adds support for all sample rates supported by the
hardware. Digidesign Mbox 3 supports: {44100, 48000, 88200, 96000}

Fixes clock syncing issues that presented as pops. To test this: without
this patch, playing a 440 Hz tone produces pops.

Clock is now synced between playback and capture interfaces so no more
latency drift issue when using pipewire pro-profile.
(https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/3900)

	Signed-off-by: Manuel Barrio Linares <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Takashi Iwai <[email protected]>
(cherry picked from commit 44f69dd)
	Signed-off-by: Jonathan Maple <[email protected]>
…box devices

jira LE-2741
cve CVE-2024-53197
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Benoît Sevens <[email protected]>
commit b909df1

A bogus device can provide a bNumConfigurations value that exceeds the
initial value used in usb_get_configuration for allocating dev->config.

This can lead to out-of-bounds accesses later, e.g. in
usb_destroy_configuration.

	Signed-off-by: Benoît Sevens <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
	Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Takashi Iwai <[email protected]>
(cherry picked from commit b909df1)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Manuel Barrio Linares <[email protected]>
commit 5005ccd

Fix wrong use of usb_sndctrlpipe; use usb_rcvctrlpipe instead.

Fixes: 44f69dd ("ALSA: usb-audio: Add sampling rates support for Mbox3")
	Signed-off-by: Manuel Barrio Linares <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Takashi Iwai <[email protected]>
(cherry picked from commit 5005ccd)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Dan Carpenter <[email protected]>
commit f7d306b

The usb_get_descriptor() function does DMA so we're not allowed
to use a stack buffer for that.  Doing DMA to the stack is not portable
to all architectures.  Move the "new_device_descriptor" from being stored
on the stack and allocate it with kmalloc() instead.

Fixes: b909df1 ("ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices")
	Cc: [email protected]
	Signed-off-by: Dan Carpenter <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Takashi Iwai <[email protected]>
(cherry picked from commit f7d306b)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
cve CVE-2024-50302
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Jiri Kosina <[email protected]>
commit 177f25d

Since the report buffer is used by all kinds of drivers in various ways, let's
zero-initialize it during allocation to make sure that it can never be used
to leak kernel memory via a specially crafted report.

Fixes: 27ce405 ("HID: fix data access in implement()")
	Reported-by: Benoît Sevens <[email protected]>
	Acked-by: Benjamin Tissoires <[email protected]>
	Signed-off-by: Jiri Kosina <[email protected]>
(cherry picked from commit 177f25d)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 24
Number of commits matched with upstream: 18 (75.00%)
Number of commits in upstream but not in rpm: 538189
Number of commits NOT found in upstream: 6 (25.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.44.1.el8_10 for kernel-4.18.0-553.44.1.el8_10
Clean Cherry Picks: 14 (77.78%)
Empty Cherry Picks: 4 (22.22%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Dmitry Antipov <[email protected]>
commit 6cf9ff4
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/6cf9ff46.failed

Commit 67f562e ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to previously freed fasync list
of an underlying TCP socket. Fix this spurious warning by nullifying
fasync list of a container socket.

Fixes: 67f562e ("net/smc: transfer fasync_list in case of fallback")
	Signed-off-by: Dmitry Antipov <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 6cf9ff4)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	net/smc/af_smc.c
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Guangguan Wang <[email protected]>
commit c12b270

Until now, AF_INET6 has not been supported for smc-r v2 clients, even if
the ipv6 addr is ipv4 mapped. Thus, when using AF_INET6, an smc-r connection
falls back to tcp, which especially affects java applications running smc-r.
This patch supports ipv4 mapped ipv6 addr clients for smc-r v2. Clients
using a real global ipv6 addr are still not supported.

	Signed-off-by: Guangguan Wang <[email protected]>
	Reviewed-by: Wen Gu <[email protected]>
	Reviewed-by: Dust Li <[email protected]>
	Reviewed-by: D. Wythe <[email protected]>
	Reviewed-by: Wenjia Zhang <[email protected]>
	Reviewed-by: Halil Pasic <[email protected]>
	Signed-off-by: Paolo Abeni <[email protected]>

(cherry picked from commit c12b270)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Andreas Gruenbacher <[email protected]>
commit 5788253

Add a number of glock flags that are currently not shown in the text form of
glock tracepoints.

	Signed-off-by: Andreas Gruenbacher <[email protected]>
(cherry picked from commit 5788253)
	Signed-off-by: Jonathan Maple <[email protected]>
PlaidCat added 17 commits April 8, 2025 17:02
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Bailey Forrest <[email protected]>
commit 36e3b94

The NIC requires each TSO segment to not span more than 10
descriptors. NIC further requires each descriptor to not exceed
16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).

The descriptors for an skb are generated by
gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
generates a descriptor for the entire frag if the frag size is
not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).

gve_can_send_tso() checks if the descriptors thus generated for an
skb would meet the requirement that each TSO-segment not span more
than 10 descriptors. However, the current code misses an edge case
when a TSO segment spans multiple descriptors within a large frag.
This change fixes the edge case.

gve_can_send_tso() relies on the assumption that max gso size (9728)
is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
fragment a TSO segment can never span more than 2 descriptors.
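
The descriptor arithmetic described above can be sketched as follows. This is a minimal illustration, not the driver code: the helper names are hypothetical, and only the 16KB - 1 limit and the 9728-byte max gso size come from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Per-descriptor limit stated in the commit text: 16KB - 1. */
#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)

/* Hypothetical helper: descriptors generated for one frag, i.e. full
 * GVE_TX_MAX_BUF_SIZE_DQO chunks plus one for any remainder. */
size_t frag_desc_count(size_t frag_size)
{
    return (frag_size + GVE_TX_MAX_BUF_SIZE_DQO - 1) / GVE_TX_MAX_BUF_SIZE_DQO;
}

/* Hypothetical helper: descriptors touched by a TSO segment of seg_len
 * bytes that starts at byte offset off inside a single frag. */
size_t seg_desc_span(size_t off, size_t seg_len)
{
    assert(seg_len > 0);
    size_t first = off / GVE_TX_MAX_BUF_SIZE_DQO;
    size_t last = (off + seg_len - 1) / GVE_TX_MAX_BUF_SIZE_DQO;

    return last - first + 1;
}
```

Because a 9728-byte segment is shorter than one descriptor, `seg_desc_span()` can return at most 2 for it, which matches the assumption the commit message states for gve_can_send_tso().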

Fixes: a57e5de ("gve: DQO: Add TX path")
	Signed-off-by: Praveen Kaligineedi <[email protected]>
	Signed-off-by: Bailey Forrest <[email protected]>
	Reviewed-by: Jeroen de Borst <[email protected]>
	Cc: [email protected]
	Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 36e3b94)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Joshua Washington <[email protected]>
commit ff7c2de

In GVE, dedicated XDP queues only exist when an XDP program is installed
and the interface is up. As such, the NDO XDP XMIT callback should
return early if either of these conditions is false.

In the case of no loaded XDP program, priv->num_xdp_queues=0 which can
cause a divide-by-zero error, and in the case of interface down,
num_xdp_queues remains untouched to persist XDP queue count for the next
interface up, but the TX pointer itself would be NULL.
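
The early-return guard can be sketched like this. It is a userspace illustration under stated assumptions: `struct fake_priv`, its fields, the function name, and the choice of `-ENETDOWN` as the error code are all hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_priv {
    bool napi_enabled;           /* models "interface is up" */
    bool xdp_prog_loaded;        /* models "XDP program installed" */
    unsigned int num_xdp_queues; /* 0 when no program is loaded */
};

/* Returns the XDP TX queue index for this CPU, or a negative errno if
 * the dedicated XDP queues do not currently exist. */
int xdp_xmit_pick_queue(const struct fake_priv *priv, unsigned int cpu)
{
    assert(priv != NULL);

    /* Bail out before the modulo: a zero divisor would otherwise trap,
     * and with the interface down there is no queue to return. */
    if (!priv->napi_enabled || !priv->xdp_prog_loaded ||
        priv->num_xdp_queues == 0)
        return -ENETDOWN;

    return (int)(cpu % priv->num_xdp_queues);
}
```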

The XDP xmit callback also needs to synchronize with a device
transitioning from open to close. This synchronization will happen via
the GVE_PRIV_FLAGS_NAPI_ENABLED bit along with a synchronize_net() call,
which waits for any RCU critical sections at call-time to complete.

Fixes: 39a7f4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
	Cc: [email protected]
	Signed-off-by: Joshua Washington <[email protected]>
	Signed-off-by: Praveen Kaligineedi <[email protected]>
	Reviewed-by: Praveen Kaligineedi <[email protected]>
	Reviewed-by: Shailend Chand <[email protected]>
	Reviewed-by: Willem de Bruijn <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit ff7c2de)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Joshua Washington <[email protected]>
commit 40338d7

This patch predicates the enabling and disabling of XSK pools on the
existence of queues. As it stands, if the interface is down, disabling
or enabling XSK pools would result in a crash, as the RX queue pointer
would be NULL. XSK pool registration will occur as part of the next
interface up.

Similarly, xsk_wakeup needs to be guarded against queues disappearing
while the function is executing, so a check against the
GVE_PRIV_FLAGS_NAPI_ENABLED flag is added to synchronize with the
disabling of the bit and the synchronize_net() in gve_turndown.

Fixes: fd8e403 ("gve: Add AF_XDP zero-copy support for GQI-QPL format")
	Cc: [email protected]
	Signed-off-by: Joshua Washington <[email protected]>
	Signed-off-by: Praveen Kaligineedi <[email protected]>
	Reviewed-by: Praveen Kaligineedi <[email protected]>
	Reviewed-by: Shailend Chand <[email protected]>
	Reviewed-by: Willem de Bruijn <[email protected]>
	Reviewed-by: Larysa Zaremba <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 40338d7)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Joshua Washington <[email protected]>
commit ba0925c
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/ba0925c3.failed

When busy polling is enabled, xsk_sendmsg for AF_XDP zero copy marks
the NAPI ID corresponding to the memory pool allocated for the socket.
In GVE, this NAPI ID will never correspond to a NAPI ID of one of the
dedicated XDP TX queues registered with the umem because XDP TX is not
set up to share a NAPI with a corresponding RX queue.

This patch moves XSK TX descriptor processing from the TX NAPI to the RX
NAPI, and the gve_xsk_wakeup callback is updated to use the RX NAPI
instead of the TX NAPI, accordingly. The branch on whether the wakeup is
for TX is removed, as the NAPI poll should be invoked whether the wakeup
is for TX or for RX.

Fixes: fd8e403 ("gve: Add AF_XDP zero-copy support for GQI-QPL format")
	Cc: [email protected]
	Signed-off-by: Praveen Kaligineedi <[email protected]>
	Signed-off-by: Joshua Washington <[email protected]>
	Reviewed-by: Willem de Bruijn <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit ba0925c)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	drivers/net/ethernet/google/gve/gve.h
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10
commit-author Joshua Washington <[email protected]>
commit fb3a9a1

Commit ba0925c ("gve: process XSK TX descriptors as part of RX NAPI")
moved XSK TX processing to be part of the RX NAPI. However, that commit
did not include triggering the RX NAPI in gve_xsk_wakeup. This is
necessary because the TX NAPI only processes TX completions, meaning
that a TX wakeup would not actually trigger XSK descriptor processing.
Also, the branch on XDP_WAKEUP_TX was supposed to have been removed, as
the NAPI should be scheduled whether the wakeup is for RX or TX.

Fixes: ba0925c ("gve: process XSK TX descriptors as part of RX NAPI")
	Cc: [email protected]
	Signed-off-by: Joshua Washington <[email protected]>
	Signed-off-by: Praveen Kaligineedi <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit fb3a9a1)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 19
Number of commits matched with upstream: 13 (68.42%)
Number of commits in upstream but not in rpm: 538194
Number of commits NOT found in upstream: 6 (31.58%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.45.1.el8_10 for kernel-4.18.0-553.45.1.el8_10
Clean Cherry Picks: 11 (84.62%)
Empty Cherry Picks: 2 (15.38%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Dave Airlie <[email protected]>
commit 1f9910b

The fence sync logic doesn't handle a fence sync across devices
as it tries to write to a channel offset from one device into
the fence bo from a different device, which won't work so well.

This patch fixes that to avoid using the sync path in the case
where the fences come from different nouveau drm devices.

This works fine on a single device as the fence bo is shared
across the devices, and mapped into each channel's vma space,
the channel offsets are therefore okay to pass between sides,
so one channel can sync on the seqnos from the other by using
the offset into its vma.

	Signed-off-by: Dave Airlie <[email protected]>
	Cc: [email protected]
	Reviewed-by: Ben Skeggs <[email protected]>
[ Fix compilation issue; remove version log from commit message.
  - Danilo ]
	Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1f9910b)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
cve CVE-2025-21785
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Radu Rendec <[email protected]>
commit 875d742

The loop that detects/populates cache information already has a bounds
check on the array size but does not account for cache levels with
separate data/instructions cache. Fix this by incrementing the index
for any populated leaf (instead of any populated level).
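
The idea of counting per leaf rather than per level can be sketched as follows. This is a simplified model, not the arm64 code: the enum, the function name, and the array-based interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a level is either one unified cache (one leaf)
 * or separate data/instruction caches (two leaves). */
enum cache_type { CACHE_UNIFIED, CACHE_SEPARATE };

/* Returns how many leaves were populated. The index advances once per
 * leaf, so the bounds check covers levels that produce two leaves. */
size_t populate_leaves(const enum cache_type *levels, size_t nlevels,
                       size_t max_leaves)
{
    assert(levels != NULL || nlevels == 0);
    size_t idx = 0;

    for (size_t level = 0; level < nlevels && idx < max_leaves; level++) {
        if (levels[level] == CACHE_SEPARATE) {
            if (idx + 1 >= max_leaves)
                break;  /* no room for both the data and the inst leaf */
            idx += 2;
        } else {
            idx += 1;
        }
    }
    return idx;
}
```

Incrementing once per level instead would undercount here: a separate L1 uses two array slots while the index moves by one, so later writes could land past `max_leaves`.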

Fixes: 5d425c1 ("arm64: kernel: add support for cpu cache information")

	Signed-off-by: Radu Rendec <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit 875d742)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Kirill A. Shutemov <[email protected]>
commit 1b8b1aa

Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR
VMAs are placed above the 47-bit border:

8000001a9000-8000001ad000 r--p 00000000 00:00 0                          [vvar]
8000001ad000-8000001af000 r-xp 00000000 00:00 0                          [vdso]

This might confuse users who are not aware of 5-level paging and expect
all userspace addresses to be under the 47-bit border.

So far the problem has only been triggered with ASLR disabled, although it
may also occur with ASLR enabled if the layout is randomized in just the
right way.

The problem happens due to custom placement for the VMAs in the VDSO
code: vdso_addr() tries to place them above the stack and checks the
result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to
the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW
instead.
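
The effect of checking against the wrong limit can be shown with the addresses from the listing above. This is a sketch with assumed constants: the macro names are hypothetical, and both limits are modeled as the respective border minus one 4 KiB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE     4096ULL
#define SKETCH_MAP_WINDOW_47 ((1ULL << 47) - SKETCH_PAGE_SIZE)
#define SKETCH_TASK_SIZE_56  ((1ULL << 56) - SKETCH_PAGE_SIZE)

/* True if the mapping [start, end) lies entirely at or below limit. */
bool vma_fits(uint64_t start, uint64_t end, uint64_t limit)
{
    assert(start < end);
    return end <= limit;
}
```

With the [vdso] range 0x8000001ad000-0x8000001af000 from the listing, the check passes against the 56-bit limit but fails against the 47-bit window, which is why comparing against the larger limit let the VMAs land above the 47-bit border.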

Fixes: b569bab ("x86/mm: Prepare to expose larger address space to userspace")
	Reported-by: Yingcong Wu <[email protected]>
	Signed-off-by: Kirill A. Shutemov <[email protected]>
	Signed-off-by: Dave Hansen <[email protected]>
	Cc: [email protected]
Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
(cherry picked from commit 1b8b1aa)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit dc287e4

Since commit 25f39d3 ("s390/pci: Ignore RID for isolated VFs") PFs
which are not initially configured but in standby are considered
isolated. That is, they create only a single-function PCI domain. Due to
the PCI domains being created on discovery, this means that even if they
are configured later on, sibling PFs and their child VFs will not be
added to their PCI domain breaking SR-IOV expectations.

The reason the referenced commit ignored standby PFs for the creation of
multi-function PCI subhierarchies, was to work around a PCI domain
renumbering scenario on reboot. The renumbering would occur after
removing a previously in standby PF, whose domain number is used for its
configured sibling PFs and their child VFs, but which itself remained in
standby. When this is followed by a reboot, the sibling PF is used
instead to determine the PCI domain number of it and its child VFs.

In principle it is not possible to know which standby PFs will be
configured later and which may be removed. The PCI domain and root bus
are pre-requisites for hotplug slots so the decision of which functions
belong to which domain can not be postponed. With the renumbering
occurring only in rare circumstances and being generally benign, accept
it as an oddity and fix SR-IOV for initially standby PFs simply by
allowing them to create PCI domains.

	Cc: [email protected]
	Reviewed-by: Gerd Bayer <[email protected]>
Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs")
	Signed-off-by: Niklas Schnelle <[email protected]>
	Signed-off-by: Alexander Gordeev <[email protected]>
(cherry picked from commit dc287e4)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 0579388

This creates a new zpci_iov_find_parent_pf() function which a future
commit can use to find if a VF has a configured parent PF. Use
zdev->rid instead of zdev->devfn such that the new function can be used
before it has been decided if the RID will be exposed and zdev->devfn is
set. Also handle the hypothetical case that the RID is not available but
there is an otherwise matching zbus.

Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs")
	Cc: [email protected]
	Reviewed-by: Halil Pasic <[email protected]>
	Signed-off-by: Niklas Schnelle <[email protected]>
	Signed-off-by: Vasily Gorbik <[email protected]>
(cherry picked from commit 0579388)
	Signed-off-by: Jonathan Maple <[email protected]>
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10
commit-author Niklas Schnelle <[email protected]>
commit 2844ddb

In contrast to the commit message of the fixed commit, VFs whose parent
PF is not configured are not always isolated, that is, put on their own
PCI domain. This is because for VFs to be added to an existing PCI
domain it is enough for that PCI domain to share the same topology ID or
PCHID. Such a matching PCI domain without a parent PF may exist when
a PF from the same PCI card created the domain with the VF being a child
of a different, non accessible, PF. While not causing technical issues
it makes the rules which VFs are isolated inconsistent.

Fix this by explicitly checking that the parent PF exists on the PCI
domain determined by the topology ID or PCHID before registering the VF.
This works because a parent PF which is under control of this Linux
instance must be enabled and configured at the point where its child VFs
appear because otherwise SR-IOV could not have been enabled on the
parent.

Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs")
	Cc: [email protected]
	Reviewed-by: Halil Pasic <[email protected]>
	Signed-off-by: Niklas Schnelle <[email protected]>
	Signed-off-by: Vasily Gorbik <[email protected]>
(cherry picked from commit 2844ddb)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 12
Number of commits matched with upstream: 6 (50.00%)
Number of commits in upstream but not in rpm: 538201
Number of commits NOT found in upstream: 6 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.46.1.el8_10 for kernel-4.18.0-553.46.1.el8_10
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.46.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
jira LE-2741
cve CVE-2024-50138
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit-author Wander Lairson Costa <[email protected]>
commit 8b62645
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/8b62645b.failed

The function __bpf_ringbuf_reserve is invoked from a tracepoint, which
disables preemption. Using spinlock_t in this context can lead to a
"sleep in atomic" warning in the RT variant. This issue is illustrated
in the example below:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556208, name: test_progs
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffd33a5c88ea44>] migrate_enable+0xc0/0x39c
CPU: 7 PID: 556208 Comm: test_progs Tainted: G
Hardware name: Qualcomm SA8775P Ride (DT)
Call trace:
 dump_backtrace+0xac/0x130
 show_stack+0x1c/0x30
 dump_stack_lvl+0xac/0xe8
 dump_stack+0x18/0x30
 __might_resched+0x3bc/0x4fc
 rt_spin_lock+0x8c/0x1a4
 __bpf_ringbuf_reserve+0xc4/0x254
 bpf_ringbuf_reserve_dynptr+0x5c/0xdc
 bpf_prog_ac3d15160d62622a_test_read_write+0x104/0x238
 trace_call_bpf+0x238/0x774
 perf_call_bpf_enter.isra.0+0x104/0x194
 perf_syscall_enter+0x2f8/0x510
 trace_sys_enter+0x39c/0x564
 syscall_trace_enter+0x220/0x3c0
 do_el0_svc+0x138/0x1dc
 el0_svc+0x54/0x130
 el0t_64_sync_handler+0x134/0x150
 el0t_64_sync+0x17c/0x180

Switch the spinlock to raw_spinlock_t to avoid this error.

Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
	Reported-by: Brian Grech <[email protected]>
	Signed-off-by: Wander Lairson Costa <[email protected]>
	Signed-off-by: Wander Lairson Costa <[email protected]>
	Signed-off-by: Daniel Borkmann <[email protected]>
	Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 8b62645)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	kernel/bpf/ringbuf.c
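
The header lines that open each rebuilt commit message above (jira, cve, Rebuild_History, commit-author, commit, and the optional Empty-Commit note) look like simple key/value pairs. A minimal sketch of reading them back, assuming the header ends at the first blank line; `parse_rebuild_header` and the key list are hypothetical names for illustration, not part of the rebuild tooling:

```python
# Keys observed in the commit headers of this PR; the set is an assumption.
HEADER_KEYS = {"jira", "cve", "Rebuild_History", "commit-author", "commit"}


def parse_rebuild_header(text: str) -> dict:
    """Collect the key/value header lines that precede the first blank line."""
    fields = {"empty_commit": False}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line.startswith("Empty-Commit:"):
            fields["empty_commit"] = True
            continue
        key, _, value = line.partition(" ")
        if key in HEADER_KEYS and value:
            fields[key] = value.strip()
    return fields


sample = """jira LE-2741
cve CVE-2024-50138
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit 8b62645
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/8b62645b.failed

The function __bpf_ringbuf_reserve is invoked from a tracepoint
"""

header = parse_rebuild_header(sample)
print(header)
```

Lines that are not in the key list (the "Will be included" note and the path) are skipped rather than guessed at.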
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit-author Heiner Kallweit <[email protected]>
commit f32a213
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/f32a2137.failed

If a network device is runtime-suspended then:
- network device may be flagged as detached and all ethtool ops (even if not
  accessing the device) will fail because netif_device_present() returns
  false
- ethtool ops may fail because device is not accessible (e.g. because being
  in D3 in case of a PCI device)

It may not be desirable that userspace can't use even simple ethtool ops
that do not access the device if interface or link is down. To be more friendly
to userspace let's ensure that device is runtime-resumed when executing the
respective ethtool op in kernel.

	Signed-off-by: Heiner Kallweit <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit f32a213)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	net/ethtool/ioctl.c
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit-author Scott Mayhew <[email protected]>
commit 0c8c7c5

This is a slight variation on a patch previously proposed by Neil Brown
that never got merged.

Prior to commit 5ceb9d7 ("NFS: Refactor nfs_lookup_revalidate()"),
any error from nfs_lookup_verify_inode() other than -ESTALE would result
in nfs_lookup_revalidate() returning that error (-ESTALE is mapped to
zero).

Since that commit, all errors result in nfs_lookup_revalidate()
returning zero, resulting in dentries being invalidated where they
previously were not (particularly in the case of -ERESTARTSYS).

Fix it by passing the actual error code to nfs_lookup_revalidate_done(),
and leaving the decision on whether to  map the error code to zero or
one to nfs_lookup_revalidate_done().

A simple reproducer is to run the following python code in a
subdirectory of an NFS mount (not in the root of the NFS mount):

---8<---
import os
import multiprocessing
import time

if __name__=="__main__":
    multiprocessing.set_start_method("spawn")

    count = 0
    while True:
        try:
            os.getcwd()
            pool = multiprocessing.Pool(10)
            pool.close()
            pool.terminate()
            count += 1
        except Exception as e:
            print(f"Failed after {count} iterations")
            print(e)
            break
---8<---

Prior to commit 5ceb9d7, the above code would run indefinitely.
After commit 5ceb9d7, it fails almost immediately with -ENOENT.

	Signed-off-by: Scott Mayhew <[email protected]>
	Signed-off-by: Trond Myklebust <[email protected]>
(cherry picked from commit 0c8c7c5)
	Signed-off-by: Jonathan Maple <[email protected]>
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 9
Number of commits matched with upstream: 3 (33.33%)
Number of commits in upstream but not in rpm: 538204
Number of commits NOT found in upstream: 6 (66.67%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.47.1.el8_10 for kernel-4.18.0-553.47.1.el8_10
Clean Cherry Picks: 1 (33.33%)
Empty Cherry Picks: 2 (66.67%)
_______________________________
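
The percentages in the summary above can be re-derived from the raw counts. This is a sketch under two inferred assumptions that the output does not state explicitly: the match percentages are relative to the number of commits in the rpm, and the cherry-pick percentages are relative to the number of matched commits. The variable names are illustrative.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole with two decimals, in the style the summary uses."""
    return f"{part / whole * 100:.2f}%"


# Values copied from the 553.47.1 summary above.
upstream_range = 538207
commits_in_rpm = 9
matched = 3
clean_picks, empty_picks = 1, 2

# Inferred relationships (assumptions, not documented tool behaviour).
not_found = commits_in_rpm - matched
upstream_only = upstream_range - matched

summary = {
    "matched": pct(matched, commits_in_rpm),
    "not_found": pct(not_found, commits_in_rpm),
    "clean": pct(clean_picks, matched),
    "empty": pct(empty_picks, matched),
    "upstream_only": upstream_only,
}
print(summary)
```

Under those assumptions the derived values agree with the reported ones, which is a quick sanity check that no commit was dropped between the matching step and the cherry-pick step.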

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
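
For reviewers who want to jump straight to a failed cherry-pick, the naming rule above (first 8 characters of the upstream SHA, stored next to rebuild.details.txt) can be sketched as follows. `failed_pick_path` is a hypothetical helper, and the `.failed` suffix and directory layout are taken from the paths quoted in the empty commits above:

```python
from pathlib import PurePosixPath

# Root directory as it appears in the paths quoted in this PR.
BACKPORTS_ROOT = PurePosixPath("ciq/ciq_backports")


def failed_pick_path(kernel_nvr: str, upstream_sha: str) -> str:
    """Return the expected location of the .failed file for an empty commit."""
    if len(upstream_sha) < 8:
        raise ValueError("need at least 8 hex characters of the upstream SHA")
    name = f"{upstream_sha[:8].lower()}.failed"
    return str(BACKPORTS_ROOT / kernel_nvr / name)


print(failed_pick_path("kernel-4.18.0-553.47.1.el8_10", "8b62645b"))
```

The call reproduces the path given in the first empty commit of the 553.47.1 rebuild; a longer SHA is simply truncated to its first 8 characters.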
@thefossguy-ciq

@PlaidCat the missing commit (net: skb: exclude the single page frag cache for too small alloc) might be this revert: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=011b0335903832facca86cd8ed05d7d8d94c9c76
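
To illustrate why a renamed or reverted commit can slip past subject matching at the 87.50% Fuzz Limit, here is a minimal sketch. It assumes a character-level similarity ratio from Python's `difflib`; the actual matcher used by the rebuild tooling is not shown in this PR, so treat the function name and the scoring method as hypothetical.

```python
from difflib import SequenceMatcher

FUZZ_LIMIT = 0.875  # the "Fuzz Limit: 87.50%" reported in the rebuild output


def best_upstream_match(changelog_subject, upstream_subjects, limit=FUZZ_LIMIT):
    """Return (subject, ratio) for the closest upstream subject, or None."""
    best_subject, best_ratio = None, 0.0
    for subject in upstream_subjects:
        ratio = SequenceMatcher(
            None, changelog_subject.lower(), subject.lower()
        ).ratio()
        if ratio > best_ratio:
            best_subject, best_ratio = subject, ratio
    if best_ratio >= limit:
        return best_subject, best_ratio
    return None


# Subjects taken from this PR; the "- " prefix mimics a changelog bullet.
upstream = [
    "s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails",
    "s390/pci: Refactor arch_setup_msi_irqs()",
]
hit = best_upstream_match(
    "- s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails",
    upstream,
)
miss = best_upstream_match(
    "net: skb: exclude the single page frag cache for too small alloc",
    upstream,
)
print(hit, miss)
```

A subject that differs only by a short prefix still clears the threshold, while a subject with no close upstream counterpart returns nothing, which is the situation reported for the "net: skb" changelog entry.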

@thefossguy-ciq left a comment

I didn't perform a 1-to-1 match of the commits to the SRPM changelog but with a relatively close look, nothing looks off. LGTM!

🚤

@bmastbergen (Collaborator) left a comment

🥌

@PlaidCat (Collaborator, Author) commented Apr 9, 2025

> @PlaidCat the missing commit (net: skb: exclude the single page frag cache for too small alloc) might be this revert: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=011b0335903832facca86cd8ed05d7d8d94c9c76

Possibly ... like I've said before, this process is imperfect. That's why the only buildable commits are the ones titled "Rebuild rocky8_10 with kernel-<src.rpm>": they replace the entire directory with the output of rpmbuild -bp <src.rpm>, and the delta is huge. The important thing is that we're not missing a bunch of commits. @bmastbergen caught this before, when the kernel.org/master checkout I had was stale, so it missed a lot of N+2 month commits that it should have caught.

At any rate, thanks for the review.

@PlaidCat merged commit 01aef32 into rocky8_10 on Apr 9, 2025
2 checks passed
@PlaidCat deleted the rocky8_10_rebuild branch on April 9, 2025 13:57
github-actions bot pushed a commit that referenced this pull request Aug 8, 2025
When sending a packet with virtio_net_hdr to tun device, if the gso_type
in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr
size, below crash may happen.

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:4572!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:skb_pull_rcsum+0x8e/0xa0
  Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
  RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
  RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
  RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
  RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
  FS:  000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
   udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
   udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
   __udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
   ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
   ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
   ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
   ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
   netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
   napi_complete_done+0x78/0x180 net/core/dev.c:6580
   tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
   tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
   vfs_write+0x300/0x420 fs/read_write.c:593
   ksys_write+0x60/0xd0 fs/read_write.c:686
   do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
   </TASK>

To trigger gso segment in udp_queue_rcv_skb(), we should also set option
UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv
hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try
to pull udphdr, but the skb size has been segmented to gso size, which
leads to this crash.

Previous commit cf329aa ("udp: cope with UDP GRO packet misdirection")
introduces segmentation in UDP receive path only for GRO, which was never
intended to be used for UFO, so drop UFO packets in udp_rcv_segment().

Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: cf329aa ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Wang Liang <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 15, 2025
[ Upstream commit d46e51f ]

When sending a packet with virtio_net_hdr to tun device, if the gso_type
in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr
size, below crash may happen.

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:4572!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:skb_pull_rcsum+0x8e/0xa0
  Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
  RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
  RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
  RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
  RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
  FS:  000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
   udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
   udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
   __udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
   ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
   ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
   ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
   ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
   netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
   napi_complete_done+0x78/0x180 net/core/dev.c:6580
   tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
   tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
   vfs_write+0x300/0x420 fs/read_write.c:593
   ksys_write+0x60/0xd0 fs/read_write.c:686
   do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
   </TASK>

To trigger gso segment in udp_queue_rcv_skb(), we should also set option
UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv
hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try
to pull udphdr, but the skb size has been segmented to gso size, which
leads to this crash.

Previous commit cf329aa ("udp: cope with UDP GRO packet misdirection")
introduces segmentation in UDP receive path only for GRO, which was never
intended to be used for UFO, so drop UFO packets in udp_rcv_segment().

Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: cf329aa ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Wang Liang <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>