Skip to content

Conversation

thefossguy-ciq
Copy link

  • Commit Message Requirements
  • Built against Vault/LTS Environment
  • kABI Check Passed, where Valid (Pre 9.4 RT does not have kABI stability)
  • Boot Test
  • Kernel SelfTest results
  • Additional Tests as determined relevant

Commit message

jira VULN-9261
cve CVE-2024-0562
commit-author Khazhismel Kumykov <[email protected]> commit f87904c075515f3e1d8f4a7115869d3b914674fd

When a disk is removed, bdi_unregister gets called to stop further writeback and wait for associated delayed work to complete.  However, wb_inode_writeback_end() may schedule bandwidth estimation dwork after this has completed, which can result in the timer attempting to access the just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when scheduling writeback work.

Since this requires wb->work_lock, and wb_inode_writeback_end() may get called from interrupt, switch wb->work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/[email protected] Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
	Signed-off-by: Khazhismel Kumykov <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
	Cc: Michael Stapelberg <[email protected]>
	Cc: Wu Fengguang <[email protected]>
	Cc: Alexander Viro <[email protected]>
	Cc: <[email protected]>
	Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f87904c075515f3e1d8f4a7115869d3b914674fd)
	Signed-off-by: Pratham Patel <[email protected]>

Kernel build logs

  INSTALL sound/usb/snd-usbmidi-lib.ko
  INSTALL sound/usb/usx2y/snd-usb-us122l.ko
  INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
  INSTALL sound/virtio/virtio_snd.ko
  INSTALL sound/x86/snd-hdmi-lpe-audio.ko
  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-_ppatel__ciqlts8_8-rt+
[TIMER]{MODULES}: 39s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-_ppatel__ciqlts8_8-rt+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 11s
Setting Default Kernel to /boot/vmlinuz-4.18.0-_ppatel__ciqlts8_8-rt+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 5s
[TIMER]{BUILD}: 1024s
[TIMER]{MODULES}: 39s
[TIMER]{INSTALL}: 11s
[TIMER]{TOTAL} 1083s
Rebooting in 10 seconds

Kselftests

$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
     235
     235

$ grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
      57
      57

kselftest-after.log
kselftest-before.log

jira VULN-9261
cve CVE-2024-0562
commit-author Khazhismel Kumykov <[email protected]>
commit f87904c

When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete.  However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.

Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 45a2966 ("writeback: fix bandwidth estimate for spiky workload")
	Signed-off-by: Khazhismel Kumykov <[email protected]>
	Reviewed-by: Jan Kara <[email protected]>
	Cc: Michael Stapelberg <[email protected]>
	Cc: Wu Fengguang <[email protected]>
	Cc: Alexander Viro <[email protected]>
	Cc: <[email protected]>
	Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f87904c)
	Signed-off-by: Pratham Patel <[email protected]>
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

@thefossguy-ciq thefossguy-ciq requested review from PlaidCat and removed request for gvrose8192 March 3, 2025 14:42
Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@thefossguy-ciq thefossguy-ciq merged commit 9307678 into ciqlts8_8-rt Mar 5, 2025
1 check passed
@PlaidCat PlaidCat deleted the {ppatel}_ciqlts8_8-rt branch April 18, 2025 15:17
github-actions bot pushed a commit that referenced this pull request Oct 10, 2025
…find_and_get_node_by_id()

JIRA: https://issues.redhat.com/browse/RHEL-118965

commit 4207b55
Author: Tejun Heo <[email protected]>
Date:   Tue Jan 9 11:48:04 2024 -1000

    kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()

    The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
    which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This
    can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF
    programs including e.g. the ones that attach to functions which are holding
    the scheduler rq lock.

    Consider the following BPF program:

      SEC("fentry/__set_cpus_allowed_ptr_locked")
      int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
    	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
      {
    	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);

    	  if (cgrp) {
    		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
    		  bpf_cgroup_release(cgrp);
    	  }
    	  return 0;
      }

    __set_cpus_allowed_ptr_locked() is called with rq lock held and the above
    BPF program calls bpf_cgroup_from_id() within leading to the following
    lockdep warning:

      =====================================================
      WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
      6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
      -----------------------------------------------------
      repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70

    		and this task is already holding:
      ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
      which would create a new lock dependency:
       (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
      ...
       Possible interrupt unsafe locking scenario:

    	 CPU0                    CPU1
    	 ----                    ----
        lock(kernfs_idr_lock);
    				 local_irq_disable();
    				 lock(&rq->__lock);
    				 lock(kernfs_idr_lock);
        <Interrupt>
          lock(&rq->__lock);

    		 *** DEADLOCK ***
      ...
      Call Trace:
       dump_stack_lvl+0x55/0x70
       dump_stack+0x10/0x20
       __lock_acquire+0x781/0x2a40
       lock_acquire+0xbf/0x1f0
       _raw_spin_lock+0x2f/0x40
       kernfs_find_and_get_node_by_id+0x1e/0x70
       cgroup_get_from_id+0x21/0x240
       bpf_cgroup_from_id+0xe/0x20
       bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
       bpf_trampoline_6442545632+0x4f/0x1000
       __set_cpus_allowed_ptr_locked+0x5/0x5a0
       sched_setaffinity+0x1b3/0x290
       __x64_sys_sched_setaffinity+0x4f/0x60
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x46/0x4e

    Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
    kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
    kernfs_idr_lock.

    This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit.
    Combined with the preceding rearrange patch, the net increase is 8 bytes.

    Signed-off-by: Tejun Heo <[email protected]>
    Cc: Andrea Righi <[email protected]>
    Cc: Geert Uytterhoeven <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: David Arcari <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants