Intermittent kernel panic in CapSet.InheritableBounds: sleep/reclaim path triggered from atomic context #1803
Open
Description
Summary
Running DragonOS c_unitest for capability syscalls can intermittently panic during:
CapSet.InheritableBounds
The panic is a schedule() assertion failure (preempt_count != 0), and the stack indicates that the scheduler was entered from a memory/page-fault/reclaim path while in an atomic context.
Reproduced at commit: 2f6e86f1
Test Context
c_unitest output (stable most of the time; the panic is intermittent):
[----------] 7 tests from CapSet
[ RUN ] CapSet.EffectiveMustBeSubsetOfPermitted
[ OK ] CapSet.EffectiveMustBeSubsetOfPermitted (0 ms)
[ RUN ] CapSet.VersionPaths
[ OK ] CapSet.VersionPaths (0 ms)
[ RUN ] CapSet.InvalidVersionWithData
[ OK ] CapSet.InvalidVersionWithData (0 ms)
[ RUN ] CapSet.NegativePid
[ OK ] CapSet.NegativePid (0 ms)
[ RUN ] CapSet.NonCurrentPid
[ OK ] CapSet.NonCurrentPid (0 ms)
[ RUN ] CapSet.PermittedNotIncrease
[ OK ] CapSet.PermittedNotIncrease (77 ms)
[ RUN ] CapSet.InheritableBounds
Panic Details
[ ERROR ] (src/debug/panic/mod.rs:43) Kernel Panic Occurred. raw_pid: 0
Location:
File: src/sched/mod.rs
Line: 850, Column: 5
Message:
assertion `left == right` failed
left: 1
right: 0
Rust Panic Backtrace:
[1] _Unwind_Backtrace
[2] dragonos_kernel::debug::panic::hook::print_stack_trace
[3] __rustc::rust_begin_unwind
[4] core::panicking::panic_fmt
[5] core::panicking::assert_failed_inner
[6] core::panicking::assert_failed
[7] dragonos_kernel::sched::schedule
[8] dragonos_kernel::libs::wait_queue::block_current_impl
[9] dragonos_kernel::libs::wait_queue::WaitQueue::wait_until_impl
[10] dragonos_kernel::mm::page::PageManager::get
[11] dragonos_kernel::mm::ucontext::LockedVMA::unmap
[12] <dragonos_kernel::mm::ucontext::InnerAddressSpace as core::ops::drop::Drop>::drop
[13] alloc::sync::Arc<T,A>::drop_slow
[14] dragonos_kernel::sched::__schedule
[15] x86_64_do_irq
[16] Restore_all
Expected Behavior
- capset test cases should pass consistently.
- No scheduler assertion (preempt_count mismatch).
Actual Behavior
- Intermittent panic in/around CapSet.InheritableBounds.
- schedule() is reached while preempt_count == 1.
Suspected Root Cause
The issue is likely not capset semantics itself, but context safety:
- In an atomic/irq-off or lock-held path, code reaches an Arc drop chain.
- The drop path enters a memory reclaim/fault path (PageManager::get / VMA::unmap / AddressSpace::drop).
- The reclaim/fault path attempts to block/schedule.
- schedule() asserts because the current context is non-preemptible (preempt_count != 0).
In short: a potentially sleeping release path is reached from atomic context.
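The hazard described above can be modelled in user space. The following is a minimal sketch, not DragonOS code: `PreemptGuard`, `might_sleep`, `violations`, and the empty `AddressSpace` type are hypothetical stand-ins invented for illustration, and a thread-local counter plays the role of preempt_count. It shows that releasing the last `Arc` reference inside an atomic section runs the (possibly blocking) destructor there, while moving the reference out of the section and dropping it afterwards does not.

```rust
use std::cell::Cell;
use std::sync::Arc;

thread_local! {
    // Stand-in for the per-CPU preempt_count (assumption, user-space model only).
    static PREEMPT_COUNT: Cell<u32> = Cell::new(0);
    // Number of times a blocking point was reached with preemption disabled.
    static SLEEP_IN_ATOMIC: Cell<u32> = Cell::new(0);
}

/// Hypothetical RAII guard: preemption is "disabled" while it is alive.
struct PreemptGuard;

impl PreemptGuard {
    fn new() -> Self {
        PREEMPT_COUNT.with(|c| c.set(c.get() + 1));
        PreemptGuard
    }
}

impl Drop for PreemptGuard {
    fn drop(&mut self) {
        PREEMPT_COUNT.with(|c| c.set(c.get() - 1));
    }
}

/// Models a blocking point such as schedule(). Instead of panicking,
/// it records a violation when reached in atomic context.
fn might_sleep() {
    if PREEMPT_COUNT.with(|c| c.get()) != 0 {
        SLEEP_IN_ATOMIC.with(|v| v.set(v.get() + 1));
    }
}

/// Violations recorded so far on this thread.
fn violations() -> u32 {
    SLEEP_IN_ATOMIC.with(|v| v.get())
}

/// Stand-in for an address space whose teardown may block.
struct AddressSpace;

impl Drop for AddressSpace {
    fn drop(&mut self) {
        might_sleep();
    }
}

/// Hazard: the last reference is released inside the atomic section,
/// so the destructor (and its blocking point) runs there.
fn drop_inside_atomic(last: Arc<AddressSpace>) {
    let _guard = PreemptGuard::new();
    drop(last);
}

/// One possible shape of a fix: carry the reference out of the atomic
/// section and release it only after the guard is gone.
fn drop_after_atomic(last: Arc<AddressSpace>) {
    let deferred = {
        let _guard = PreemptGuard::new();
        // ... atomic work that would otherwise release `last` ...
        last
    };
    drop(deferred); // preempt count is back to 0 here
}
```

Calling `drop_inside_atomic(Arc::new(AddressSpace))` increments the violation counter, whereas `drop_after_atomic(Arc::new(AddressSpace))` leaves it unchanged. Whether deferring the release is the right fix for the real call chain is an open question for this issue.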
Why this is tricky
- cap/cred objects are hot-path data and may be observed in trap/fault/scheduler-related paths.
- Replacing the cap lock with sleeping primitives (Mutex/RwSem) may violate non-sleepable context constraints in some call chains.
- Need Linux-compatible semantics while preserving atomic-context safety.
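One pattern that keeps a non-sleeping lock while avoiding destructor work under it is to swap the new value in under the lock and hand the old value back to the caller, who releases it after unlocking. The sketch below is only an illustration under assumptions: `Cred`, `Task`, and `replace_cred` are hypothetical names, and `std::sync::Mutex` stands in for a kernel spinlock so that the example compiles in user space.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily simplified credential object.
#[derive(Debug, PartialEq)]
struct Cred {
    inheritable: u64,
}

/// Hypothetical task holding its credentials behind a short-lived lock.
/// `Mutex` is a user-space stand-in for a non-sleeping spinlock.
struct Task {
    cred: Mutex<Arc<Cred>>,
}

impl Task {
    /// Installs `new` and returns the previous credentials.
    /// Only a pointer swap happens while the lock is held; the old
    /// `Arc` is dropped by the caller, after the lock is released.
    fn replace_cred(&self, new: Arc<Cred>) -> Arc<Cred> {
        let mut slot = self.cred.lock().unwrap();
        std::mem::replace(&mut *slot, new)
    }
}
```

With this shape, any destructor chain triggered by the last reference to the old credentials runs in the caller's context rather than inside the critical section. It does not by itself guarantee that the caller is in a sleepable context.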
Scope / Constraints
- Must align behavior with Linux 6.6 semantics.
- Must avoid workaround-style masking of panic; fix should remove atomic-context sleep/reclaim hazard at source.
- Must not introduce regressions in scheduling/context-switch fast path.
Reproduction Notes
- Trigger by repeatedly running the capability-related c_unitest suite, especially CapSet.InheritableBounds.
- Panic is intermittent; stress/repeat loops increase hit probability.
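A repeat loop of the kind mentioned above could look like the following sketch. It assumes the test binary is gtest-based (the output format suggests so) and therefore accepts `--gtest_filter`. The binary path is not given in this issue and must be supplied by the caller, and `first_failure` and `run_capset_once` are hypothetical helper names.

```rust
use std::process::Command;

/// Runs `attempt` up to `max_runs` times and returns the 1-based index
/// of the first run that reports failure, or `None` if all runs pass.
fn first_failure<F: FnMut(usize) -> bool>(max_runs: usize, mut attempt: F) -> Option<usize> {
    (1..=max_runs).find(|&i| !attempt(i))
}

/// Runs the single test case once. `test_binary` is a caller-supplied
/// path (assumption: a gtest binary containing the CapSet suite).
/// A binary that cannot be started counts as a failure.
fn run_capset_once(test_binary: &str) -> bool {
    Command::new(test_binary)
        .arg("--gtest_filter=CapSet.InheritableBounds")
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

For example, `first_failure(1000, |_| run_capset_once(path))` would stop at the first failing run. Since the failure here is a kernel panic, the loop will usually just end with the panic itself, so its main purpose is to raise the hit probability.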