Commit 2277b49

dianders authored and Daniel Thompson committed
kdb: Fix stack crawling on 'running' CPUs that aren't the master

In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
give you the right info.  Specifically on many architectures
(including arm64, where I tested) you can't dump the stack of a
"running" process that isn't the process running on the current CPU.
This can be seen by this:

 echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
 # wait 2 seconds
 <sysrq>g

Here's what I see now on rk3399-gru-kevin.  I see the stack crawl for
the CPU that handled the sysrq but everything else just shows me stuck
in __switch_to() which is bogus:

======

[0]kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0, 1-3(I), 4, 5(I)
Stack traceback for pid 0
0xffffff801101a9c0        0        0  1    0   R  0xffffff801101b3b0 *swapper/0
Call trace:
 dump_backtrace+0x0/0x138
 ...
 kgdb_compiled_brk_fn+0x34/0x44
 ...
 sysrq_handle_dbg+0x34/0x5c

Stack traceback for pid 0
0xffffffc0f175a040        0        0  1    1   I  0xffffffc0f175aa30  swapper/1
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65616c0

Stack traceback for pid 0
0xffffffc0f175d040        0        0  1    2   I  0xffffffc0f175da30  swapper/2
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65806c0

Stack traceback for pid 0
0xffffffc0f175b040        0        0  1    3   I  0xffffffc0f175ba30  swapper/3
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f659f6c0

Stack traceback for pid 1474
0xffffffc0dde8b040     1474      727  1    4   R  0xffffffc0dde8ba30  bash
Call trace:
 __switch_to+0x1e4/0x240
 __schedule+0x464/0x618
 0xffffffc0dde8b040

Stack traceback for pid 0
0xffffffc0f17b0040        0        0  1    5   I  0xffffffc0f17b0a30  swapper/5
Call trace:
 __switch_to+0x1e4/0x240
 0xffffffc0f65dd6c0

===

The problem is that 'btc' eventually boils down to
  show_stack(task_struct, NULL);

...and show_stack() doesn't work for "running" CPUs because their
registers haven't been stashed.

On x86 things might work better (I haven't tested) because kdb has a
special case for x86 in kdb_show_stack() where it passes the stack
pointer to show_stack().
This wouldn't work on arm64 where the stack crawling function seems to
need the "fp" and "pc", not the "sp", which is presumably why arm64's
show_stack() function totally ignores the "sp" parameter.

NOTE: we _can_ get a good stack dump for all the cpus if we manually
switch each one to the kdb master and do a back trace.  AKA:
  cpu 4
  bt
...will give the expected trace.  That's because arm64's
dump_backtrace will now see that "tsk == current" and go through a
different path.

In this patch I fix the problems by catching a request to stack crawl
a task that's running on a CPU and then I ask that CPU to do the stack
crawl.

NOTE: this will (presumably) change what stack crawls are printed for
x86 machines.  Now kdb functions will show up in the stack crawl.
Presumably this is OK but if it's not we can go back and add a special
case for x86 again.

Signed-off-by: Douglas Anderson <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Daniel Thompson <[email protected]>

1 parent 55a7e23 commit 2277b49

3 files changed, +43 -12 lines changed

kernel/debug/debug_core.c

Lines changed: 34 additions & 0 deletions
@@ -441,6 +441,37 @@ int dbg_remove_all_break(void)
 	return 0;
 }

+#ifdef CONFIG_KGDB_KDB
+void kdb_dump_stack_on_cpu(int cpu)
+{
+	if (cpu == raw_smp_processor_id()) {
+		dump_stack();
+		return;
+	}
+
+	if (!(kgdb_info[cpu].exception_state & DCPU_IS_SLAVE)) {
+		kdb_printf("ERROR: Task on cpu %d didn't stop in the debugger\n",
+			   cpu);
+		return;
+	}
+
+	/*
+	 * In general, architectures don't support dumping the stack of a
+	 * "running" process that's not the current one.  From the point of
+	 * view of the Linux, kernel processes that are looping in the kgdb
+	 * slave loop are still "running".  There's also no API (that actually
+	 * works across all architectures) that can do a stack crawl based
+	 * on registers passed as a parameter.
+	 *
+	 * Solve this conundrum by asking slave CPUs to do the backtrace
+	 * themselves.
+	 */
+	kgdb_info[cpu].exception_state |= DCPU_WANT_BT;
+	while (kgdb_info[cpu].exception_state & DCPU_WANT_BT)
+		cpu_relax();
+}
+#endif
+
 /*
  * Return true if there is a valid kgdb I/O module.  Also if no
  * debugger is attached a message can be printed to the console about
@@ -580,6 +611,9 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
 				atomic_xchg(&kgdb_active, cpu);
 				break;
 			}
+		} else if (kgdb_info[cpu].exception_state & DCPU_WANT_BT) {
+			dump_stack();
+			kgdb_info[cpu].exception_state &= ~DCPU_WANT_BT;
 		} else if (kgdb_info[cpu].exception_state & DCPU_IS_SLAVE) {
 			if (!raw_spin_is_locked(&dbg_slave_lock))
 				goto return_normal;
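
The two hunks above form a request/acknowledge handshake: the master sets DCPU_WANT_BT and spins, and the parked CPU notices the flag, dumps its own stack, and clears it. The following is a minimal single-threaded sketch of that handshake in user-space C, not kernel code. The names `sim_cpu`, `sim_cpus`, `sim_slave_poll` and `sim_dump_stack_on_cpu` are hypothetical, the `printf()` calls stand in for `dump_stack()`, and the other CPUs are advanced by hand inside the wait loop because there is no real concurrency here.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS        4
#define DCPU_IS_SLAVE  0x4	/* same values as debug_core.h in this commit */
#define DCPU_WANT_BT   0x8

/* Hypothetical stand-in for kgdb_info[]: one state word per simulated CPU. */
struct sim_cpu {
	int exception_state;
	int backtraces;		/* how many times this CPU dumped its own stack */
};

struct sim_cpu sim_cpus[NR_CPUS];

/* One pass of a parked CPU's slave loop: service a pending request, if any. */
void sim_slave_poll(int cpu)
{
	struct sim_cpu *c = &sim_cpus[cpu];

	if (!(c->exception_state & DCPU_IS_SLAVE))
		return;
	if (c->exception_state & DCPU_WANT_BT) {
		/* The patch calls dump_stack() at this point. */
		printf("cpu %d: dumping my own stack\n", cpu);
		c->backtraces++;
		c->exception_state &= ~DCPU_WANT_BT;
	}
}

/*
 * Master side of the handshake.  Returns 0 on success, or -1 if the
 * target CPU never stopped in the debugger.
 */
int sim_dump_stack_on_cpu(int master, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (cpu == master) {
		printf("cpu %d: dumping my own stack\n", cpu);
		sim_cpus[cpu].backtraces++;
		return 0;
	}
	if (!(sim_cpus[cpu].exception_state & DCPU_IS_SLAVE))
		return -1;

	sim_cpus[cpu].exception_state |= DCPU_WANT_BT;
	while (sim_cpus[cpu].exception_state & DCPU_WANT_BT) {
		/* Real CPUs run concurrently; here we advance them by hand. */
		for (int i = 0; i < NR_CPUS; i++)
			if (i != master)
				sim_slave_poll(i);
	}
	return 0;
}
```

In this sketch the check for DCPU_IS_SLAVE before posting the request is what keeps the master from spinning forever on a CPU that is not in the slave loop, which mirrors the early-return error path in kdb_dump_stack_on_cpu().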

kernel/debug/debug_core.h

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ struct kgdb_state {
 #define DCPU_WANT_MASTER 0x1 /* Waiting to become a master kgdb cpu */
 #define DCPU_NEXT_MASTER 0x2 /* Transition from one master cpu to another */
 #define DCPU_IS_SLAVE    0x4 /* Slave cpu enter exception */
+#define DCPU_WANT_BT     0x8 /* Slave cpu should backtrace then clear flag */

 struct debuggerinfo_struct {
 	void *debuggerinfo;
@@ -75,6 +76,7 @@ extern int kdb_stub(struct kgdb_state *ks);
 extern int kdb_parse(const char *cmdstr);
 extern int kdb_common_init_state(struct kgdb_state *ks);
 extern int kdb_common_deinit_state(void);
+extern void kdb_dump_stack_on_cpu(int cpu);
 #else /* ! CONFIG_KGDB_KDB */
 static inline int kdb_stub(struct kgdb_state *ks)
 {
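
DCPU_WANT_BT takes the next free bit in exception_state, so it can be set and cleared without disturbing DCPU_IS_SLAVE. A small sketch of that bit arithmetic follows. The flag values are copied from the header above, while the helpers `state_request_bt`, `state_finish_bt` and `state_bt_pending` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Values copied from debug_core.h in this commit. */
#define DCPU_WANT_MASTER 0x1
#define DCPU_NEXT_MASTER 0x2
#define DCPU_IS_SLAVE    0x4
#define DCPU_WANT_BT     0x8

/* The new flag must not overlap any of the existing ones. */
static_assert((DCPU_WANT_BT &
	       (DCPU_WANT_MASTER | DCPU_NEXT_MASTER | DCPU_IS_SLAVE)) == 0,
	      "DCPU_WANT_BT overlaps an existing flag");

/* Hypothetical helpers, not part of the kernel API. */
int state_request_bt(int state) { return state | DCPU_WANT_BT; }
int state_finish_bt(int state)  { return state & ~DCPU_WANT_BT; }
int state_bt_pending(int state) { return (state & DCPU_WANT_BT) != 0; }
```

Because a parked CPU carries both bits while a request is pending, the kgdb_cpu_enter() hunk in this commit tests DCPU_WANT_BT before DCPU_IS_SLAVE in its else-if chain.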

kernel/debug/kdb/kdb_bt.c

Lines changed: 7 additions & 12 deletions
@@ -22,20 +22,15 @@
 static void kdb_show_stack(struct task_struct *p, void *addr)
 {
 	int old_lvl = console_loglevel;
+
 	console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
 	kdb_trap_printk++;
-	kdb_set_current_task(p);
-	if (addr) {
-		show_stack((struct task_struct *)p, addr);
-	} else if (kdb_current_regs) {
-#ifdef CONFIG_X86
-		show_stack(p, &kdb_current_regs->sp);
-#else
-		show_stack(p, NULL);
-#endif
-	} else {
-		show_stack(p, NULL);
-	}
+
+	if (!addr && kdb_task_has_cpu(p))
+		kdb_dump_stack_on_cpu(kdb_process_cpu(p));
+	else
+		show_stack(p, addr);
+
 	console_loglevel = old_lvl;
 	kdb_trap_printk--;
 }
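
The rewritten kdb_show_stack() reduces to a two-way choice: if no explicit stack address was given and the task is currently on a CPU, ask that CPU to dump its own stack; otherwise fall back to show_stack(). The sketch below isolates that decision so it can be checked in user space. `struct sim_task`, `enum bt_path` and `choose_bt_path` are hypothetical names, and the `on_cpu` field is an assumed stand-in for what kdb_task_has_cpu() reports.

```c
#include <assert.h>
#include <stddef.h>

/* Which of the two branches in kdb_show_stack() would be taken. */
enum bt_path {
	BT_ASK_CPU,	/* kdb_dump_stack_on_cpu(kdb_process_cpu(p)) */
	BT_SHOW_STACK	/* show_stack(p, addr) */
};

/* Hypothetical, simplified task descriptor. */
struct sim_task {
	int on_cpu;	/* non-zero if the task is running on some CPU */
	int cpu;	/* the CPU it runs on, when on_cpu is set */
};

enum bt_path choose_bt_path(const struct sim_task *p, const void *addr)
{
	assert(p != NULL);

	/* Mirrors: if (!addr && kdb_task_has_cpu(p)) */
	if (addr == NULL && p->on_cpu)
		return BT_ASK_CPU;
	return BT_SHOW_STACK;
}
```

Note that an explicit address always wins in this sketch, as it does in the patch: a caller that passes `addr` gets show_stack() even for a task that is running on a CPU.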
