GDB overlay support for RISC-V by dmi391 · Pull Request #9 · sifive/riscv-binutils-gdb

dmi391 · 2022-09-14T03:33:33Z

Fixed problem with overlay support for RISC-V
Now GDB-client supports overlay debugging in auto-mode.

For GDB to support overlay debugging, the pointer gdbarch->overlay_update must be initialized with the function simple_overlay_update(struct obj_section *osect). In file /gdb/riscv-tdep.c, set_gdbarch_overlay_update(gdbarch, simple_overlay_update) should be called at the end of the definition of riscv_gdbarch_init(...), as is already done in /gdb/m32r-tdep.c.
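The effect of the missing hook can be modelled outside GDB. The following is a minimal sketch, not GDB source: the struct layouts, overlay_load() and riscv_gdbarch_init_sketch() are hypothetical stand-ins, and simple_overlay_update here only marks a section as mapped instead of reading _ovly_table from target RAM. It assumes that an architecture without the hook produces the "does not know how to read its overlay state" error, as the session below suggests.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for GDB's real types.
struct obj_section { bool ovly_mapped = false; };
using overlay_update_ftype = void (*)(obj_section *);
struct gdbarch { overlay_update_ftype overlay_update = nullptr; };

// Stand-in for simple_overlay_update: the real function refreshes the
// overlay table from target memory; this one only flags the section.
void simple_overlay_update(obj_section *osect) {
  if (osect != nullptr)
    osect->ovly_mapped = true;
}

// Same shape as the setter named in the description.
void set_gdbarch_overlay_update(gdbarch *arch, overlay_update_ftype fn) {
  assert(arch != nullptr);
  arch->overlay_update = fn;
}

// Models "overlay load": without a hook there is nothing to call.
std::string overlay_load(gdbarch *arch, obj_section *osect) {
  if (arch->overlay_update == nullptr)
    return "This target does not know how to read its overlay state.";
  arch->overlay_update(osect);
  return "";
}

// Models the one-line addition at the end of riscv_gdbarch_init(...).
void riscv_gdbarch_init_sketch(gdbarch *arch) {
  set_gdbarch_overlay_update(arch, simple_overlay_update);
}
```

Before riscv_gdbarch_init_sketch() runs, overlay_load() returns the error text; afterwards it calls the hook and returns an empty string.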

Without this fix, GDB-client can't update the overlay table _ovly_table from target RAM, and overlay debugging doesn't work:

(gdb) overlay list
No sections are mapped.
(gdb) overlay load
This target does not know how to read its overlay state.

With this fix, GDB-client supports overlay debugging in auto-mode (GDB-client updates the overlay table _ovly_table from target RAM):

(gdb) set verbose on
(gdb) overlay auto
Automatic overlay debugging enabled.
...
(gdb) overlay list
Section .ovly1, loaded at 0x10080444 - 0x100805d0, mapped at 0x10000060 - 0x100001ec

Fixed problem with output of overlay GDB-commands

The GDB commands overlay auto, overlay manual, and overlay off print a message that lacks a trailing newline. In file /gdb/symfile.c, in the functions overlay_auto_command(...), overlay_manual_command(...), and overlay_off_command(...), a '\n' must be added at the end of the string passed to printf_filtered(_("...")). Otherwise the message from these commands is not separated from the output of the next GDB command. These messages are displayed only with set verbose on.

Without this fix (incorrect, unseparated):

(gdb) set verbose on
(gdb) overlay auto
<nothing>
(gdb) overlay list
Automatic overlay debugging enabled.No sections are mapped.

With this fix (added '\n', correct):

(gdb) set verbose on
(gdb) overlay auto
Automatic overlay debugging enabled.
(gdb) overlay list
No sections are mapped.
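The two sessions above can be reproduced with a small model. This is a sketch under an assumption: that the output path holds text back until a newline arrives, which is what the "<nothing>" line suggests. The class line_buffered_out is hypothetical and is not GDB's pager or printf_filtered.

```cpp
#include <cassert>
#include <string>

// Hypothetical line-buffered sink: text becomes visible only once a
// newline has been written.
class line_buffered_out {
  std::string pending_;  // written but not yet visible
  std::string shown_;    // what the user has seen so far
public:
  void write(const std::string &s) {
    pending_ += s;
    std::string::size_type pos = pending_.rfind('\n');
    if (pos != std::string::npos) {
      shown_ += pending_.substr(0, pos + 1);
      pending_.erase(0, pos + 1);
    }
  }
  const std::string &shown() const { return shown_; }
};
```

Writing "Automatic overlay debugging enabled." without '\n' shows nothing until the next command prints its own newline-terminated line, at which point both messages appear fused together. With the '\n' added, the first message is visible immediately.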

My overlay demo project for RISC-V: dmi391/overlay_demo
My custom build of GDB-client with these two fixes (it works correctly): release gdb-riscv-ovly


While working on a later patch, which changes gdb.base/foll-vfork.exp, I noticed that sometimes I would hit this assert:

x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.

I eventually tracked it down to a combination of schedule-multiple mode being on, target-non-stop being off, follow-fork-mode being set to child, and some bad timing. The failing case is pretty simple: a single threaded application performs a vfork, the child process then execs some other application while the parent process (once the vfork child has completed its exec) just exits.

As best I understand things, here's what happens when things go wrong:

1. The parent process performs a vfork, GDB sees the VFORKED event and creates an inferior and thread for the vfork child,

2. GDB resumes the vfork child process. As schedule-multiple is on and target-non-stop is off, this is translated into a request to start all processes (see user_visible_resume_ptid),

3. In the linux-nat layer we spot that one of the threads we are about to start is a vfork parent, and so don't start that thread (see resume_lwp), the vfork child thread is resumed,

4. GDB waits for the next event, eventually entering linux_nat_target::wait, which in turn calls linux_nat_wait_1,

5. In linux_nat_wait_1 we eventually call resume_stopped_resumed_lwps, this should restart threads that have stopped but don't actually have anything interesting to report.

6. Unfortunately, resume_stopped_resumed_lwps doesn't check for vfork parents like resume_lwp does, so at this point the vfork parent is resumed. This feels like the start of the bug, and this is where I'm proposing to fix things, but, resuming the vfork parent isn't the worst thing in the world because....

7. As the vfork child is still alive the kernel holds the vfork parent stopped,

8. Eventually the child performs its exec and GDB is sent an EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB,

9. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE,

10. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process,

11. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit),

12. While this has been going on the vfork parent is executing, and might even exit,

13. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object,

14. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status,

15. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped,

16. An exited thread is not classified as stopped, and so the assert triggers!

So there's two bugs I see here. The first, and most critical one here is in step #6. I think that resume_stopped_resumed_lwps should not resume a vfork parent, just like resume_lwp doesn't resume a vfork parent.

With this change in place the vfork parent will remain stopped, and instead GDB will only see the EXECD/EXITED/SIGNALLED event. The problems in #9 and #10 are therefore skipped and we arrive at #11, handling the EXECD event. As the parent is still stopped #12 doesn't apply, and in #13 when we try to stop the process we will see that it is already stopped, there's no risk of the vfork parent exiting before we get to this point. And finally, in #15 we are safe to poke the process registers because it will not have exited by this point.

However, I did mention two bugs.

The second bug I've not yet managed to actually trigger, but I'm convinced it must exist: if we forget vforks for a moment, in step #13 above, when linux_nat_target::detach is called, we first try to stop all threads in the process GDB is detaching from. If we imagine a multi-threaded inferior with many threads, and GDB running in non-stop mode, then, if the user tries to detach there is a chance that a thread could exit just as linux_nat_target::detach is entered, in which case we should be able to trigger the same assert.

But, like I said, I've not (yet) managed to trigger this second bug, and even if I could, the fix would not belong in this commit, so I'm pointing this out just for completeness.

There's no test included in this commit. In a couple of commits time I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. Unfortunately there are at least two other bugs (separate from the ones discussed above) that need fixing first, these will be fixed in the next commits before the gdb.base/foll-vfork.exp test is expanded.

If you do want to reproduce this failure then you will certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure within 10 minutes.
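The fix proposed for step #6 amounts to applying the same vfork-parent check in both resume paths. The sketch below illustrates that idea only. The struct and both functions are hypothetical simplifications, not the linux-nat code, and the field names are assumptions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of an lwp_info entry.
struct lwp_sketch {
  int pid;
  bool stopped;             // currently stopped
  bool resumed;             // GDB wants it running
  bool has_pending_status;  // has an event still to report
  bool is_vfork_parent;     // waiting for its vfork child to exec or exit
};

// One predicate used by every resume path, so the vfork-parent check
// cannot be present in one path and missing in another.
bool may_resume(const lwp_sketch &lwp) {
  return lwp.stopped && lwp.resumed && !lwp.has_pending_status
         && !lwp.is_vfork_parent;
}

// Restarts threads that stopped with nothing interesting to report.
// Returns how many were resumed.
int resume_stopped_resumed_lwps_sketch(std::vector<lwp_sketch> &lwps) {
  int count = 0;
  for (lwp_sketch &lwp : lwps) {
    if (may_resume(lwp)) {
      lwp.stopped = false;
      ++count;
    }
  }
  return count;
}
```

With the check in place the vfork parent stays stopped until the child's EXECD/EXITED/SIGNALED event has been handled, so the racing VFORK_DONE from steps #8 to #10 cannot be generated.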

I decided to try to build and test gdb on Windows.

I found a page on the wiki [1] suggesting three ways of building gdb:
- MinGW,
- MinGW on Cygwin, and
- Cygwin.

I picked Cygwin, because I've used it before (though not recently).

I managed to install Cygwin and sufficient packages to build gdb and start the testsuite.

However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp. [ AFAICT, similar problems reported here [2]. ]

I managed to reproduce this hang by running just the test-case.

I attempted to kill the hanging processes by:
- first killing the inferior process, using the cygwin "kill -9" command, and
- then killing the gdb process, likewise.

But the gdb process remained, and I had to point-and-click my way through task manager to actually kill the gdb process.

I investigated this by attaching to the hanging gdb process. Looking at the main thread, I saw it was stopped in a call to WaitForSingleObject, with the dwMilliseconds parameter set to INFINITE.

The backtrace in more detail:
...
(gdb) bt
#0 0x00007fff196fc044 in ntdll!ZwWaitForSingleObject () from /cygdrive/c/windows/SYSTEM32/ntdll.dll
#1 0x00007fff16bbcdcf in WaitForSingleObjectEx () from /cygdrive/c/windows/System32/KERNELBASE.dll
#2 0x0000000100998065 in wait_for_single (handle=0x1b8, howlong=4294967295) at gdb/windows-nat.c:435
#3 0x0000000100999aa7 in windows_nat_target::do_synchronously(gdb::function_view<bool ()>) (this=this@entry=0xa001c6fe0, func=...) at gdb/windows-nat.c:487
#4 0x000000010099a7fb in windows_nat_target::wait_for_debug_event_main_thread (event=<optimized out>, this=0xa001c6fe0) at gdb/../gdbsupport/function-view.h:296
#5 windows_nat_target::kill (this=0xa001c6fe0) at gdb/windows-nat.c:2917
#6 0x00000001008f2f86 in target_kill () at gdb/target.c:901
#7 0x000000010091fc46 in kill_or_detach (from_tty=0, inf=0xa000577d0) at gdb/top.c:1658
#8 quit_force (exit_arg=<optimized out>, from_tty=from_tty@entry=0) at gdb/top.c:1759
#9 0x00000001004f9ea8 in quit_command (args=args@entry=0x0, from_tty=from_tty@entry=0) at gdb/cli/cli-cmds.c:483
#10 0x000000010091c6d0 in quit_cover () at gdb/top.c:295
#11 0x00000001005e3d8a in async_disconnect (arg=<optimized out>) at gdb/event-top.c:1496
#12 0x0000000100499c45 in invoke_async_signal_handlers () at gdb/async-event.c:233
#13 0x0000000100eb23d6 in gdb_do_one_event (mstimeout=mstimeout@entry=-1) at gdbsupport/event-loop.cc:198
#14 0x00000001006df94a in interp::do_one_event (mstimeout=-1, this=<optimized out>) at gdb/interps.h:87
#15 start_event_loop () at gdb/main.c:402
#16 captured_command_loop () at gdb/main.c:466
#17 0x00000001006e2865 in captured_main (data=0x7ffffcba0) at gdb/main.c:1346
#18 gdb_main (args=args@entry=0x7ffffcc10) at gdb/main.c:1365
#19 0x0000000100f98c70 in main (argc=10, argv=0xa000129f0) at gdb/gdb.c:38
...

In the docs [3], I read that using an INFINITE argument to WaitForSingleObject might cause a system deadlock.

This prompted me to try this simple change in wait_for_single:
...
   while (true)
     {
-      DWORD r = WaitForSingleObject (handle, howlong);
+      DWORD r = WaitForSingleObject (handle,
+                                     howlong == INFINITE ? 100 : howlong);
+      if (howlong == INFINITE && r == WAIT_TIMEOUT)
+        continue;
...
with the timeout of 0.1 second estimated to be:
- small enough for gdb to feel reactive, and
- big enough not to consume too many cpu cycles with looping.

And indeed, the test-case, while still failing, now finishes in ~50 seconds.

While there may be an underlying bug that triggers this behaviour, the failure mode is so severe that I consider it a bug in itself.

Fix this by avoiding calling WaitForSingleObject with INFINITE argument.

Tested on x86_64-cygwin, by running the testsuite past the test-case.

Approved-By: Pedro Alves <pedro@palves.net>

PR tdep/32894
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32894

[1] https://sourceware.org/gdb/wiki/BuildingOnWindows
[2] https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
[3] https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
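The same pattern, replacing one unbounded wait with a loop of bounded waits, can be sketched in portable C++. This is an analogue built on std::condition_variable, not the Win32 API. event_sketch and wait_forever_by_polling are hypothetical names, and the 100 ms default mirrors the value chosen in the patch.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a waitable handle.
struct event_sketch {
  std::mutex m;
  std::condition_variable cv;
  bool signaled = false;

  void set() {
    {
      std::lock_guard<std::mutex> lock(m);
      signaled = true;
    }
    cv.notify_all();
  }
};

// Waits until the event is signaled, but never blocks for longer than
// one slice at a time. Returns the number of slices that timed out.
int wait_forever_by_polling(
    event_sketch &ev,
    std::chrono::milliseconds slice = std::chrono::milliseconds(100)) {
  assert(slice.count() > 0);
  int timeouts = 0;
  std::unique_lock<std::mutex> lock(ev.m);
  while (!ev.signaled) {
    // wait_for returns false when the slice expired without a signal.
    if (!ev.cv.wait_for(lock, slice, [&ev] { return ev.signaled; }))
      ++timeouts;
  }
  return timeouts;
}
```

The design trade-off is the one stated in the commit message: a shorter slice reacts faster, a longer one wakes up less often.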

For background, see this thread:

https://inbox.sourceware.org/gdb-patches/20250612144607.27507-1-tdevries@suse.de

Tom describes the issue clearly in the above thread, here's what he said:

Once in a while, when running test-case gdb.base/bp-cmds-continue-ctrl-c.exp, I run into:
...
Breakpoint 2, foo () at bp-cmds-continue-ctrl-c.c:23^M
23 usleep (100);^M
^CFAIL: $exp: run: stop with control-c (unexpected) (timeout)
FAIL: $exp: run: stop with control-c
...

This is PR python/32167, observed both on x86_64-linux and powerpc64le-linux.

This is not a timeout due to accidental slowness, gdb actually hangs.

The backtrace at the hang is (on cfarm120 running AlmaLinux 9.6):
...
(gdb) bt
#0 0x00007fffbca9dd94 in __lll_lock_wait () from /lib64/glibc-hwcaps/power10/libc.so.6
#1 0x00007fffbcaa6ddc in pthread_mutex_lock@@GLIBC_2.17 () from /lib64/glibc-hwcaps/power10/libc.so.6
#2 0x000000001067aee8 in __gthread_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749
#3 0x000000001067afc8 in __gthread_recursive_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811
#4 0x000000001067b0d4 in std::recursive_mutex::lock () at /usr/include/c++/11/mutex:108
#5 0x000000001067b380 in std::lock_guard<std::recursive_mutex>::lock_guard () at /usr/include/c++/11/bits/std_mutex.h:229
#6 0x0000000010679d3c in set_quit_flag () at gdb/extension.c:865
#7 0x000000001066b6dc in handle_sigint () at gdb/event-top.c:1264
#8 0x00000000109e3b3c in handler_wrapper () at gdb/posix-hdep.c:70
#9 <signal handler called>
#10 0x00007fffbcaa6d14 in pthread_mutex_lock@@GLIBC_2.17 () from /lib64/glibc-hwcaps/power10/libc.so.6
#11 0x000000001067aee8 in __gthread_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:749
#12 0x000000001067afc8 in __gthread_recursive_mutex_lock () at /usr/include/c++/11/ppc64le-redhat-linux/bits/gthr-default.h:811
#13 0x000000001067b0d4 in std::recursive_mutex::lock () at /usr/include/c++/11/mutex:108
#14 0x000000001067b380 in std::lock_guard<std::recursive_mutex>::lock_guard () at /usr/include/c++/11/bits/std_mutex.h:229
#15 0x00000000106799cc in set_active_ext_lang () at gdb/extension.c:775
#16 0x0000000010b287ac in gdbpy_enter::gdbpy_enter () at gdb/python/python.c:232
#17 0x0000000010a8e3f8 in bpfinishpy_handle_stop () at gdb/python/py-finishbreakpoint.c:414
...

What happens here is the following:
- the gdbpy_enter constructor attempts to set the current extension language to python using set_active_ext_lang
- set_active_ext_lang attempts to lock ext_lang_mutex
- while doing so, it is interrupted by sigint_wrapper (the SIGINT handler), handling a SIGINT
- sigint_wrapper calls handle_sigint, which calls set_quit_flag, which also tries to lock ext_lang_mutex
- since std::recursive_mutex::lock is not async-signal-safe, things go wrong, resulting in a hang.

The hang bisects to commit 8bb8f83 ("Fix gdb.interrupt race"), which introduced the lock, making PR python/32167 a regression since gdb 15.1.

Commit 8bb8f83 fixes PR dap/31263, a race reported by ThreadSanitizer:
...
WARNING: ThreadSanitizer: data race (pid=615372)
Read of size 1 at 0x00000328064c by thread T19:
#0 set_active_ext_lang(extension_language_defn const*) gdb/extension.c:755
#1 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling() gdb/extension.c:697
#2 gdbpy_interrupt gdb/python/python.c:1106
#3 cfunction_vectorcall_NOARGS <null>
Previous write of size 1 at 0x00000328064c by main thread:
#0 scoped_disable_cooperative_sigint_handling::scoped_disable_cooperative_sigint_handling() gdb/extension.c:704
#1 fetch_inferior_event() gdb/infrun.c:4591
...
Location is global 'cooperative_sigint_handling_disabled' of size 1 at 0x00000328064c
...
SUMMARY: ThreadSanitizer: data race gdb/extension.c:755 in \
set_active_ext_lang(extension_language_defn const*)
...

The problem here is that gdb.interrupt is called from a worker thread, and its implementation, gdbpy_interrupt, races with the main thread on some variable.

The fix presented here is based on the fix that Tom proposed, but fills in the missing Mingw support.

The problem is basically split into two: hosts that support unix-like signals, and Mingw, which doesn't support signals.

For signal supporting hosts, I've adopted the approach that Tom suggests, gdbpy_interrupt uses kill() to send SIGINT to the GDB process. This is then handled in the main thread as if the user had pressed Ctrl+C. For these hosts no locking is required, so the existing lock is removed. However, everywhere the lock currently exists I've added an assert:

gdb_assert (is_main_thread ());

If this assert ever triggers then we're setting or reading the quit flag on a worker thread, this will be a problem without the mutex.

For Mingw, the current mutex is retained. This is fine as there are no signals, so no chance of the mutex acquisition being interrupted by a signal, and so, deadlock shouldn't be an issue.

To manage the complexity of when we need an assert, and when we need the mutex, I've created 'struct ext_lang_guard', which can be used as a RAII object. This object either performs the assertion check, or acquires the mutex, depending on the host.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32167

Co-Authored-By: Tom de Vries <tdevries@suse.de>
Approved-By: Tom Tromey <tom@tromey.com>
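The guard described in the last paragraph could look roughly like the sketch below. This is a guess at the shape of the idea, not the patch's 'struct ext_lang_guard': every name ending in _sketch is hypothetical, the host test is reduced to a __MINGW32__ check, and is_main_thread_sketch() simply compares against the thread that ran static initialization.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Assumption: static initialization runs on the main thread.
static const std::thread::id main_thread_id = std::this_thread::get_id();

bool is_main_thread_sketch() {
  return std::this_thread::get_id() == main_thread_id;
}

#if defined(__MINGW32__)
constexpr bool host_has_signals = false;  // no signals: keep the mutex
#else
constexpr bool host_has_signals = true;   // signals: assert instead of locking
#endif

std::recursive_mutex ext_lang_mutex_sketch;
bool quit_flag_sketch = false;

// RAII guard: asserts "main thread only" on signal hosts, otherwise
// holds the mutex for the lifetime of the object.
struct ext_lang_guard_sketch {
  ext_lang_guard_sketch() {
    if (host_has_signals)
      assert(is_main_thread_sketch());
    else
      ext_lang_mutex_sketch.lock();
  }
  ~ext_lang_guard_sketch() {
    if (!host_has_signals)
      ext_lang_mutex_sketch.unlock();
  }
  ext_lang_guard_sketch(const ext_lang_guard_sketch &) = delete;
  ext_lang_guard_sketch &operator=(const ext_lang_guard_sketch &) = delete;
};

// Example user: every access to the flag goes through the guard.
void set_quit_flag_sketch() {
  ext_lang_guard_sketch guard;
  quit_flag_sketch = true;
}
```

On a signal-supporting host the guard takes no lock, so a SIGINT handler that calls set_quit_flag_sketch() cannot deadlock against an interrupted lock acquisition.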

While reviewing and testing another patch I set a breakpoint on a gnu ifunc function, then restarted the inferior, and this assert triggered:

../../src/gdb/breakpoint.c:14747: internal-error: breakpoint_free_objfile: Assertion `loc->symtab == nullptr' failed.

The backtrace at the time of the assert is:

#6 0x00000000005ffee0 in breakpoint_free_objfile (objfile=0x4064b30) at ../../src/gdb/breakpoint.c:14747
#7 0x0000000000c33ff2 in objfile::~objfile (this=0x4064b30, __in_chrg=<optimized out>) at ../../src/gdb/objfiles.c:478
#8 0x0000000000c38da6 in std::default_delete<objfile>::operator() (this=0x7ffc1a49d538, __ptr=0x4064b30) at /usr/include/c++/9/bits/unique_ptr.h:81
#9 0x0000000000c3782a in std::unique_ptr<objfile, std::default_delete<objfile> >::~unique_ptr (this=0x7ffc1a49d538, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:292
#10 0x0000000000caf1bd in owning_intrusive_list<objfile, intrusive_base_node<objfile> >::erase (this=0x3790d68, i=...) at ../../src/gdb/../gdbsupport/owning_intrusive_list.h:111
#11 0x0000000000cacd0c in program_space::remove_objfile (this=0x3790c80, objfile=0x4064b30) at ../../src/gdb/progspace.c:192
#12 0x0000000000c33e1c in objfile::unlink (this=0x4064b30) at ../../src/gdb/objfiles.c:408
#13 0x0000000000c34fb9 in objfile_purge_solibs (pspace=0x3790c80) at ../../src/gdb/objfiles.c:729
#14 0x0000000000edf6f7 in no_shared_libraries (pspace=0x3790c80) at ../../src/gdb/solib.c:1359
#15 0x0000000000fb3f6c in target_pre_inferior () at ../../src/gdb/target.c:2466
#16 0x0000000000a724d7 in run_command_1 (args=0x0, from_tty=0, run_how=RUN_NORMAL) at ../../src/gdb/infcmd.c:390
#17 0x0000000000a72a97 in run_command (args=0x0, from_tty=0) at ../../src/gdb/infcmd.c:514
#18 0x00000000006bbb3d in do_simple_func (args=0x0, from_tty=0, c=0x39124b0) at ../../src/gdb/cli/cli-decode.c:95
#19 0x00000000006c1021 in cmd_func (cmd=0x39124b0, args=0x0, from_tty=0) at ../../src/gdb/cli/cli-decode.c:2827

The function breakpoint_free_objfile is being called when an objfile representing a shared library is being unloaded ahead of the inferior being restarted, the function is trying to remove references to anything that could itself reference the objfile that is being deleted.

The assert is making the claim that, for a bp_location, which has a single address, the objfile of the symtab associated with the location will be the same as the objfile associated with the section of the location.

This seems reasonable to me now, as it did when I added the assert in commit:

commit 5066f36
Date: Mon Nov 11 21:45:17 2024 +0000

gdb: do better in breakpoint_free_objfile

The bp_location::section is maintained, according to the comments in breakpoint.h, to aid overlay debugging (is that even used any more?), and looking at the code, this does appear to be the case.

The problem in the above case arises when we are dealing with an ifunc function. What happens is that we end up with a section from one objfile, but a symtab from a different objfile.

This problem originates from minsym_found (in linespec.c). The user asked for 'break gnu_ifunc' where 'gnu_ifunc' is an ifunc function. What this means is that gnu_ifunc is actually a resolver function that returns the address of the actual function to use. In this particular test case, the resolver function is in a shared library, and the actual function to use is in the main executable.

So, when GDB looks for 'gnu_ifunc' it finds the minimal_symbol with that name, and spots that this has type mst_text_gnu_ifunc. GDB then uses this to figure out the actual address of the function that will be run.

GDB then creates the symtab_and_line using the _real_ address and the symtab in which that address lies, in our case this will all be related to the main executable objfile.

But, finally, in minsym_found, GDB fills in the symtab_and_line's section field, and this is done using the section containing the original minimal_symbol, which is from the shared library objfile.

The minimal symbol and section are then used to initialise the bp_location object, and this is how we end up in, what I think, is an unexpected state.

So what to do about this?

The symtab_and_line::msymbol field is _only_ set within minsym_found, and is then _only_ used to initialise the bp_location::msymbol field.

The bp_location::msymbol field is _only_ used in the function set_breakpoint_location_function, and we only really care about the msymbol type, we check to see if it's an ifunc symbol or not. This allows us to set the name of the function correctly.

The bp_location::section is used, as far as I can tell, extensively for overlay handling. It would seem to me, that this section should be the section containing the actual breakpoint address. If the question we're asking is, is this breakpoint mapped in or not? Then surely we need to ask about the section holding the breakpoint's address, and not the section holding some other code (e.g. the resolver function). In fact, in a memory constrained environment, you'd expect the resolver functions to get mapped out pretty early on, while the actual functions might still be mapped in.

Finally, symtab_and_line::section. This is mostly set using calls to find_pc_overlay. The minsym_found function is one of the few places where we do things differently. In the places where the section is used, it is (almost?) always used in conjunction with the symtab_and_line::pc to lookup information, e.g. calls to block_for_pc_sect, or find_pc_sect_containing_function. In all these cases, it appears to me that the assumption is that the section will be the section that contains the address.

So, where does this leave us?

I think what we need to do is update minsym_found to just use find_pc_overlay, which is how the symtab_and_line::section is set in most other cases. What this actually means in practice is that the section field will be set to NULL (see find_pc_overlay in symfile.c). But given that this is how the section is computed in most other cases, I don't see why it should be especially problematic for this case. In reality, I think this just means that the section is calculated via a call to find_pc_section when it's needed, as an example, see lookup_minimal_symbol_by_pc_section (minsyms.c).

I do wonder if we should be doing better when creating the symtab_and_line, and insist that the section be calculated correctly at that point, but I really don't want to open that can of worms right now, so I think just changing minsym_found to "do it just like everyone else" should be good enough.

I've extended the existing ifunc test to expose this issue, the updated test fails without this patch, and passes with.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
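The mismatch described above, where the section comes from the resolver's objfile and the address from the target function's objfile, can be illustrated with a toy lookup. The sketch models "find the section that contains this address" only. It is not find_pc_overlay or find_pc_section, and the type, function, object file names and addresses are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical section record: owning object file plus address range.
struct section_sketch {
  std::string objfile;
  std::uint64_t start;  // inclusive
  std::uint64_t end;    // exclusive
};

// Returns the section containing pc, or nullptr if none does.
const section_sketch *find_section_for_pc(
    const std::vector<section_sketch> &sections, std::uint64_t pc) {
  for (const section_sketch &s : sections)
    if (pc >= s.start && pc < s.end)
      return &s;
  return nullptr;
}
```

If the resolver lives in a shared library section and the resolved function in the main executable, looking the section up from the resolved address gives the executable's section, which is the one whose mapping state matters for the breakpoint.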

A bug was reported to Red Hat where GDB was crashing with an assertion failure, the assertion message is: ../../gdb/regcache.c:432: internal-error: get_thread_regcache: Assertion `thread->state != THREAD_EXITED' failed. The backtrace for the crash is: #5 0x000055a21da8a880 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=problem@entry=0x55a21e289060 <internal_error_problem>, file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:477 #6 0x000055a21da8aadf in internal_verror (file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:503 #7 0x000055a21dcbd055 in internal_error_loc (file=file@entry=0x55a21dd33b71 "../../gdb/regcache.c", line=line@entry=432, fmt=<optimized out>) at ../../gdbsupport/errors.cc:57 #8 0x000055a21d8baaa9 in get_thread_regcache (thread=thread@entry=0x55a258de3a50) at ../../gdb/regcache.c:432 #9 0x000055a21d74fa18 in print_signal_received_reason (uiout=0x55a258b649b0, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:9287 #10 0x000055a21d7daad9 in mi_interp::on_signal_received (this=0x55a258af5f60, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/mi/mi-interp.c:372 #11 0x000055a21d76ef99 in interps_notify<void (interp::*)(gdb_signal), gdb_signal&> (method=&virtual table offset 88, this adjustment 974682) at ../../gdb/interps.c:369 riscvarchive#12 0x000055a21d76e58f in interps_notify_signal_received (sig=<optimized out>, sig@entry=GDB_SIGNAL_TRAP) at ../../gdb/interps.c:378 riscvarchive#13 0x000055a21d75074d in notify_signal_received (sig=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:6818 riscvarchive#14 0x000055a21d755af0 in normal_stop () at ../../gdb/gdbthread.h:432 riscvarchive#15 0x000055a21d768331 in fetch_inferior_event () at ../../gdb/infrun.c:4753 The user is using a build of GDB with 32-bit ARM support included, and they gave the following description for what they were 
doing at the time of the crash: Suspended the execution of the firmware in Eclipse. The gdb was connected to JLinkGDBServer with activated FreeRTOS awareness JLink plugin. So they are remote debugging with a non-gdbserver target. Looking in normal_stop() we see this code: /* As we're presenting a stop, and potentially removing breakpoints, update the thread list so we can tell whether there are threads running on the target. With target remote, for example, we can only learn about new threads when we explicitly update the thread list. Do this before notifying the interpreters about signal stops, end of stepping ranges, etc., so that the "new thread" output is emitted before e.g., "Program received signal FOO", instead of after. */ update_thread_list (); if (last.kind () == TARGET_WAITKIND_STOPPED && stopped_by_random_signal) notify_signal_received (inferior_thread ()->stop_signal ()); Which accounts for the transition from frame riscvarchive#14 to frame riscvarchive#13. But it is the update_thread_list() call which interests me. This call asks the target (remote target in this case) for the current thread list, and then marks threads exited based on the answer. And so, if a (badly behaved) target (incorrectly) removes a thread from the thread list, then the update_thread_list() call will mark the impacted thread as exited, even if GDB is currently handling a signal stop event for that target. My guess for what's going on here then is this: 1. Thread receives a signal. 2. Remote target sends GDB a stop with signal packet. 3. Remote decides that the thread is going away soon, and marks the thread as exited. 4. GDB asks for the thread list. 5. Remote sends back the thread list, which doesn't include the event thread, as the remote things this thread has exited. 6. GDB marks the thread as exited, and then proceeds to try and print the signal stop event for the event thread. 7. Printing the signal stop requires reading registers, which requires a regache. 
We can only get a regcache for a non-exited thread, and so GDB raises an assertion. Using the gdbreplay test frame work I was able to reproduce this failure using gdbserver. I create an inferior with two threads, the main thread sends a signal to the second thread, GDB sees the signal arrive and prints this information for the user. Having captured the trace of this activity, I then find the thread list reply in the log file, and modify it to remove the second thread. Now, when I replay the modified log file I see the same assertion complaining about an attempt to get a regcache for an exited thread. I'm not entirely sure the best way to fix this. Clearly the problem here is a bad remote target. But, replies from a remote target should (in my opinion) not be considered trusted, as a consequence, we should not be asserting based on data coming from a remote. Instead, we should be giving warnings or errors and have GDB handle the bad data as best it can. This is the second attempt to fix this issue, my first patch can be seen here: https://inbox.sourceware.org/gdb-patches/062e438c8677e2ab28fac6183d2ea6d444cb9121.1747567717.git.aburgess@redhat.com In the first patch I was to checking in normal_stop, immediately after the call to update_thread_list, to see if the current thread was now marked as exited. However CI testing showed an issue with this approach; I was already checking for many different TARGET_WAITKIND_* kinds where the "is the current thread exited" question didn't make sense, and it turns out that the list of kinds in my first attempt was already insufficient. Rather than trying to just adding to the list, in this revised patch I'm proposing to move the "is this thread exited" check inside the block which handles signal stop events. 
Right now, the only part of normal_stop which I know relies on the current thread not being exited is the call to notify_signal_received, so before calling notify_signal_received I check to see if the current thread is now exited. If it is then I print a warning to indicate that the thread has unexpectedly exited and that the current command (continue/step/etc) has been cancelled, I then change the current event type to TARGET_WAITKIND_SPURIOUS.

GDB's output now looks like this in all-stop mode:

  (gdb) continue
  Continuing.
  [New Thread 3483690.3483693]
  [Thread 3483690.3483693 exited]
  warning: Thread 3483690.3483693 unexpectedly exited after non-exit event
  [Switching to Thread 3483690.3483693]
  (gdb)

The non-stop output is identical, except we don't switch thread (stop events never trigger a thread switch in non-stop mode).

The included test makes use of the gdbreplay framework, and tests in all-stop and non-stop modes. I would like to do more extensive testing of GDB's state after receiving the unexpected thread list, but due to using gdbreplay for testing, this is quite hard. Many commands, especially those looking at thread state, are likely to trigger additional packets being sent to the remote, which causes gdbreplay to bail out as the new packet doesn't match the original recorded state. However, I really don't think it is a good idea to change gdbserver in order to "fake" this error case, so for now, using gdbreplay is the best idea I have.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2366461
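
The check described in this commit message can be pictured with a small standalone model. This is only a sketch, not the GDB patch: thread_stub, stop_event, waitkind, notify_signal_received_stub and report_signal_stop are hypothetical names invented for illustration, and only the warning text is taken from the output shown above.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical, simplified stand-ins for GDB internals.
enum class waitkind { stopped, spurious };

struct thread_stub
{
  std::string id;
  bool exited;
};

struct stop_event
{
  waitkind kind;
  bool random_signal;
};

// Stand-in for notify_signal_received: printing a signal stop reads
// registers, which is only valid for a thread that has not exited.
void
notify_signal_received_stub (const thread_stub &thr)
{
  assert (!thr.exited);
  std::cout << "Thread " << thr.id << " received a signal\n";
}

// Returns true if a signal stop was reported.  If the event thread was
// marked exited by the thread list update, warn and downgrade the event
// to a spurious stop instead of reporting it.
bool
report_signal_stop (stop_event &ev, const thread_stub &thr)
{
  if (ev.kind != waitkind::stopped || !ev.random_signal)
    return false;

  if (thr.exited)
    {
      std::cerr << "warning: Thread " << thr.id
                << " unexpectedly exited after non-exit event\n";
      ev.kind = waitkind::spurious;
      return false;
    }

  notify_signal_received_stub (thr);
  return true;
}
```

The point of the model is the ordering: the "has this thread exited" test sits immediately in front of the one call that needs a live thread, rather than at the top of the stop handling where many event kinds would have to be special-cased.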

If an expression is evaluated with 'EVAL_AVOID_SIDE_EFFECTS', we're essentially interested in compatibility of the operands. If there is an operand of reference type, this would give us a memory value that would cause a failure if GDB attempts to access the contents.

GDB fails to evaluate binary expressions for the following example:

  struct
  {
    int &get () { return x; };

    int x = 1;
  } v_struct;

The GDB output is:

  (gdb) print v_struct3.get () == 1 && v_struct3.get () == 2
  Cannot access memory at address 0x0
  (gdb) print v_struct3.get () == 1 || v_struct3.get () == 2
  Cannot access memory at address 0x0

Likewise, GDB fails to resolve the type for some expressions:

  (gdb) ptype v_struct.get ()
  type = int &
  (gdb) ptype v_struct.get () == 1
  Cannot access memory at address 0x0
  (gdb) ptype v_struct.get () + 1
  Cannot access memory at address 0x0
  (gdb) ptype v_struct.get () && 1
  Cannot access memory at address 0x0
  (gdb) ptype v_struct.get () || 1
  Cannot access memory at address 0x0
  (gdb) ptype !v_struct.get ()
  Cannot access memory at address 0x0
  (gdb) ptype v_struct.get () ? 2 : 3
  Cannot access memory at address 0x0
  (gdb) ptype v_struct.get () | 1
  Cannot access memory at address 0x0

Expression evaluation uses helper functions such as 'value_equal', 'value_logical_not', etc. These helper functions do not take a 'noside' argument and if one of their value arguments was created from a function call that returns a reference type when noside == EVAL_AVOID_SIDE_EFFECTS, GDB attempts to read from an invalid memory location. Consider the following call stack of the 'ptype v_struct.get () + 1' command at the time of assertion when the memory error is raised:

  #0 memory_error (err=TARGET_XFER_E_IO, memaddr=0) at gdb/corefile.c:114
  #1 read_value_memory (val=.., bit_offset=0, stack=false, memaddr=0, buffer=.. "", length=4) at gdb/valops.c:1075
  #2 value::fetch_lazy_memory (this=..) at gdb/value.c:3996
  #3 value::fetch_lazy (this=..) at gdb/value.c:4135
  #4 value::contents_writeable (this=..) at gdb/value.c:1329
  #5 value::contents (this=..) at gdb/value.c:1319
  #6 value_as_mpz (val=..) at gdb/value.c:2685
  #7 scalar_binop (arg1=.., arg2=.., op=BINOP_ADD) at gdb/valarith.c:1240
  #8 value_binop (arg1=.., arg2=.., op=BINOP_ADD) at gdb/valarith.c:1489
  #9 eval_op_add (expect_type=0x0, exp=.., noside=EVAL_AVOID_SIDE_EFFECTS, arg1=.., arg2=..) at gdb/eval.c:1333
  #10 expr::add_operation::evaluate (this=.., expect_type=0x0, exp=.., noside=EVAL_AVOID_SIDE_EFFECTS) at gdb/expop.h:1209
  #11 expression::evaluate (this=.., expect_type=0x0, noside=EVAL_AVOID_SIDE_EFFECTS) at gdb/eval.c:110
  #12 expression::evaluate_type (this=..) at gdb/expression.h:242

'add_operation::evaluate' calls the helper 'eval_op_add' which attempts to read from the unresolved memory location. Convert to the target type to avoid such problems. The patch is implemented in 'expop.h' for the following reasons:

  * Support templated classes without explicit helpers, e.g., 'binop_operation' and 'comparison_operation'.
  * Stripping references in 'binop_promote' requires additional refactoring beyond this patch as we would need to carry on the 'noside' parameter.

The above failures are resolved with the patch:

  (gdb) print v_struct.get () == 1 && v_struct3.get () == 2
  $1 = false
  (gdb) print v_struct.get () == 1 || v_struct3.get () == 2
  $2 = true
  (gdb) ptype v_struct.get ()
  type = int &
  (gdb) ptype v_struct.get () == 1
  type = bool
  (gdb) ptype v_struct.get () + 1
  type = int
  (gdb) ptype v_struct.get () && 1
  type = bool
  (gdb) ptype v_struct.get () || 1
  type = bool
  (gdb) ptype !v_struct.get ()
  type = bool
  (gdb) ptype v_struct.get () ? 2 : 3
  type = int
  (gdb) ptype v_struct.get () | 1
  type = int

Co-Authored-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Approved-By: Tom Tromey <tom@tromey.com>
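
The idea of converting a reference operand to its target type when only the type matters can be modelled outside GDB. The following is a toy sketch under stated assumptions: toy_type, toy_value, strip_reference_for_type_only and type_of_add are hypothetical names, the enumerators merely mirror GDB's spelling, and how the real patch builds the replacement value is not shown in the commit message.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum noside_kind { EVAL_NORMAL, EVAL_AVOID_SIDE_EFFECTS };

struct toy_type
{
  std::string name;
  bool is_reference;
  std::string target_name;   // Only meaningful when is_reference is true.
};

struct toy_value
{
  toy_type type;
  bool lazy;       // Contents live in inferior memory and were not read.
  long address;
  long contents;

  // Reading a lazy value at address 0 fails, like the errors above.
  long fetch () const
  {
    if (lazy && address == 0)
      throw std::runtime_error ("Cannot access memory at address 0x0");
    return contents;
  }
};

// When only the result type is wanted, replace a reference-typed operand
// by a non-lazy zero value of the referenced type.
toy_value
strip_reference_for_type_only (const toy_value &v, noside_kind noside)
{
  if (noside == EVAL_AVOID_SIDE_EFFECTS && v.type.is_reference)
    return toy_value { toy_type { v.type.target_name, false, "" }, false, 0, 0 };
  return v;
}

// Stand-in for an operation whose helper reads operand contents.
std::string
type_of_add (const toy_value &a, const toy_value &b, noside_kind noside)
{
  toy_value lhs = strip_reference_for_type_only (a, noside);
  toy_value rhs = strip_reference_for_type_only (b, noside);
  assert (noside == EVAL_NORMAL || !lhs.type.is_reference);
  lhs.fetch ();   // The helper touches contents even for type queries.
  rhs.fetch ();
  return lhs.type.name;
}
```

Doing the conversion before the helper is called means the helper itself does not need a 'noside' parameter, which matches the reasoning given for placing the change in 'expop.h'.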

PR gdb/33512 reports an assertion failure in test-case gdb.ada/access_to_packed_array.exp on i386-linux: ... (gdb) maint print symbols gdb/frame.c:3400: internal-error: reinflate: \ Assertion `m_cached_level >= -1' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: $exp: \ maint print symbols (GDB internal error) ... I haven't been able to reproduce the failure by running the test-case on x86_64-linux with target board unix/-m32, but I'm able to reproduce on x86_64-linux by using the exec attached to the PR: ... $ cat gdb.in file foo maint expand-symtabs maint print symbols $ gdb -q -batch -ex "set trace-commands on" -x gdb.in ... c_to: array (gdb/frame.c:3395: internal-error: reinflate: \ Assertion `m_cached_level >= -1' failed. ... The problem happens when trying to print variable c_to: ... <4><f227>: Abbrev Number: 3 (DW_TAG_variable) <f228> DW_AT_name : c_to <f230> DW_AT_type : <0xf214> ... with type: ... <4><f214>: Abbrev Number: 7 (DW_TAG_array_type) <f215> DW_AT_type : <0x9f39> <5><f21d>: Abbrev Number: 12 (DW_TAG_subrange_type) <f21e> DW_AT_type : <0x9d6c> <f222> DW_AT_upper_bound : <0xf209> ... with upper bound: ... <4><f209>: Abbrev Number: 89 (DW_TAG_variable) <f20a> DW_AT_name : system__os_lib__copy_file__copy_to__TTc_toSP1___U <f20e> DW_AT_type : <0x9d6c> <f212> DW_AT_artificial : 1 <f212> DW_AT_location : 1 byte block: 57 (DW_OP_reg7 (edi)) ... The backtrace at the point of the assertion failure is: ... 
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ffff62a8e7f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007ffff6257842 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff623f5cf in __GI_abort () at abort.c:79
#4 0x00000000010e7ac6 in dump_core () at gdb/utils.c:223
#5 0x00000000010e81b8 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2ceb0c0 <internal_error_problem>, file=0x1ad5a90 "gdb/frame.c", line=3395, fmt=0x1ad5a08 "%s: Assertion `%s' failed.", ap=0x7fffffffc3c0) at gdb/utils.c:475
#6 0x00000000010e82ac in internal_verror (file=0x1ad5a90 "gdb/frame.c", line=3395, fmt=0x1ad5a08 "%s: Assertion `%s' failed.", ap=0x7fffffffc3c0) at gdb/utils.c:501
#7 0x00000000019be79f in internal_error_loc (file=0x1ad5a90 "gdb/frame.c", line=3395, fmt=0x1ad5a08 "%s: Assertion `%s' failed.") at gdbsupport/errors.cc:57
#8 0x00000000009b5c16 in frame_info_ptr::reinflate (this=0x7fffffffc878) at gdb/frame.c:3395
#9 0x00000000009b66f9 in frame_info_ptr::operator-> (this=0x7fffffffc878) at gdb/frame.h:290
#10 0x00000000009b4bd5 in get_frame_arch (this_frame=...) at gdb/frame.c:3075
#11 0x000000000081dd89 in dwarf_expr_context::fetch_result (this=0x7fffffffc810, type=0x410d600, subobj_type=0x410d600, subobj_offset=0, as_lval=true) at gdb/dwarf2/expr.c:1006
#12 0x000000000081e2ef in dwarf_expr_context::evaluate (this=0x7fffffffc810, addr=0x7ffff459ce6b "W\aF\003", len=1, as_lval=true, per_cu=0x7fffd00053f0, frame=..., addr_info=0x7fffffffcc30, type=0x0, subobj_type=0x0, subobj_offset=0) at gdb/dwarf2/expr.c:1136
#13 0x0000000000877c14 in dwarf2_locexpr_baton_eval (dlbaton=0x3e99c18, frame=..., addr_stack=0x7fffffffcc30, valp=0x7fffffffcab0, push_values=..., is_reference=0x7fffffffc9b0) at gdb/dwarf2/loc.c:1604
#14 0x0000000000877f71 in dwarf2_evaluate_property (prop=0x3e99ce0, initial_frame=..., addr_stack=0x7fffffffcc30, value=0x7fffffffcab0, push_values=...) at gdb/dwarf2/loc.c:1668
#15 0x00000000009def76 in resolve_dynamic_range (dyn_range_type=0x3e99c50, addr_stack=0x7fffffffcc30, frame=..., rank=0, resolve_p=true) at gdb/gdbtypes.c:2198
#16 0x00000000009e0ded in resolve_dynamic_type_internal (type=0x3e99c50, addr_stack=0x7fffffffcc30, frame=..., top_level=true) at gdb/gdbtypes.c:2934
#17 0x00000000009e1079 in resolve_dynamic_type (type=0x3e99c50, valaddr=..., addr=0, in_frame=0x0) at gdb/gdbtypes.c:2989
#18 0x0000000000488ebc in ada_discrete_type_low_bound (type=0x3e99c50) at gdb/ada-lang.c:710
#19 0x00000000004eb734 in print_range (type=0x3e99c50, stream=0x30157b0, bounds_preferred_p=0) at gdb/ada-typeprint.c:156
#20 0x00000000004ebffe in print_array_type (type=0x3e99d10, stream=0x30157b0, show=1, level=9, flags=0x1bdcf20 <type_print_raw_options>) at gdb/ada-typeprint.c:381
#21 0x00000000004eda3c in ada_print_type (type0=0x3e99d10, varstring=0x401f710 "c_to", stream=0x30157b0, show=1, level=9, flags=0x1bdcf20 <type_print_raw_options>) at gdb/ada-typeprint.c:1015
#22 0x00000000004b4627 in ada_language::print_type (this=0x2f949b0 <ada_language_defn>, type=0x3e99d10, varstring=0x401f710 "c_to", stream=0x30157b0, show=1, level=9, flags=0x1bdcf20 <type_print_raw_options>) at gdb/ada-lang.c:13681
#23 0x0000000000f74646 in print_symbol (gdbarch=0x3256270, symbol=0x3e99db0, depth=9, outfile=0x30157b0) at gdb/symmisc.c:545
#24 0x0000000000f737e6 in dump_symtab_1 (symtab=0x3ddd7e0, outfile=0x30157b0) at gdb/symmisc.c:313
#25 0x0000000000f73a69 in dump_symtab (symtab=0x3ddd7e0, outfile=0x30157b0) at gdb/symmisc.c:370
#26 0x0000000000f7420f in maintenance_print_symbols (args=0x0, from_tty=0) at gdb/symmisc.c:481
#27 0x00000000006c7fde in do_simple_func (args=0x0, from_tty=0, c=0x321e270) at gdb/cli/cli-decode.c:94
#28 0x00000000006ce65a in cmd_func (cmd=0x321e270, args=0x0, from_tty=0) at gdb/cli/cli-decode.c:2826
#29 0x0000000001005b78 in execute_command (p=0x3f48fe3 "", from_tty=0) at gdb/top.c:564
#30 0x0000000000966095 in command_handler (command=0x3f48fd0 "maint print symbols") at gdb/event-top.c:613
#31 0x0000000001005141 in read_command_file (stream=0x3011a40) at gdb/top.c:333
#32 0x00000000006e2a64 in script_from_file (stream=0x3011a40, file=0x7fffffffe21f "gdb.in") at gdb/cli/cli-script.c:1705
#33 0x00000000006bb88c in source_script_from_stream (stream=0x3011a40, file=0x7fffffffe21f "gdb.in", file_to_open=0x7fffffffd760 "gdb.in") at gdb/cli/cli-cmds.c:706
#34 0x00000000006bba12 in source_script_with_search (file=0x7fffffffe21f "gdb.in", from_tty=0, search_path=0) at gdb/cli/cli-cmds.c:751
#35 0x00000000006bbab2 in source_script (file=0x7fffffffe21f "gdb.in", from_tty=0) at gdb/cli/cli-cmds.c:760
#36 0x0000000000b835cb in catch_command_errors (command=0x6bba7e <source_script(char const*, int)>, arg=0x7fffffffe21f "gdb.in", from_tty=0, do_bp_actions=false) at gdb/main.c:510
#37 0x0000000000b83803 in execute_cmdargs (cmdarg_vec=0x7fffffffd980, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffd8c8) at gdb/main.c:606
#38 0x0000000000b84d79 in captured_main_1 (context=0x7fffffffdb90) at gdb/main.c:1349
#39 0x0000000000b84fe4 in captured_main (context=0x7fffffffdb90) at gdb/main.c:1372
#40 0x0000000000b85092 in gdb_main (args=0x7fffffffdb90) at gdb/main.c:1401
#41 0x000000000041a382 in main (argc=9, argv=0x7fffffffdcc8) at gdb/gdb.c:38
(gdb)
...

The immediate problem is in dwarf_expr_context::fetch_result where we're calling get_frame_arch:
...
  switch (this->m_location)
    {
    case DWARF_VALUE_REGISTER:
      {
        gdbarch *f_arch = get_frame_arch (this->m_frame);
...
with a null frame:
...
(gdb) p this->m_frame.is_null ()
$1 = true
(gdb)
...

Fix this using ensure_have_frame in dwarf_expr_context::execute_stack_op for DW_OP_reg<n> and DW_OP_regx, getting us instead:
...
c_to: array (<>) of character; computed at runtime
...

Tested on x86_64-linux.

Approved-By: Tom Tromey <tom@tromey.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33512
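
The shape of this fix, checking for a frame before a register operation is evaluated instead of dereferencing a null frame later, can be sketched in isolation. This is a toy model only: toy_frame and toy_expr_context are hypothetical, and the error text is illustrative rather than copied from GDB.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct toy_frame
{
  int regs[8];
};

// Hypothetical miniature of a DWARF expression context.
class toy_expr_context
{
public:
  explicit toy_expr_context (const toy_frame *frame) : m_frame (frame) {}

  // Evaluate a DW_OP_reg<n>-style location: the object lives in register N.
  int read_register_location (int n, const char *op_name) const
  {
    assert (n >= 0 && n < 8);
    ensure_have_frame (op_name);
    return m_frame->regs[n];
  }

private:
  // Raise a normal, catchable error when no frame is available.
  void ensure_have_frame (const char *op_name) const
  {
    if (m_frame == nullptr)
      throw std::runtime_error (std::string (op_name)
                                + " evaluation requires a frame");
  }

  const toy_frame *m_frame;
};
```

A caller that prints types without a running program can then catch the error and fall back to a generic description, which is consistent with the "computed at runtime" output shown above.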

On ppc64le-linux (AlmaLinux 9.6) with python 3.9 and test-case gdb.python/py-failed-init.exp I run into: ... builtin_spawn $gdb -nw -nx -q -iex set height 0 -iex set width 0 \ -data-directory $build/gdb/data-directory -iex set interactive-mode on^M Python path configuration:^M PYTHONHOME = 'foo'^M PYTHONPATH = (not set)^M program name = '/usr/bin/python'^M isolated = 0^M environment = 1^M user site = 1^M import site = 1^M sys._base_executable = '/usr/bin/python'^M sys.base_prefix = 'foo'^M sys.base_exec_prefix = 'foo'^M sys.platlibdir = 'lib64'^M sys.executable = '/usr/bin/python'^M sys.prefix = 'foo'^M sys.exec_prefix = 'foo'^M sys.path = [^M 'foo/lib64/python39.zip',^M 'foo/lib64/python3.9',^M 'foo/lib64/python3.9/lib-dynload',^M ]^M Fatal Python error: init_fs_encoding: failed to get the Python codec of the \ filesystem encoding^M Python runtime state: core initialized^M ModuleNotFoundError: No module named 'encodings'^M ^M Current thread 0x00007fffabe18480 (most recent call first):^M <no Python frame>^M ERROR: (eof) GDB never initialized. Couldn't send python print (1) to GDB. UNRESOLVED: gdb.python/py-failed-init.exp: gdb-command<python print (1)> Couldn't send quit to GDB. UNRESOLVED: gdb.python/py-failed-init.exp: quit ... The test-case expects gdb to present a prompt, but instead gdb calls exit with this back trace: ... 
(gdb) bt
#0 0x00007ffff6e4bfbc in exit () from /lib64/glibc-hwcaps/power10/libc.so.6
#1 0x00007ffff7873fc4 in fatal_error.lto_priv () from /lib64/libpython3.9.so.1.0
#2 0x00007ffff78aae60 in Py_ExitStatusException () from /lib64/libpython3.9.so.1.0
#3 0x00007ffff78c0e58 in Py_InitializeEx () from /lib64/libpython3.9.so.1.0
#4 0x0000000010b6cab4 in py_initialize_catch_abort () at gdb/python/python.c:2456
#5 0x0000000010b6cfac in py_initialize () at gdb/python/python.c:2540
#6 0x0000000010b6d104 in do_start_initialization () at gdb/python/python.c:2595
#7 0x0000000010b6eaac in gdbpy_initialize (extlang=0x11b7baf0 <extension_language_python>) at gdb/python/python.c:2968
#8 0x000000001069d508 in ext_lang_initialization () at gdb/extension.c:319
#9 0x00000000108f9280 in captured_main_1 (context=0x7fffffffe870) at gdb/main.c:1100
#10 0x00000000108fa3cc in captured_main (context=0x7fffffffe870) at gdb/main.c:1372
#11 0x00000000108fa4d8 in gdb_main (args=0x7fffffffe870) at gdb/main.c:1401
#12 0x000000001001d1d8 in main (argc=3, argv=0x7fffffffece8) at gdb/gdb.c:38
...

This may be a python issue [1].

The problem doesn't happen if we use the PyConfig approach instead of the py_initialize_catch_abort approach.

Fix this by using the PyConfig approach starting 3.9 (previously, starting 3.10 to avoid Py_SetProgramName deprecation in 3.11).

It's possible that we have the same problem and need the same fix for 3.8, but I don't have a setup to check that. Add a todo in a comment.

Tested on ppc64le-linux.

Approved-By: Tom Tromey <tom@tromey.com>

[1] python/cpython#107827

New in v2: make the test try with indexes by default This patch fixes a crash caused by GDB trying to read from a section not read in. The bug happens in those specific circumstances: - reading a type unit from .dwo - that type unit has a stub in the main file - there is a GDB index (.gdb_index) present This crash is the cause of the following test failure, with the cc-with-gdb-index target board: $ make check TESTS="gdb.dwarf2/fission-reread.exp" RUNTESTFLAGS="--target_board=cc-with-gdb-index" Running /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.dwarf2/fission-reread.exp ... ERROR: GDB process no longer exists Or, manually: $ ./gdb -nx -q --data-directory=data-directory /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.dwarf2/fission-reread/fission-reread -ex "p 1" Reading symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.dwarf2/fission-reread/fission-reread... Fatal signal: Segmentation fault For this last one, you need to interrupt the test (e.g. add a return) before the test deletes the .dwo file. 
The backtrace at the moment of the crash is:

  #0 0x0000555566968f7f in bfd_getl32 (p=0x0) at /home/simark/src/binutils-gdb/bfd/libbfd.c:846
  #1 0x00005555642e561d in read_initial_length (abfd=0x7d1ff1eb0e40, buf=0x0, bytes_read=0x7bfff0962fa0, handle_nonstd=true) at /home/simark/src/binutils-gdb/gdb/dwarf2/leb.c:92
  #2 0x00005555647ca9ea in read_unit_head (header=0x7d0ff1e068b0, info_ptr=0x0, section=0x7c3ff1dea7d0, section_kind=ruh_kind::COMPILE) at /home/simark/src/binutils-gdb/gdb/dwarf2/unit-head.c:44
  #3 0x000055556452e37e in dwarf2_per_cu::get_header (this=0x7d0ff1e06880) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:18531
  #4 0x000055556452e574 in dwarf2_per_cu::addr_size (this=0x7d0ff1e06880) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:18544
  #5 0x000055556406af91 in dwarf2_cu::addr_type (this=0x7d7ff1e20880) at /home/simark/src/binutils-gdb/gdb/dwarf2/cu.c:124
  #6 0x0000555564534e48 in set_die_type (die=0x7e0ff1f23dd0, type=0x7e0ff1f027f0, cu=0x7d7ff1e20880, skip_data_location=false) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:19020
  #7 0x00005555644dcc7b in read_structure_type (die=0x7e0ff1f23dd0, cu=0x7d7ff1e20880) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:11239
  #8 0x000055556451c834 in read_type_die_1 (die=0x7e0ff1f23dd0, cu=0x7d7ff1e20880) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:16878
  #9 0x000055556451c5e0 in read_type_die (die=0x7e0ff1f23dd0, cu=0x7d7ff1e20880) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:16861
  #10 0x0000555564526f3a in get_signatured_type (die=0x7e0ff1f0ffb0, signature=10386129560629316377, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:17998
  #11 0x000055556451c23b in lookup_die_type (die=0x7e0ff1f0ffb0, attr=0x7e0ff1f10008, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:16808
  #12 0x000055556451b2e9 in die_type (die=0x7e0ff1f0ffb0, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:16684
  #13 0x000055556451457f in new_symbol (die=0x7e0ff1f0ffb0, type=0x0, cu=0x7d7ff1e0f480, space=0x0) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:16089
  #14 0x00005555644c52a4 in read_variable (die=0x7e0ff1f0ffb0, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:9119
  #15 0x0000555564494072 in process_die (die=0x7e0ff1f0ffb0, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:5197
  #16 0x000055556449c88e in read_file_scope (die=0x7e0ff1f0fdd0, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:6125
  #17 0x0000555564493671 in process_die (die=0x7e0ff1f0fdd0, cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:5098
  #18 0x00005555644912f5 in process_full_comp_unit (cu=0x7d7ff1e0f480) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:4851
  #19 0x0000555564485e18 in process_queue (per_objfile=0x7d6ff1e71100) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:4161
  #20 0x000055556446391d in dw2_do_instantiate_symtab (per_cu=0x7ceff1de42d0, per_objfile=0x7d6ff1e71100, skip_partial=true) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:1650
  #21 0x0000555564463b3c in dw2_instantiate_symtab (per_cu=0x7ceff1de42d0, per_objfile=0x7d6ff1e71100, skip_partial=true) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:1671
  #22 0x00005555644687fd in dwarf2_base_index_functions::expand_all_symtabs (this=0x7c1ff1e04990, objfile=0x7d5ff1e46080) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:1990
  #23 0x0000555564381050 in cooked_index_functions::expand_all_symtabs (this=0x7c1ff1e04990, objfile=0x7d5ff1e46080) at /home/simark/src/binutils-gdb/gdb/dwarf2/cooked-index.h:237
  #24 0x0000555565df5b0d in objfile::expand_all_symtabs (this=0x7d5ff1e46080) at /home/simark/src/binutils-gdb/gdb/symfile-debug.c:372
  #25 0x0000555565eafc4a in maintenance_expand_symtabs (args=0x0, from_tty=1) at
/home/simark/src/binutils-gdb/gdb/symmisc.c:914 The main file contains a stub (skeleton) for a compilation unit and a stub for a type unit. The .dwo file contains a compilation unit and a type unit matching those stubs. When doing the initial scan of the main file, the DWARF reader parses the CU/TU list from the GDB index (.gdb_index), and thus creates a signatured_type object based on that. The section field of this signatured_type points to the .debug_types section in the main file, the one containing the stub. And because GDB trusts the GDB index, it never needs to look at that .debug_types section in the main file. That section remains not read in. When expanding the compilation unit, GDB encounters a type unit reference (by signature) corresponding to the type in the type unit. We get in lookup_dwo_signatured_type, trying to see if there is a type unit matching that signature in the current .dwo file. We proceed to read and expand that type unit, until we eventually get to a dwarf2_cu::addr_type() call, which does: int addr_size = this->per_cu->addr_size (); dwarf2_per_cu::addr_size() tries to read the header from the section pointed to by dwarf2_per_cu::section which, if you recall, is the .debug_types section in the main file that was never read in. That causes the segfault. All this was working fine before these patches of mine, that tried to do some cleanups: a47e229 ("gdb/dwarf: pass section offset to dwarf2_per_cu_data constructor") c44ab62 ("gdb/dwarf: pass section to dwarf2_per_cu_data constructor") 39ee8c9 ("gdb/dwarf: pass unit length to dwarf2_per_cu_data constructor") Before these patches, the fill_in_sig_entry_from_dwo_entry function (called from lookup_dwo_signatured_type, among others) would overwrite some dwarf2_per_cu fields (including the section) to point to the .dwo, rather than represent what's in the main file. Therefore, the header would have been read from the unit in the .dwo file, and things would have been fine. 
When doing these changes, I mistakenly assumed that the section written by fill_in_sig_entry_from_dwo_entry was the same as the section already there, which is why I removed the statements overwriting the section field (and the two others). In my defense, here's the comment on dwarf2_per_cu::section:

  /* The section this CU/TU lives in.
     If the DIE refers to a DWO file, this is always the original die,
     not the DWO file.  */
  struct dwarf2_section_info *section = nullptr;

I would prefer to not reintroduce the behavior of overwriting the section info in dwarf2_per_cu, because:

1. I find it confusing, I like the invariant that dwarf2_per_cu::section points to the stub, and dwarf2_cu::section points to where we actually read the debug info from.
2. The dwarf2_per_bfd::all_units vector is nowadays sorted by (section, section offset). If we change the section and section offset of a dwarf2_per_cu, then we can no longer do binary searches in it, we would have to re-sort the vector (not a big deal, but still adds to the confusion).

One possible fix would be to make sure that the section is read in when reading the header, probably in dwarf2_per_cu::get_header. An approach like that was proposed by Andrew initially, here:

  https://inbox.sourceware.org/gdb-patches/60ba2b019930fd6164f8e6ab6cb2e396c32c6ac2.1759486109.git.aburgess@redhat.com/

It would work, but there is a more straightforward fix for this particular problem, I believe. In dwarf2_cu, we have access to the header read from the unit we're actually reading the DWARF from. In the DWO case, that is the header read from the .dwo file. We can get the address size from there instead of going through the dwarf2_per_cu object. This is what this patch does.

However, there are other cases where we get the address (or offset) size from the dwarf2_per_cu in the DWARF expression evaluator (expr.c, loc.c), that could cause a similar crash. The next patch handles these cases.
Modify the gdb.dwarf2/fission-reread.exp test so that it tries running with an index even with the standard board (that part was originally written by Andrew).

Finally, just to put things in context, having a stub in the main file for a type unit is obsolete. It happened in the gcc 4.x days, until this commit:

  commit 4dd7c3b285daf030da0ff9c0d5e2f79b24943d1e
  Author: Cary Coutant <ccoutant@google.com>
  Date:   Fri Aug 8 20:33:26 2014 +0000

      Remove skeleton type units that were being produced with -gsplit-dwarf.

In DWARF 5, split type units don't have stubs, only split compilation units do.

Change-Id: Icc5014276c75bf3126ccb43a4424e96ca1a51f06
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33307
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
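
The difference between the two ways of obtaining the address size can be shown with a deliberately small model. This is a sketch, not GDB code: all toy_* types are hypothetical, the header is reduced to a single field, and the toy throws an exception where the real code dereferenced a null buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

struct toy_section
{
  const std::uint8_t *buffer;   // nullptr means "not read in".
};

struct toy_per_cu
{
  const toy_section *section;   // Section holding the stub in the main file.

  // Old path: parse the size out of the stub's section.
  std::uint8_t addr_size_from_stub () const
  {
    if (section->buffer == nullptr)
      throw std::runtime_error ("stub section was never read in");
    return section->buffer[0];
  }
};

struct toy_unit_head
{
  std::uint8_t addr_size;
};

struct toy_cu
{
  toy_per_cu *per_cu;
  toy_unit_head header;   // Parsed from the unit actually being read.

  // New path: use the header already at hand, never touching the stub.
  std::uint8_t addr_size () const
  {
    assert (header.addr_size != 0);
    return header.addr_size;
  }
};
```

In the DWO case the header held by the in-progress unit comes from the .dwo file, so the stub's section in the main file can stay unread.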

New in v2: - make the test try with indexes by default - using uint8_t instead of unsigned char In some specific circumstances, it is possible for GDB to read a type unit from a .dwo file without ever reading in the section of the stub in the main file. In that case, calling any of these methods: - dwarf2_per_cu::addr_size() - dwarf2_per_cu::offset_size() - dwarf2_per_cu::ref_addr_size() will cause a crash, because they will try to read the unit header from the not-read-in section buffer. See the previous patch for more details. The remaining calls to these methods are in the loc.c and expr.c files. That is, in the location and expression machinery. It is possible to set things up to cause them to trigger a crash, as shown by the new test, when running it with the cc-with-gdb-index board: $ make check TESTS="gdb.dwarf2/fission-type-unit-locexpr.exp" RUNTESTFLAGS="--target_board=cc-with-gdb-index" Running /home/simark/src/binutils-gdb/gdb/testsuite/gdb.dwarf2/fission-type-unit-locexpr.exp ... 
  ERROR: GDB process no longer exists

The backtrace at the moment of the crash is:

  #0 0x0000555566968b1f in bfd_getl32 (p=0x78) at /home/simark/src/binutils-gdb/bfd/libbfd.c:846
  #1 0x00005555642e51b7 in read_initial_length (abfd=0x7d1ff1eb0e40, buf=0x78 <error: Cannot access memory at address 0x78>, bytes_read=0x7bfff09daca0, handle_nonstd=true) at /home/simark/src/binutils-gdb/gdb/dwarf2/leb.c:92
  #2 0x00005555647ca584 in read_unit_head (header=0x7d0ff1e06c70, info_ptr=0x78 <error: Cannot access memory at address 0x78>, section=0x7c3ff1dea7d0, section_kind=ruh_kind::COMPILE) at /home/simark/src/binutils-gdb/gdb/dwarf2/unit-head.c:44
  #3 0x000055556452df18 in dwarf2_per_cu::get_header (this=0x7d0ff1e06c40) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:18531
  #4 0x000055556452e10e in dwarf2_per_cu::addr_size (this=0x7d0ff1e06c40) at /home/simark/src/binutils-gdb/gdb/dwarf2/read.c:18544
  #5 0x0000555564314ac3 in dwarf2_locexpr_baton_eval (dlbaton=0x7bfff0c9a508, frame=..., addr_stack=0x7bfff0b59150, valp=0x7bfff0c9a430, push_values=..., is_reference=0x7bfff0d33030) at /home/simark/src/binutils-gdb/gdb/dwarf2/loc.c:1593
  #6 0x0000555564315bd2 in dwarf2_evaluate_property (prop=0x7bfff0c9a450, initial_frame=..., addr_stack=0x7bfff0b59150, value=0x7bfff0c9a430, push_values=...) at /home/simark/src/binutils-gdb/gdb/dwarf2/loc.c:1668
  #7 0x0000555564a14ee1 in resolve_dynamic_field (field=..., addr_stack=0x7bfff0b59150, frame=...) at /home/simark/src/binutils-gdb/gdb/gdbtypes.c:2758
  #8 0x0000555564a15e24 in resolve_dynamic_struct (type=0x7e0ff1f02550, addr_stack=0x7bfff0b59150, frame=...) at /home/simark/src/binutils-gdb/gdb/gdbtypes.c:2839
  #9 0x0000555564a17061 in resolve_dynamic_type_internal (type=0x7e0ff1f02550, addr_stack=0x7bfff0b59150, frame=..., top_level=true) at /home/simark/src/binutils-gdb/gdb/gdbtypes.c:2972
  #10 0x0000555564a17899 in resolve_dynamic_type (type=0x7e0ff1f02550, valaddr=..., addr=0x4010, in_frame=0x7bfff0d32e60) at /home/simark/src/binutils-gdb/gdb/gdbtypes.c:3019
  #11 0x000055556675fb34 in value_from_contents_and_address (type=0x7e0ff1f02550, valaddr=0x0, address=0x4010, frame=...) at /home/simark/src/binutils-gdb/gdb/value.c:3674
  #12 0x00005555666ce911 in get_value_at (type=0x7e0ff1f02550, addr=0x4010, frame=..., lazy=1) at /home/simark/src/binutils-gdb/gdb/valops.c:992
  #13 0x00005555666ceb89 in value_at_lazy (type=0x7e0ff1f02550, addr=0x4010, frame=...) at /home/simark/src/binutils-gdb/gdb/valops.c:1039
  #14 0x000055556491909f in language_defn::read_var_value (this=0x5555725fce40 <minimal_language_defn>, var=0x7e0ff1f02500, var_block=0x7e0ff1f025d0, frame_param=...) at /home/simark/src/binutils-gdb/gdb/findvar.c:504
  #15 0x000055556491961b in read_var_value (var=0x7e0ff1f02500, var_block=0x7e0ff1f025d0, frame=...) at /home/simark/src/binutils-gdb/gdb/findvar.c:518
  #16 0x00005555666d1861 in value_of_variable (var=0x7e0ff1f02500, b=0x7e0ff1f025d0) at /home/simark/src/binutils-gdb/gdb/valops.c:1384
  #17 0x00005555647f7099 in evaluate_var_value (noside=EVAL_NORMAL, blk=0x7e0ff1f025d0, var=0x7e0ff1f02500) at /home/simark/src/binutils-gdb/gdb/eval.c:533
  #18 0x00005555647f740c in expr::var_value_operation::evaluate (this=0x7c2ff1e3b690, expect_type=0x0, exp=0x7c2ff1e3aa00, noside=EVAL_NORMAL) at /home/simark/src/binutils-gdb/gdb/eval.c:559
  #19 0x00005555647f3347 in expression::evaluate (this=0x7c2ff1e3aa00, expect_type=0x0, noside=EVAL_NORMAL) at /home/simark/src/binutils-gdb/gdb/eval.c:109
  #20 0x000055556543ac2f in process_print_command_args (args=0x7fffffffe728 "global_var", print_opts=0x7bfff0be4a30, voidprint=true) at /home/simark/src/binutils-gdb/gdb/printcmd.c:1328
  #21 0x000055556543ae65 in print_command_1 (args=0x7fffffffe728 "global_var", voidprint=1) at /home/simark/src/binutils-gdb/gdb/printcmd.c:1341
  #22 0x000055556543b707 in print_command (exp=0x7fffffffe728 "global_var", from_tty=1) at /home/simark/src/binutils-gdb/gdb/printcmd.c:1408

The problem to solve is: in order to evaluate a location expression, we need to know some information (the various sizes) found in the unit header. In that context, it's not possible to get it from dwarf2_cu::header, like the previous patch did: at the time the expression is evaluated, the corresponding dwarf2_cu might have been freed. We don't want to re-build a dwarf2_cu just for that, it would be very inefficient. We could force reading in the dwarf2_per_cu::section section (in the main file), but we never needed to read that section before, so it would be better to avoid reading it unnecessarily.
My initial attempt was to store this information in baton objects (dwarf2_locexpr_baton & co), so that it can be retrieved when the time comes to evaluate the expressions. However, it quickly became obvious that storing it there would be redundant and wasteful.

I instead opted to store this information directly inside dwarf2_per_cu, making it easily available when evaluating expressions. These fields initially have the value 0, and are set by cutu_reader whenever the unit is parsed. The various getters (dwarf2_per_cu::addr_size et al.) now just return these fields.

Doing so allows removing anything related to reading the header from dwarf2_per_cu, which I think is a nice simplification. This means that nothing ever needs to read the header from just a dwarf2_per_cu. It also happens to shrink the dwarf2_per_cu object size a bit, going from:

  (top-gdb) p sizeof(dwarf2_per_cu)
  $1 = 176

to

  (top-gdb) p sizeof(dwarf2_per_cu)
  $1 = 120

I placed the new fields at this strange location in dwarf2_per_cu because there happened to be a nice 3-byte hole there (on Linux amd64 at least).

The new test sets things up as described previously. Note that the crash only occurs if using the cc-with-gdb-index board.

Change-Id: I50807a1bbb605f0f92606a9e61c026e3376a4fcf
Approved-By: Andrew Burgess <aburgess@redhat.com>
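
The "cache the sizes in the per-unit object, in an existing padding hole" idea can be illustrated with two toy structs. This is a sketch under assumptions: the field layout is invented and far smaller than the real dwarf2_per_cu, toy_reader_set_sizes is a hypothetical stand-in for what the unit reader does, and the claim that both structs have the same size holds on common ABIs but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstdint>

struct toy_per_cu_before
{
  std::uint32_t index;
  bool is_debug_types;
  // Padding usually follows here so that the pointer is aligned.
  void *section;
};

struct toy_per_cu_after
{
  std::uint32_t index;
  bool is_debug_types;
  // The three cached sizes occupy what would otherwise be padding.
  // They are 0 until a reader has parsed the unit.
  std::uint8_t m_addr_size;
  std::uint8_t m_offset_size;
  std::uint8_t m_ref_addr_size;
  void *section;

  // Getters just return the cached fields; no header is read here.
  std::uint8_t addr_size () const { return m_addr_size; }
  std::uint8_t offset_size () const { return m_offset_size; }
  std::uint8_t ref_addr_size () const { return m_ref_addr_size; }
};

// Stand-in for the unit reader: record the sizes from the header it
// has just parsed, wherever that header came from.
void
toy_reader_set_sizes (toy_per_cu_after &unit, std::uint8_t addr_size,
                      std::uint8_t offset_size, std::uint8_t ref_addr_size)
{
  assert (addr_size != 0 && offset_size != 0 && ref_addr_size != 0);
  unit.m_addr_size = addr_size;
  unit.m_offset_size = offset_size;
  unit.m_ref_addr_size = ref_addr_size;
}
```

Because the sizes outlive the object used while parsing, an expression evaluated much later can still ask for them without re-reading any section.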

If breakpoint commands contain detach or kill, then gdb tries to access freed memory:

    (gdb) b main
    Breakpoint 1 at 0x111d: file main.c, line 21.
    (gdb) commands
    Type commands for breakpoint(s) 1, one per line.
    End with a line saying just "end".
    >detach
    >end
    (gdb) run
    Starting program: /home/src/lappy/binutils-gdb.git/gdb/testsuite/gdb.base/main
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
    main () at main.c:21
    21 return 0;
    [Inferior 1 (process 241852) detached]
    =================================================================
    ==241817==ERROR: AddressSanitizer: heap-use-after-free on address 0x7b7a3de0b760 at pc 0x55fcb92613fe bp 0x7ffec2d524f0 sp 0x7ffec2d524e0
    READ of size 8 at 0x7b7a3de0b760 thread T0
        #0 0x55fcb92613fd in bpstat_do_actions_1 ../../gdb/breakpoint.c:4898
        #1 0x55fcb92617da in bpstat_do_actions() ../../gdb/breakpoint.c:5012
        #2 0x55fcba3180e7 in inferior_event_handler(inferior_event_type) ../../gdb/inf-loop.c:71
        #3 0x55fcba3ba1e1 in fetch_inferior_event() ../../gdb/infrun.c:4769

    0x7b7a3de0b760 is located 0 bytes inside of 56-byte region [0x7b7a3de0b760,0x7b7a3de0b798)
    freed by thread T0 here:
        #0 0x7f1a43522a2d in operator delete(void*, unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:155
        #1 0x55fcb925d5cd in bpstat_clear(bpstat**) ../../gdb/breakpoint.c:4646
        #2 0x55fcbb69ea6a in clear_thread_inferior_resources ../../gdb/thread.c:185
        #3 0x55fcbb69f4cb in set_thread_exited(thread_info*, std::optional<unsigned long>, bool) ../../gdb/thread.c:244
        #4 0x55fcba368d64 in operator() ../../gdb/inferior.c:269
        #5 0x55fcba375e2b in clear_and_dispose<inferior::clear_thread_list()::<lambda(thread_info*)> > ../../gdb/../gdbsupport/intrusive_list.h:529
        #6 0x55fcba368f19 in inferior::clear_thread_list() ../../gdb/inferior.c:265
        #7 0x55fcba3694ba in exit_inferior(inferior*) ../../gdb/inferior.c:322
        #8 0x55fcba369e35 in detach_inferior(inferior*) ../../gdb/inferior.c:358
        #9 0x55fcba319d9f in inf_ptrace_target::detach_success(inferior*) ../../gdb/inf-ptrace.c:214
        #10 0x55fcba56a2f6 in linux_nat_target::detach(inferior*, int) ../../gdb/linux-nat.c:1582
        #11 0x55fcba62121c in thread_db_target::detach(inferior*, int) ../../gdb/linux-thread-db.c:1381
        #12 0x55fcbb5ca49e in target_detach(inferior*, int) ../../gdb/target.c:2557
        #13 0x55fcba356ba4 in detach_command(char const*, int) ../../gdb/infcmd.c:2894
        #14 0x55fcb9597eea in do_simple_func ../../gdb/cli/cli-decode.c:94
        #15 0x55fcb95b10b5 in cmd_func(cmd_list_element*, char const*, int) ../../gdb/cli/cli-decode.c:2831
        #16 0x55fcbb6f5282 in execute_command(char const*, int) ../../gdb/top.c:563
        #17 0x55fcb95eedb9 in execute_control_command_1 ../../gdb/cli/cli-script.c:526
        #18 0x55fcb95f04dd in execute_control_command(command_line*, int) ../../gdb/cli/cli-script.c:702
        #19 0x55fcb9261175 in bpstat_do_actions_1 ../../gdb/breakpoint.c:4940
        #20 0x55fcb92617da in bpstat_do_actions() ../../gdb/breakpoint.c:5012
        #21 0x55fcba3180e7 in inferior_event_handler(inferior_event_type) ../../gdb/inf-loop.c:71
        #22 0x55fcba3ba1e1 in fetch_inferior_event() ../../gdb/infrun.c:4769

    previously allocated by thread T0 here:
        #0 0x7f1a435218cd in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
        #1 0x55fcb927061f in build_bpstat_chain(address_space const*, unsigned long, target_waitstatus const&) ../../gdb/breakpoint.c:5880
        #2 0x55fcba3d63b6 in handle_signal_stop ../../gdb/infrun.c:7083
        #3 0x55fcba3d01c7 in handle_inferior_event ../../gdb/infrun.c:6574
        #4 0x55fcba3b9918 in fetch_inferior_event() ../../gdb/infrun.c:4713

This patch checks, after executing the commands of each breakpoint, whether the bpstat has already been deleted, and if so stops any further processing immediately. Now the result looks like this:

    (gdb) b main
    Breakpoint 1 at 0x111d: file main.c, line 21.
    (gdb) commands
    Type commands for breakpoint(s) 1, one per line.
    End with a line saying just "end".
    >detach
    >end
    (gdb) run
    Starting program: /home/src/lappy/binutils-gdb.git/gdb/testsuite/gdb.base/main
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
    main () at main.c:21
    21 return 0;
    [Inferior 1 (process 242940) detached]
    (gdb)

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=14354
Approved-By: Andrew Burgess <aburgess@redhat.com>
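The idea of checking for a freed chain after each command can be sketched in isolation. This is a toy model and not the actual change to bpstat_do_actions_1: `stop_entry_sketch`, `thread_sketch` and `run_stop_commands` are hypothetical names, and detecting the deletion by looking for an empty chain pointer is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of a stop chain.
struct stop_entry_sketch
{
  std::vector<std::function<void ()>> commands;
  std::unique_ptr<stop_entry_sketch> next;
};

// Hypothetical stand-in for the thread that owns the chain.
struct thread_sketch
{
  std::unique_ptr<stop_entry_sketch> chain;
};

// Run the commands of every entry in THR's chain and return how many
// commands were executed.  Stops as soon as a command has destroyed
// the chain, so no freed entry is ever touched.
int
run_stop_commands (thread_sketch &thr)
{
  int executed = 0;
  stop_entry_sketch *entry = thr.chain.get ();

  while (entry != nullptr)
    {
      // Take ownership of the commands so that they outlive ENTRY
      // even if one of them frees the whole chain.
      std::vector<std::function<void ()>> commands
        = std::move (entry->commands);

      for (const auto &command : commands)
        {
          assert (command != nullptr);
          command ();
          ++executed;

          // A command such as "detach" may have cleared the chain.
          // ENTRY is dangling in that case, so stop immediately.
          if (thr.chain == nullptr)
            return executed;
        }

      entry = entry->next.get ();
    }

  return executed;
}
```

In this model a command that resets `thr.chain` plays the role of `detach`: the remaining commands, and the remaining entries, are never visited.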

This patch adds a new test that checks for a bug that was, if not fixed, then at least worked around, by commit:

    commit a736ff7
    Date: Sat Sep 27 22:29:24 2025 -0600

        Clean up iterate_over_symtabs

The bug was reported against Fedora GDB which, at the time the bug was reported, was based off GDB 16, and so didn't include the above commit. The bug report can be found here:

    https://bugzilla.redhat.com/show_bug.cgi?id=2403580

To summarise the bug report: a user is inspecting an application backtrace. The original bug report was from a core file, but the same issue will trigger for a live inferior. It's the inspection of the stack frames which is important. The user moves up the stack with the 'up' command and eventually finds an interesting frame. They use 'list' to view the source code at the current location, this works and displays lines 6461 to 6470 from the source file '../glib/gmain.c'. The user then does 'list 6450' to try and display some earlier lines from the same source file, at which point GDB gives the message:

    warning: 6445 ../glib/gmain.c: No such file or directory

So GDB initially manages to find the source file, but for the very next command, GDB now claims that the source file doesn't exist.

As I said, commit a736ff7 appears to fix this issue, but it wasn't clear to me (from the commit message) if this commit was intended to fix any bugs, or if the bug was being hidden by this commit.

I've spent some time trying to understand what's going on, and have come up with this test case. I think there might still be an issue in GDB, but I do think that the above commit really is making it so that the issue (if it is an issue) doesn't occur in that particular situation any more, so I think we can consider the above commit a fix, and testing for this bug is worthwhile to ensure it doesn't get reintroduced.

In order to trigger this bug we need these high level requirements:

  1. Multiple shared libraries compiled from the same source tree. In this case it was glib, but the test in this commit uses a much smaller library.

  2. Common DWARF must be pulled from the libraries using the 'dwz' tool.

  3. Debuginfod must be in use for at least downloading the source code. In the original bug, and in the test presented here, debuginfod is used for fetching both the debug info, and the source code for the library.

There are some additional specific requirements for the DWARF in order to trigger the bug, but to make discussing this easier, let's look at the structure of the test presented here. When discussing the source files I'll drop the solib-with-dwz- prefix, e.g. when I mention 'foo.c' I really mean 'solib-with-dwz-foo.c'.

There are three shared libraries built for this test, libbar.so, libfoo.so, and libfoo-2.so. The source file bar.c is used to create libbar.so, and foo.c is used to create libfoo.so and libfoo-2.so.

The main test executable is built from main.c, and links against libbar.so and libfoo.so. libfoo-2.so is not used by the main executable, and just exists to trigger some desired behaviour from the dwz tool.

The debug information for each shared library is extracted into a corresponding .debug file, and the dwz tool is used to extract common debug from the three .debug files into a file called 'common.dwz'.

Given all this then, in order to trigger the bug, the following additional requirements must be met:

  4. libbar.so must NOT make use of foo.c. In this test libbar.so is built from bar.c (and some headers) only.

  5. A reference to foo.c must be placed into common.dwz. This is why libfoo-2.so exists, as this library is almost identical to libfoo.so, there is lots of shared DWARF between libfoo.so and libfoo-2.so which can be moved into common.dwz, this shared DWARF includes references to foo.c, so an entry for foo.c is added to the file table list in common.dwz.

  6. There must be a DWARF construct within libbar.so.debug that references common.dwz, and which causes GDB to parse the line table from within common.dwz. For more details on this, see below.

  7. We need libbar.so to appear before libfoo.so in GDB's compunit_symtab lists. This means that GDB will scan the symtabs for libbar.so before checking the symtabs of libfoo.so. I achieve this by mentioning libbar.so first when building the executable, but this is definitely the most fragile part of the test.

To satisfy requirement (6) the inline function 'add_some_int' is added to the test. This function appears in both libbar.so and libfoo.so, this means that the DW_TAG_subprogram representing the abstract instance tree will be moved into common.dwz. However, as this is an inline function, the DW_TAG_inlined_subroutine DIEs for each concrete instance will be left in libbar.so.debug and libfoo.so.debug, with a DW_AT_abstract_origin that points into common.dwz.

When GDB parses libbar.so.debug it finds the DW_TAG_inlined_subroutine and begins processing it. It sees the DW_AT_abstract_origin and so jumps into common.dwz to read the DIEs that define the inline function.
Here is the DWARF from libbar.so.debug for the inlined instance:

     <2><91>: Abbrev Number: 3 (DW_TAG_inlined_subroutine)
        <92>   DW_AT_abstract_origin: <alt 0x1b>
        <96>   DW_AT_low_pc      : 0x1121
        <9e>   DW_AT_high_pc     : 31
        <9f>   DW_AT_call_file   : 1
        <a0>   DW_AT_call_line   : 26
        <a1>   DW_AT_call_column : 15
     <3><a2>: Abbrev Number: 5 (DW_TAG_formal_parameter)
        <a3>   DW_AT_abstract_origin: <alt 0x2c>
        <a7>   DW_AT_location    : 2 byte block: 91 68 (DW_OP_fbreg: -24)
     <3><aa>: Abbrev Number: 5 (DW_TAG_formal_parameter)
        <ab>   DW_AT_abstract_origin: <alt 0x25>
        <af>   DW_AT_location    : 2 byte block: 91 6c (DW_OP_fbreg: -20)

And here's the DWARF from common.dwz for the abstract instance tree:

     <1><1b>: Abbrev Number: 7 (DW_TAG_subprogram)
        <1c>   DW_AT_name        : (indirect string, offset: 0x18a): add_some_int
        <20>   DW_AT_decl_file   : 1
        <21>   DW_AT_decl_line   : 24
        <22>   DW_AT_decl_column : 1
        <23>   DW_AT_prototyped  : 1
        <23>   DW_AT_type        : <0x14>
        <24>   DW_AT_inline      : 3 (declared as inline and inlined)
     <2><25>: Abbrev Number: 8 (DW_TAG_formal_parameter)
        <26>   DW_AT_name        : a
        <28>   DW_AT_decl_file   : 1
        <29>   DW_AT_decl_line   : 24
        <2a>   DW_AT_decl_column : 19
        <2b>   DW_AT_type        : <0x14>
     <2><2c>: Abbrev Number: 8 (DW_TAG_formal_parameter)
        <2d>   DW_AT_name        : b
        <2f>   DW_AT_decl_file   : 1
        <30>   DW_AT_decl_line   : 24
        <31>   DW_AT_decl_column : 26
        <32>   DW_AT_type        : <0x14>

While processing the common.dwz DIEs GDB sees the DW_AT_decl_file attributes, and this triggers a read of the file table within common.dwz, which creates symtabs for any files mentioned, if the symtabs don't already exist.

But, and this is the important bit, when doing this, GDB is creating a compunit_symtab for libbar.so.debug, so any symtabs created will be attached to the libbar.so.debug objfile.

Remember requirement (5), the file list in common.dwz mentions 'foo.c', so even though libbar.so doesn't use 'foo.c' we end up with a symtab for 'foo.c' created within the compunit_symtab for libbar.so.debug! I don't think this is ideal.
This wastes memory and time; we have more symtabs to search through even if, as I'll discuss below, we usually end up ignoring these symtabs.

The exact path that triggers this weird symtab creation starts with a call to 'new_symbol' (dwarf2/read.c) for the DW_TAG_formal_parameter in the abstract instance tree. These include DW_AT_decl_file, which is read in 'new_symbol'. In 'new_symbol' GDB spots that the line_header has not yet been read in, so handle_DW_AT_stmt_list is called which reads the file/line table and then calls 'dwarf_decode_lines' (line_program.c), which then creates symtabs for all the files mentioned.

This symtab creation issue still exists today in GDB, though I've not been able to find any real issues that this is causing after commit a736ff7 fixed the issue I'm discussing here.

So, having tricked GDB into creating a misplaced symtab, what problem did this cause prior to commit a736ff7? To answer this, we need to take a diversion to understand how a command like 'list 6450' works.

The two interesting functions are create_sals_line_offset and decode_digits_list_mode, which is called from the former. The create_sals_line_offset function is called indirectly from list_command via the initial call to decode_line_1.

In create_sals_line_offset, if the incoming linespec doesn't specify a specific symtab, then GDB uses the name of the default symtab to look up every symtab with a matching name, this is done with the line:

    ls->file_symtabs = collect_symtabs_from_filename (self->default_symtab->filename (), self->search_pspace);

In our case, when the default symtab is 'foo.c', this is going to return multiple symtabs, these will include the correct 'foo.c' symtab from libfoo.so, but will also include the misplaced 'foo.c' symtab from libbar.so.

This is where the ordering is important. As list will only ever list one file, at a later point in this process we're going to toss out everything except the first result.
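How a shared file table can leave a symtab in the wrong objfile, and how a name-only lookup then returns it first, can be modelled with a few lines. This is a toy model only: `symtab_sketch`, `objfile_sketch`, `decode_file_table` and `collect_by_filename` are hypothetical names, and the file table contents used below are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a symtab; OWNER records which objfile
// the symtab ended up attached to.
struct symtab_sketch
{
  std::string filename;
  std::string owner;
};

// Hypothetical stand-in for an objfile and its symtabs.
struct objfile_sketch
{
  std::string name;
  std::vector<symtab_sketch> symtabs;
};

// Create a symtab in CURRENT for every entry of FILE_TABLE that
// CURRENT doesn't have yet.  The table may come from a shared file,
// but the symtabs are always attached to CURRENT.
void
decode_file_table (objfile_sketch &current,
                   const std::vector<std::string> &file_table)
{
  for (const std::string &file_name : file_table)
    {
      assert (!file_name.empty ());

      bool exists = false;
      for (const symtab_sketch &s : current.symtabs)
        if (s.filename == file_name)
          exists = true;

      if (!exists)
        current.symtabs.push_back ({file_name, current.name});
    }
}

// Collect, in objfile order, every symtab named FILENAME.  Only the
// name is compared, so a misplaced symtab matches as well.
std::vector<symtab_sketch>
collect_by_filename (const std::vector<objfile_sketch> &objfiles,
                     const std::string &filename)
{
  std::vector<symtab_sketch> result;
  for (const objfile_sketch &objfile : objfiles)
    for (const symtab_sketch &s : objfile.symtabs)
      if (s.filename == filename)
        result.push_back (s);
  return result;
}
```

If the shared table (listing 'foo.c') is decoded while the model's libbar objfile is current, and libbar precedes libfoo in the list, then the first element returned for 'foo.c' is the one owned by libbar.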
So, to trigger the bug, it is critical that the FIRST result returned here be the misplaced 'foo.c' symtab from libbar.so. In the test I try to ensure this by mentioning libbar.so before libfoo.so when building the executable, which currently means we get back the misplaced symtab first, but this could change in the future and wouldn't necessarily mean that the problem has gone away.

Having got the symtab list GDB then calls decode_digits_list_mode which iterates over the symtabs and converts them into symtab_and_line objects, at the heart of which is a call to find_line_symtab, which checks if a given symtab has a line table entry for the desired line. If it does then the symtab is returned. If it doesn't then GDB looks for another symtab with the same name that does have a line table entry. If no suitably named symtab has an exact match, then the symtab with the closest line above the required line is returned. If no symtab has a matching line table entry then find_line_symtab returns NULL.

Remember, the misplaced symtab was only created as a side effect of trying to attach the DW_TAG_formal_parameter symbol to a symtab. The actual line table for libbar.so (in libbar.so.debug) has no line table entries for 'foo.c'. What this means is that the line table for 'foo.c' attached to libbar.so.debug is empty. So normally what happens is that find_line_symtab will instead find a line table entry for 'foo.c' in libfoo.so.debug that does have a suitable line table entry, and will switch GDB back to that symtab, effectively avoiding the problem.

However, that is not what happens in the bug case. In the bug case find_line_symtab returns NULL, which means that decode_digits_list_mode just uses the original symtab, in this case the symtab for 'foo.c' from libbar.so.debug.

In the original bug, the code is compiled with -O2, and this optimisation has left the line table covering the problem file pretty sparse.
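The selection rules described above for find_line_symtab can be written down as a small stand-alone function. This is a sketch of the described behaviour only, not GDB's implementation: `line_symtab_sketch` and `find_line_symtab_sketch` are hypothetical names, and a symtab's line table is reduced to a plain list of line numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a symtab: a file name plus the line
// numbers that have a line table entry.
struct line_symtab_sketch
{
  std::string filename;
  std::vector<int> lines;
};

// Return the symtab to use for LINE, starting from START:
//   1. START itself, if it has an exact entry for LINE;
//   2. otherwise any same-named symtab with an exact entry;
//   3. otherwise the same-named symtab holding the closest entry
//      above LINE;
//   4. otherwise nullptr.
const line_symtab_sketch *
find_line_symtab_sketch (const std::vector<line_symtab_sketch> &all,
                         const line_symtab_sketch &start, int line)
{
  assert (line > 0);

  if (std::find (start.lines.begin (), start.lines.end (), line)
      != start.lines.end ())
    return &start;

  const line_symtab_sketch *best = nullptr;
  int best_line = 0;

  for (const line_symtab_sketch &candidate : all)
    {
      if (candidate.filename != start.filename)
        continue;

      for (int entry : candidate.lines)
        {
          if (entry == line)
            return &candidate;

          // Remember the smallest entry that is above LINE.
          if (entry > line && (best == nullptr || entry < best_line))
            {
              best = &candidate;
              best_line = entry;
            }
        }
    }

  return best;
}
```

With an empty line list standing in for the misplaced symtab, a request for a line beyond every entry of every same-named symtab returns `nullptr`, which corresponds to the bug case in which the caller falls back to the symtab it started with.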
In fact, there are no line table entries for any line after the line that the user is trying to list. This is why find_line_symtab doesn't find a better alternative symtab, and instead just returns NULL.

In the test I've replicated this by having a comment at the end of the source file, and asking GDB to list a line within this comment. The result is that there are no line table entries for that line in any 'foo.c' symtab, and so find_line_symtab returns NULL.

After decode_digits_list_mode sees the NULL from find_line_symtab, it just uses the initial symtab.

After this we eventually return back to list_command (cli/cli-cmds.c) with a list of symtab_and_line objects. The first entry in this list is for the symtab 'foo.c' from libbar.so. In list_command we call filter_sals which throws away everything but the first entry as all the symtabs have the same filename (and are in the same program space).

Using the symtab we build an absolute path to the source file. Now, if the source is installed locally, GDB performs no additional checks; we found a symtab, the symtab gave us a source filename, if the source file exists on disk, then the required lines are listed for the user.

But if the source file doesn't exist on disk, then we are going to ask debuginfod for the source file. To do that we use two pieces of information; the absolute path to the source file, which we have; and the build-id of an objfile, this is the objfile that owns the symtab we are trying to get the source for. In this case libbar.so.

And so we send the build-id and filename to debuginfod. Now debuginfod isn't going to just serve any file to anyone, that would be a security issue for the server. Instead, debuginfod scans the DWARF and builds up its own model of which objfiles use which source files, and for a given build-id, debuginfod will only serve back files that the objfile matching that build-id actually uses.
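The serving policy described above can be modelled as a lookup keyed on build-id. This is a toy model and has nothing to do with debuginfod's real API or implementation: `source_server_sketch`, `index_source` and `fetch` are hypothetical names, and the build-id strings and paths used with it are placeholders.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical model of a server that only hands out a source file
// when it is known to belong to the requested build-id.
class source_server_sketch
{
public:
  // Record that the objfile identified by BUILD_ID uses PATH, as a
  // server might after scanning that objfile's debug information.
  void index_source (const std::string &build_id, const std::string &path)
  {
    assert (!build_id.empty ());
    m_sources[build_id].insert (path);
  }

  // Serve PATH only if the objfile with BUILD_ID is known to use it.
  std::optional<std::string>
  fetch (const std::string &build_id, const std::string &path) const
  {
    auto it = m_sources.find (build_id);
    if (it == m_sources.end () || it->second.count (path) == 0)
      return std::nullopt;

    return "contents of " + path;
  }

private:
  std::map<std::string, std::set<std::string>> m_sources;
};
```

In this model, asking for a file under the build-id of an objfile that does not use it yields no result, even though the same path is served for a different build-id.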
So, in this case, when we ask for 'foo.c' from libbar.so, debuginfod correctly realises that 'foo.c' is not part of libbar.so, and refuses to send the file back.

And this is how the original bug occurred.

So, why does commit a736ff7 fix this problem? The answer is in iterate_over_symtabs, which is used by collect_symtabs_from_filename to find the matching symtabs.

Prior to this commit, iterate_over_symtabs had two phases. The first is a call to iterate_over_some_symtabs, which walks over compunit_symtabs that already exist looking for matches; during this phase only the symtab filenames are considered. The second phase uses objfile::map_symtabs_matching_filename to look through the objfiles and expand new symtabs that match the required name.

In our case, by the time iterate_over_symtabs is called, all of the interesting symtabs have already been expanded, so we only perform the filename check in iterate_over_some_symtabs, this passes, and so 'foo.c' from libbar.so is considered a suitable symtab.

After commit a736ff7 the initial call to iterate_over_some_symtabs has been removed from iterate_over_symtabs, and only the objfile::map_symtabs_matching_filename call remains. This ends up in cooked_index_functions::search (dwarf2/read.c) to search for matching symtabs.

The first thing cooked_index_functions::search does is set up a vector of CUs to skip by calling dw_search_file_matcher, this then calls dw2_get_file_names to get the file and line table for a CU, this function in turn creates a cutu_reader object, passing true for the 'skip_partial' argument to its constructor.

As our 'foo.c' symtab was created from within the dwz extracted DWARF, it is associated with the DW_TAG_partial_unit that held the DW_TAG_subprogram DIEs that were being processed when the misplaced symtab was originally created; this is a partial unit. As this is a partial unit, and the skip_partial flag was passed true, the cutu_reader::is_dummy function will return true.
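The difference between matching on names alone and matching through per-unit file data can be modelled as below. This is a toy model and not GDB's code: `unit_sketch`, `file_names_or_null` and `match_units` are hypothetical names, and treating "no file data" as "never matches" is how this sketch chooses to represent a unit that is reported as dummy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a unit and the file names it mentions.
struct unit_sketch
{
  bool is_partial;
  std::vector<std::string> file_names;
};

// Return UNIT's file names, or nullptr when the unit is treated as a
// dummy, which in this model happens for partial units whenever
// SKIP_PARTIAL is true.
const std::vector<std::string> *
file_names_or_null (const unit_sketch &unit, bool skip_partial)
{
  bool is_dummy = skip_partial && unit.is_partial;
  return is_dummy ? nullptr : &unit.file_names;
}

// Return the indices of the units that can supply FILENAME.  With
// SKIP_PARTIAL false this is a plain name check; with SKIP_PARTIAL
// true, partial units have no file data and so never match.
std::vector<std::size_t>
match_units (const std::vector<unit_sketch> &units,
             const std::string &filename, bool skip_partial)
{
  assert (!filename.empty ());

  std::vector<std::size_t> matches;
  for (std::size_t i = 0; i < units.size (); ++i)
    {
      const std::vector<std::string> *names
        = file_names_or_null (units[i], skip_partial);
      if (names == nullptr)
        continue;

      for (const std::string &name : *names)
        if (name == filename)
          {
            matches.push_back (i);
            break;
          }
    }

  return matches;
}
```

With one partial and one full unit that both mention 'foo.c', the name-only mode returns both indices, while the skip-partial mode returns only the full unit.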
Back in dw2_get_file_names, if cutu_reader::is_dummy is true then dw2_get_file_names_reader is never called, and the file names are never read. This means that back in dw_search_file_matcher, the file data returned from dw2_get_file_names is NULL, and so this CU is marked to be skipped. Which is exactly what we want, this misplaced symtab, which was created for a partial unit and associated with libbar.so, is skipped and never considered as a possible match.

There is a remaining problem, which is marked in the test with an xfail. That is, when the test does the 'list LINENO', GDB still tries to download the source for 'foo.c' from libbar.so. The reason for this is that, while it is true that the initial collect_symtabs_from_filename call no longer returns 'foo.c' from libbar.so, when decode_digits_list_mode calls find_line_symtab for the correct 'foo.c' from libfoo.so, it is still the case that there is no exact match for LINENO in that symtab's line table. As a result, GDB looks through all the other symtabs for 'foo.c' to see if any are a better match. Checking if another symtab is a possible better match requires a full comparison of the symtab's source file name, which in this case triggers an attempt to download the source file from debuginfod.

Here's the backtrace at the time of the rogue source download request, which appears as an xfail in the test presented here:

    #0 debuginfod_source_query (build_id=..., build_id_len=..., srcpath=..., destname=...) at ../../src/gdb/debuginfod-support.c:332
    #1 0x0000000000f0bb3b in open_source_file (s=...) at ../../src/gdb/source.c:1152
    #2 0x0000000000f0be42 in symtab_to_fullname (s=...) at ../../src/gdb/source.c:1214
    #3 0x0000000000f6dc40 in find_line_symtab (sym_tab=..., line=..., index=...) at ../../src/gdb/symtab.c:3314
    #4 0x0000000000aea319 in decode_digits_list_mode (self=..., ls=..., line=...) at ../../src/gdb/linespec.c:3939
    #5 0x0000000000ae4684 in create_sals_line_offset (self=..., ls=...) at ../../src/gdb/linespec.c:2039
    #6 0x0000000000ae557f in convert_linespec_to_sals (state=..., ls=...) at ../../src/gdb/linespec.c:2289
    #7 0x0000000000ae6546 in parse_linespec (parser=..., arg=..., match_type=...) at ../../src/gdb/linespec.c:2647
    #8 0x0000000000ae7605 in location_spec_to_sals (parser=..., locspec=...) at ../../src/gdb/linespec.c:3045
    #9 0x0000000000ae7c7f in decode_line_1 (locspec=..., flags=..., search_pspace=..., default_symtab=..., default_line=...) at ../../src.dev-m/gdb/linespec.c:3167

I think that this might not be what we really want to do here. After downloading the source file we'll end up with a filename within the debuginfod download cache, which will be different for each objfile (the cache partitions downloads based on build-id). So if two symtabs originate from the same source file, but are in two different objfiles, then, when the source is on disk, the filenames for these symtabs will be identical, and the symtabs will be considered equivalent by find_line_symtab. But when debuginfod is downloading the source the source paths will be different, and find_line_symtab will consider the symtabs different.

This doesn't seem right to me. But I'm going to leave worrying about that for another day.

Given this last bug, I am of the opinion that the misplaced symtab is likely a bug, though after commit a736ff7, the only issue I can find is the extra debuginfod download request, which isn't huge. But still, maybe just reducing the number of symtabs would be worth it?

But this patch isn't about fixing any bugs, it's about adding a test case for an issue that was a problem, but isn't any longer.

Approved-By: Tom Tromey <tom@tromey.com>
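The filename mismatch described in the commit message above, where the same source file compares equal when it is on disk but unequal once it is downloaded, can be illustrated with a hypothetical cache layout. The directory scheme below is an assumption made for this sketch and is not taken from debuginfod; `cached_source_path` and `same_source_name` are hypothetical names, and only the "one directory per build-id" property comes from the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical cache layout: one directory per build-id, with the
// source path flattened into a single file name.
std::string
cached_source_path (const std::string &cache_dir,
                    const std::string &build_id,
                    const std::string &source_path)
{
  assert (!build_id.empty ());

  std::string flat = source_path;
  for (char &c : flat)
    if (c == '/')
      c = '#';

  return cache_dir + "/" + build_id + "/source" + flat;
}

// Stand-in for comparing two symtabs by their full source file name.
bool
same_source_name (const std::string &a, const std::string &b)
{
  return a == b;
}
```

With on-disk sources, both symtabs resolve to the same path and compare equal. With the cache layout above, the same source path fetched for two different build-ids yields two different file names, so a comparison based on the full name treats the symtabs as different.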

dmi391 wants to merge 1 commit into sifive:master from dmi391:gdb-riscv-overlay

dmi391 commented Sep 14, 2022
