Commit 55fc162

Add name canonicalization for C
PR symtab/29105 shows a number of situations where symbol lookup can result in the expansion of too many CUs.

What happens is that lookup_signed_typename will try to look up a type like "signed int". In cooked_index_functions::expand_symtabs_matching, when looping over languages, the C++ case will canonicalize this type name to be "int" instead. Then this method will proceed to expand every CU that has an entry for "int" -- i.e., nearly all of them. A crucial component of this is that the caller, objfile::lookup_symbol, does not do this canonicalization, so when it tries to find the symbol for "signed int", it fails -- causing the loop to continue.

This patch fixes the problem by introducing name canonicalization for C. The idea here is that, by making C and C++ agree on the canonical name when a symbol name can have multiple spellings, we avoid the bad behavior in objfile::lookup_symbol (and any other such code -- I don't know if there is any).

Unlike C++, C only has a few situations where canonicalization is needed. And, in particular, due to the lack of overloading (thus avoiding any issues in linespec) and due to the way c-exp.y works, I think that no canonicalization is needed during symbol lookup -- only during symtab construction. This explains why lookup_name_info is not touched.

The stabs reader is modified on a "best effort" basis.

The DWARF reader needed one small tweak in dwarf2_name to avoid a regression in dw2-unusual-field-names.exp. I think this is adequately explained by the comment, but basically this is a scenario that should not occur in real code, only the gdb test suite.

lookup_signed_typename is simplified. It used to search for two different type names, but now gdb can search just for the canonical form.

gdb.dwarf2/enum-type.exp needed a small tweak, because the canonicalizer turns "unsigned integer" into "unsigned int integer". It seems better here to use the correct C type name.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29105
Tested-by: Simon Marchi <[email protected]>
Reviewed-by: Andrew Burgess <[email protected]>
1 parent bed34ce commit 55fc162
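
The failure mode described in the commit message can be modeled with a small self-contained sketch. None of this is gdb code: `ToyCU`, `expand_matching`, and `make_cus` are hypothetical names, and the model assumes each CU is nothing more than the set of spellings it was indexed under. The index match uses one spelling while the caller's per-CU search uses another; when the two disagree, every matching CU ends up expanded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a compilation unit: just the names it defines.
struct ToyCU
{
  std::set<std::string> symbols;
  bool expanded = false;
};

// Toy version of the lookup loop.  INDEX_KEY is the spelling used to decide
// which CUs to expand; LOOKUP_NAME is the spelling the caller then searches
// for inside each expanded CU.  Returns the number of CUs expanded.
static std::size_t
expand_matching (std::vector<ToyCU> &cus, const std::string &index_key,
                 const std::string &lookup_name)
{
  std::size_t expanded = 0;
  for (ToyCU &cu : cus)
    {
      if (cu.symbols.count (index_key) == 0)
        continue;               // The index says this CU is irrelevant.
      cu.expanded = true;
      ++expanded;
      if (cu.symbols.count (lookup_name) != 0)
        break;                  // The caller found the symbol; stop early.
    }
  return expanded;
}

// Build N toy CUs that each define "int", as nearly all real CUs do.
static std::vector<ToyCU>
make_cus (std::size_t n)
{
  assert (n > 0);
  ToyCU cu;
  cu.symbols.insert ("int");
  return std::vector<ToyCU> (n, cu);
}
```

In this model, calling `expand_matching (cus, "int", "signed int")` on three CUs expands all of them, because the inner search never succeeds. When both sides use the canonical spelling, `expand_matching (cus, "int", "int")` stops after the first CU.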

8 files changed, +80 -26 lines changed

gdb/c-lang.c

Lines changed: 14 additions & 0 deletions
@@ -727,6 +727,20 @@ c_is_string_type_p (struct type *type)



+/* See c-lang.h.  */
+
+gdb::unique_xmalloc_ptr<char>
+c_canonicalize_name (const char *name)
+{
+  if (strchr (name, ' ') != nullptr
+      || streq (name, "signed")
+      || streq (name, "unsigned"))
+    return cp_canonicalize_string (name);
+  return nullptr;
+}
+
+
+
 void
 c_language_arch_info (struct gdbarch *gdbarch,
                       struct language_arch_info *lai)
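
The gating in c_canonicalize_name can be tried outside gdb with a minimal sketch. Here `toy_cp_canonicalize` is a hypothetical stand-in for cp_canonicalize_string, and its small table of spellings is an assumption chosen for illustration, not gdb's real canonicalizer. `std::unique_ptr<std::string>` plays the role of gdb::unique_xmalloc_ptr<char>, with nullptr meaning "already canonical".

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for cp_canonicalize_string.  It knows only a few
// spellings and returns nullptr when NAME is already canonical.
static std::unique_ptr<std::string>
toy_cp_canonicalize (const std::string &name)
{
  if (name == "signed int" || name == "signed" || name == "int signed")
    return std::make_unique<std::string> ("int");
  if (name == "unsigned" || name == "int unsigned")
    return std::make_unique<std::string> ("unsigned int");
  return nullptr;
}

// Same gating as the patch: only names that contain a space, or that are
// exactly "signed" or "unsigned", are handed to the canonicalizer.
static std::unique_ptr<std::string>
toy_c_canonicalize_name (const char *name)
{
  assert (name != nullptr);
  if (std::strchr (name, ' ') != nullptr
      || std::strcmp (name, "signed") == 0
      || std::strcmp (name, "unsigned") == 0)
    return toy_cp_canonicalize (name);
  return nullptr;
}
```

A single-word name such as "int" never reaches the canonicalizer, which keeps the common case cheap; only multi-word names and the bare "signed" and "unsigned" keywords pay for the call.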

gdb/c-lang.h

Lines changed: 5 additions & 0 deletions
@@ -167,4 +167,9 @@ extern std::string cplus_compute_program (compile_instance *inst,
                                           const struct block *expr_block,
                                           CORE_ADDR expr_pc);

+/* Return the canonical form of the C symbol NAME.  If NAME is already
+   canonical, return nullptr.  */
+
+extern gdb::unique_xmalloc_ptr<char> c_canonicalize_name (const char *name);
+
 #endif /* !defined (C_LANG_H) */

gdb/dbxread.c

Lines changed: 13 additions & 0 deletions
@@ -48,6 +48,7 @@
 #include "complaints.h"
 #include "cp-abi.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "psympriv.h"
 #include "block.h"
 #include "aout/aout64.h"
@@ -1444,6 +1445,18 @@ read_dbx_symtab (minimal_symbol_reader &reader,
                                                     new_name.get ());
                     }
                 }
+              else if (psymtab_language == language_c)
+                {
+                  std::string name (namestring, p - namestring);
+                  gdb::unique_xmalloc_ptr<char> new_name
+                    = c_canonicalize_name (name.c_str ());
+                  if (new_name != nullptr)
+                    {
+                      sym_len = strlen (new_name.get ());
+                      sym_name = obstack_strdup (&objfile->objfile_obstack,
+                                                 new_name.get ());
+                    }
+                }

               if (sym_len == 0)
                 {

gdb/dwarf2/cooked-index.c

Lines changed: 6 additions & 2 deletions
@@ -21,6 +21,7 @@
 #include "dwarf2/cooked-index.h"
 #include "dwarf2/read.h"
 #include "cp-support.h"
+#include "c-lang.h"
 #include "ada-lang.h"
 #include "split-name.h"
 #include <algorithm>
@@ -210,14 +211,17 @@ cooked_index::do_finalize ()
               m_names.push_back (std::move (canon_name));
             }
         }
-      else if (entry->per_cu->lang () == language_cplus)
+      else if (entry->per_cu->lang () == language_cplus
+               || entry->per_cu->lang () == language_c)
         {
           void **slot = htab_find_slot (seen_names.get (), entry,
                                         INSERT);
           if (*slot == nullptr)
             {
               gdb::unique_xmalloc_ptr<char> canon_name
-                = cp_canonicalize_string (entry->name);
+                = (entry->per_cu->lang () == language_cplus
+                   ? cp_canonicalize_string (entry->name)
+                   : c_canonicalize_name (entry->name));
               if (canon_name == nullptr)
                 entry->canonical = entry->name;
               else

gdb/dwarf2/read.c

Lines changed: 17 additions & 1 deletion
@@ -22014,14 +22014,25 @@ static const char *
 dwarf2_canonicalize_name (const char *name, struct dwarf2_cu *cu,
                           struct objfile *objfile)
 {
-  if (name && cu->lang () == language_cplus)
+  if (name == nullptr)
+    return name;
+
+  if (cu->lang () == language_cplus)
     {
       gdb::unique_xmalloc_ptr<char> canon_name
         = cp_canonicalize_string (name);

       if (canon_name != nullptr)
         name = objfile->intern (canon_name.get ());
     }
+  else if (cu->lang () == language_c)
+    {
+      gdb::unique_xmalloc_ptr<char> canon_name
+        = c_canonicalize_name (name);
+
+      if (canon_name != nullptr)
+        name = objfile->intern (canon_name.get ());
+    }

   return name;
 }
@@ -22050,6 +22061,11 @@ dwarf2_name (struct die_info *die, struct dwarf2_cu *cu)

   switch (die->tag)
     {
+      /* A member's name should not be canonicalized.  This is a bit
+         of a hack, in that normally it should not be possible to run
+         into this situation; however, the dw2-unusual-field-names.exp
+         test creates custom DWARF that does.  */
+    case DW_TAG_member:
     case DW_TAG_compile_unit:
     case DW_TAG_partial_unit:
       /* Compilation units have a DW_AT_name that is a filename, not

gdb/gdbtypes.c

Lines changed: 3 additions & 9 deletions
@@ -1729,15 +1729,9 @@ lookup_unsigned_typename (const struct language_defn *language,
 struct type *
 lookup_signed_typename (const struct language_defn *language, const char *name)
 {
-  struct type *t;
-  char *uns = (char *) alloca (strlen (name) + 8);
-
-  strcpy (uns, "signed ");
-  strcpy (uns + 7, name);
-  t = lookup_typename (language, uns, NULL, 1);
-  /* If we don't find "signed FOO" just try again with plain "FOO".  */
-  if (t != NULL)
-    return t;
+  /* In C and C++, "char" and "signed char" are distinct types.  */
+  if (streq (name, "char"))
+    name = "signed char";
   return lookup_typename (language, name, NULL, 0);
 }
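
The one special case kept in lookup_signed_typename follows from a language rule that is easy to check in isolation. The sketch below is illustrative only: `signed_lookup_name` is a hypothetical helper that mirrors the name selection in the simplified function, without any of gdb's type lookup machinery.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The language rule the new comment relies on: plain "char" is its own
// type, while "signed int" is just another spelling of "int".
static_assert (!std::is_same<char, signed char>::value,
               "char and signed char are distinct types");
static_assert (std::is_same<int, signed int>::value,
               "signed int and int name the same type");

// Hypothetical helper mirroring the name selection in the simplified
// lookup_signed_typename: only "char" needs an explicit "signed" prefix.
static std::string
signed_lookup_name (const std::string &name)
{
  assert (!name.empty ());
  return name == "char" ? std::string ("signed char") : name;
}
```

For every other integer type the signed variant is the canonical name itself, which is why a single lookup now suffices where the old code tried "signed FOO" first and then fell back to "FOO".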

gdb/stabsread.c

Lines changed: 19 additions & 11 deletions
@@ -736,11 +736,13 @@ define_symbol (CORE_ADDR valu, const char *string, int desc, int type,

       if (sym->language () == language_cplus)
         {
-          char *name = (char *) alloca (p - string + 1);
-
-          memcpy (name, string, p - string);
-          name[p - string] = '\0';
-          new_name = cp_canonicalize_string (name);
+          std::string name (string, p - string);
+          new_name = cp_canonicalize_string (name.c_str ());
+        }
+      else if (sym->language () == language_c)
+        {
+          std::string name (string, p - string);
+          new_name = c_canonicalize_name (name.c_str ());
         }
       if (new_name != nullptr)
         sym->compute_and_set_names (new_name.get (), true, objfile->per_bfd);
@@ -1592,12 +1594,18 @@ read_type (const char **pp, struct objfile *objfile)
               type_name = NULL;
               if (get_current_subfile ()->language == language_cplus)
                 {
-                  char *name = (char *) alloca (p - *pp + 1);
-
-                  memcpy (name, *pp, p - *pp);
-                  name[p - *pp] = '\0';
-
-                  gdb::unique_xmalloc_ptr<char> new_name = cp_canonicalize_string (name);
+                  std::string name (*pp, p - *pp);
+                  gdb::unique_xmalloc_ptr<char> new_name
+                    = cp_canonicalize_string (name.c_str ());
+                  if (new_name != nullptr)
+                    type_name = obstack_strdup (&objfile->objfile_obstack,
+                                                new_name.get ());
+                }
+              else if (get_current_subfile ()->language == language_c)
+                {
+                  std::string name (*pp, p - *pp);
+                  gdb::unique_xmalloc_ptr<char> new_name
+                    = c_canonicalize_name (name.c_str ());
                   if (new_name != nullptr)
                     type_name = obstack_strdup (&objfile->objfile_obstack,
                                                 new_name.get ());
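
Both stabs hunks also replace an alloca, memcpy, and manual NUL termination with the pointer-plus-length constructor of std::string. The sketch below shows that idiom on its own. `stab_symbol_name` is a hypothetical helper, and finding the end of the name with a plain strchr for ':' is a simplifying assumption; gdb locates `p` with more care.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Copy the name portion of a stab string, i.e. everything before the
// first ':'.  Simplified assumption: real gdb computes P differently.
static std::string
stab_symbol_name (const char *string)
{
  assert (string != nullptr);
  const char *p = std::strchr (string, ':');
  assert (p != nullptr);
  // Pointer-plus-length constructor: copies exactly p - string bytes and
  // manages the terminating NUL itself, replacing alloca + memcpy.
  return std::string (string, p - string);
}
```

The resulting string owns its storage, so `name.c_str ()` can be passed to a canonicalizer without any stack allocation sized from untrusted input.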

gdb/testsuite/gdb.dwarf2/enum-type.exp

Lines changed: 3 additions & 3 deletions
@@ -37,13 +37,13 @@ Dwarf::assemble $asm_file {
             integer_label: DW_TAG_base_type {
                 {DW_AT_byte_size 4 DW_FORM_sdata}
                 {DW_AT_encoding @DW_ATE_signed}
-                {DW_AT_name integer}
+                {DW_AT_name int}
             }

             uinteger_label: DW_TAG_base_type {
                 {DW_AT_byte_size 4 DW_FORM_sdata}
                 {DW_AT_encoding @DW_ATE_unsigned}
-                {DW_AT_name {unsigned integer}}
+                {DW_AT_name {unsigned int}}
             }

             DW_TAG_enumeration_type {
@@ -79,5 +79,5 @@ gdb_test "print sizeof(enum E)" " = 4"
 gdb_test "ptype enum EU" "type = enum EU {TWO = 2}" \
     "ptype EU in enum C"
 gdb_test_no_output "set lang c++"
-gdb_test "ptype enum EU" "type = enum EU : unsigned integer {TWO = 2}" \
+gdb_test "ptype enum EU" "type = enum EU : unsigned int {TWO = 2}" \
     "ptype EU in C++"
