Skip to content

Commit 2aa7780

Browse files
committed
[StrTable] Switch intrinsics to StringTable and workaround MSVC
Historically, the main example of *very* large string tables used the `EmitCharArray`, but that was switched (without removing the API) in order to consolidate on a nicer emission primitive. While this large string table in `IntrinsicsImpl.inc` seems to compile correctly on MSVC without the workaround this adds back to the nicer emission path, other users have repeatedly hit this MSVC limitation. This PR teaches the string offset table emission to look at the size of the table and fall back to the char array emission strategy when the table becomes too large. The workaround does have the downside of making compile times worse for large string tables, but that appears unavoidable until we can identify known good MSVC versions and switch to requiring them for all LLVM users. It also reduces searchability of the generated string table -- I looked at emitting a comment with each string but it is tricky because the escaping rules for an inline comment are different from those of of a string literal, and there's no real way to turn the string literal into a comment. This PR also switches the `IntrinsicsImpl.inc` string tables over to the new `StringTable` runtime abstraction. I didn't want to do this until landing the MSVC workaround in case it caused even this example to start hitting the MSVC bug, but I wanted to switch here so that I could simplify the API for emitting the string table. With the two different emission strategies, its important to use a very exact syntax and that seems better encapsulated in the API. Follow-up patches will try to consolidate the remaining users onto the single interface, but those at least were easy to separate into follow-ups and keep this PR somewhat smaller.
1 parent 01ad3c5 commit 2aa7780

File tree

6 files changed

+67
-57
lines changed

6 files changed

+67
-57
lines changed

clang/utils/TableGen/ClangDiagnosticsEmitter.cpp

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1785,8 +1785,7 @@ static void emitDiagArrays(DiagsInGroupTy &DiagsInGroup,
17851785
/// This creates an `llvm::StringTable` of all the diagnostic group names.
17861786
static void emitDiagGroupNames(const StringToOffsetTable &GroupNames,
17871787
raw_ostream &OS) {
1788-
GroupNames.EmitStringLiteralDef(
1789-
OS, "static constexpr llvm::StringTable DiagGroupNames");
1788+
GroupNames.EmitStringTableDef(OS, "DiagGroupNames");
17901789
OS << "\n";
17911790
}
17921791

@@ -1939,9 +1938,6 @@ void clang::EmitClangDiagGroups(const RecordKeeper &Records, raw_ostream &OS) {
19391938
inferPedantic.compute(&DiagsInPedantic, &GroupsInPedantic);
19401939

19411940
StringToOffsetTable GroupNames;
1942-
// Add an empty string to the table first so we can use `llvm::StringTable`.
1943-
// TODO: Factor this into `StringToOffsetTable`.
1944-
GroupNames.GetOrAddStringOffset("");
19451941
for (const auto &[Name, Group] : DiagsInGroup) {
19461942
GroupNames.GetOrAddStringOffset(Name);
19471943
}

llvm/include/llvm/TableGen/StringToOffsetTable.h

Lines changed: 51 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,12 @@ class StringToOffsetTable {
2727
std::string AggregateString;
2828

2929
public:
30+
StringToOffsetTable() {
31+
// Ensure we always put the empty string at offset zero. That lets empty
32+
// initialization also be zero initialization for offsets into the table.
33+
GetOrAddStringOffset("");
34+
}
35+
3036
bool empty() const { return StringOffset.empty(); }
3137
size_t size() const { return AggregateString.size(); }
3238

@@ -51,28 +57,62 @@ class StringToOffsetTable {
5157
return II->second;
5258
}
5359

54-
// Emit the string using string literal concatenation, for better readability
55-
// and searchability.
56-
void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
57-
const Twine &Indent = " ") const {
60+
// Emit a string table definition with the provided name and indent.
61+
//
62+
// When possible, this uses string-literal concatenation to emit the string
63+
// contents in a readable and searchable way. However, for (very) large string
64+
// tables MSVC cannot reliably use string literals and so there we use a large
65+
// character array. We still use a line oriented emission and add comments to
66+
// provide searchability even in this case.
67+
//
68+
// The string table, and its input string contents, are always emitted as both
69+
// `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
70+
// valid identifiers to declare.
71+
void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
72+
const Twine &Indent = "") const {
5873
OS << formatv(R"(
5974
#ifdef __GNUC__
6075
#pragma GCC diagnostic push
6176
#pragma GCC diagnostic ignored "-Woverlength-strings"
6277
#endif
63-
{0}{1} = )",
64-
Indent, Decl);
78+
{0}static constexpr char {1}Storage[] = )",
79+
Indent, Name);
6580

81+
// MSVC silently miscompiles string literals longer than 64k in some
82+
// circumstances. When the string table is longer, emit it as an array of
83+
// character literals.
84+
bool UseChars = AggregateString.size() > (64 * 1024);
85+
OS << (UseChars ? "{\n" : "\n");
86+
87+
llvm::ListSeparator LineSep(UseChars ? ",\n" : "\n");
6688
for (StringRef Str : split(AggregateString, '\0')) {
67-
OS << "\n" << Indent << " \"";
68-
OS.write_escaped(Str);
69-
OS << "\\0\"";
89+
OS << LineSep << Indent << " ";
90+
// If we can, just emit this as a string literal to be concatenated.
91+
if (!UseChars) {
92+
OS << "\"";
93+
OS.write_escaped(Str);
94+
OS << "\\0\"";
95+
continue;
96+
}
97+
98+
llvm::ListSeparator CharSep(", ");
99+
for (char C : Str) {
100+
OS << CharSep << "'";
101+
OS.write_escaped(StringRef(&C, 1));
102+
OS << "'";
103+
}
104+
OS << CharSep << "'\\0'";
70105
}
71-
OS << R"(;
106+
OS << LineSep << Indent << (UseChars ? "};" : " ;");
107+
108+
OS << formatv(R"(
72109
#ifdef __GNUC__
73110
#pragma GCC diagnostic pop
74111
#endif
75-
)";
112+
113+
{0}static constexpr llvm::StringTable {1} = {1}Storage;
114+
)",
115+
Indent, Name);
76116
}
77117

78118
// Emit the string as one single string.
@@ -110,26 +150,6 @@ class StringToOffsetTable {
110150
}
111151
O << "\"";
112152
}
113-
114-
/// Emit the string using character literals. MSVC has a limitation that
115-
/// string literals cannot be longer than 64K.
116-
void EmitCharArray(raw_ostream &O) {
117-
assert(AggregateString.find(')') == std::string::npos &&
118-
"can't emit raw string with closing parens");
119-
int Count = 0;
120-
O << ' ';
121-
for (char C : AggregateString) {
122-
O << " \'";
123-
O.write_escaped(StringRef(&C, 1));
124-
O << "\',";
125-
Count++;
126-
if (Count > 14) {
127-
O << "\n ";
128-
Count = 0;
129-
}
130-
}
131-
O << '\n';
132-
}
133153
};
134154

135155
} // end namespace llvm

llvm/lib/IR/Intrinsics.cpp

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212

1313
#include "llvm/IR/Intrinsics.h"
1414
#include "llvm/ADT/StringExtras.h"
15+
#include "llvm/ADT/StringTable.h"
1516
#include "llvm/IR/Function.h"
1617
#include "llvm/IR/IntrinsicsAArch64.h"
1718
#include "llvm/IR/IntrinsicsAMDGPU.h"
@@ -40,7 +41,7 @@ using namespace llvm;
4041

4142
StringRef Intrinsic::getBaseName(ID id) {
4243
assert(id < num_intrinsics && "Invalid intrinsic ID!");
43-
return IntrinsicNameTable + IntrinsicNameOffsetTable[id];
44+
return IntrinsicNameTable[IntrinsicNameOffsetTable[id]];
4445
}
4546

4647
StringRef Intrinsic::getName(ID id) {
@@ -649,20 +650,20 @@ static int lookupLLVMIntrinsicByName(ArrayRef<unsigned> NameOffsetTable,
649650
// `equal_range` requires the comparison to work with either side being an
650651
// offset or the value. Detect which kind each side is to set up the
651652
// compared strings.
652-
const char *LHSStr;
653+
StringRef LHSStr;
653654
if constexpr (std::is_integral_v<decltype(LHS)>) {
654-
LHSStr = &IntrinsicNameTable[LHS];
655+
LHSStr = IntrinsicNameTable[LHS];
655656
} else {
656657
LHSStr = LHS;
657658
}
658-
const char *RHSStr;
659+
StringRef RHSStr;
659660
if constexpr (std::is_integral_v<decltype(RHS)>) {
660-
RHSStr = &IntrinsicNameTable[RHS];
661+
RHSStr = IntrinsicNameTable[RHS];
661662
} else {
662663
RHSStr = RHS;
663664
}
664-
return strncmp(LHSStr + CmpStart, RHSStr + CmpStart, CmpEnd - CmpStart) <
665-
0;
665+
return strncmp(LHSStr.data() + CmpStart, RHSStr.data() + CmpStart,
666+
CmpEnd - CmpStart) < 0;
666667
};
667668
LastLow = Low;
668669
std::tie(Low, High) = std::equal_range(Low, High, Name.data(), Cmp);
@@ -672,7 +673,7 @@ static int lookupLLVMIntrinsicByName(ArrayRef<unsigned> NameOffsetTable,
672673

673674
if (LastLow == NameOffsetTable.end())
674675
return -1;
675-
StringRef NameFound = &IntrinsicNameTable[*LastLow];
676+
StringRef NameFound = IntrinsicNameTable[*LastLow];
676677
if (Name == NameFound ||
677678
(Name.starts_with(NameFound) && Name[NameFound.size()] == '.'))
678679
return LastLow - NameOffsetTable.begin();
@@ -716,7 +717,7 @@ Intrinsic::ID Intrinsic::lookupIntrinsicID(StringRef Name) {
716717

717718
// If the intrinsic is not overloaded, require an exact match. If it is
718719
// overloaded, require either exact or prefix match.
719-
const auto MatchSize = strlen(&IntrinsicNameTable[NameOffsetTable[Idx]]);
720+
const auto MatchSize = IntrinsicNameTable[NameOffsetTable[Idx]].size();
720721
assert(Name.size() >= MatchSize && "Expected either exact or prefix match");
721722
bool IsExactMatch = Name.size() == MatchSize;
722723
return IsExactMatch || Intrinsic::isOverloaded(ID) ? ID

llvm/test/TableGen/MixedCasedMnemonic.td

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ def :MnemonicAlias<"InstB", "BInst">;
4141

4242
// Check that the matcher lower()s the mnemonics it matches.
4343
// MATCHER: static const char MnemonicTable[] =
44-
// MATCHER-NEXT: "\005ainst\005binst";
44+
// MATCHER-NEXT: "\000\005ainst\005binst";
4545

4646
// Check that aInst appears before BInst in the match table.
4747
// This shows that the mnemonics are sorted in a case-insensitive way,

llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -252,8 +252,7 @@ void IntrinsicEmitter::EmitIntrinsicToNameTable(
252252
253253
)";
254254

255-
Table.EmitStringLiteralDef(OS, "static constexpr char IntrinsicNameTable[]",
256-
/*Indent=*/"");
255+
Table.EmitStringTableDef(OS, "IntrinsicNameTable", /*Indent=*/"");
257256

258257
OS << R"(
259258
static constexpr unsigned IntrinsicNameOffsetTable[] = {
@@ -759,13 +758,13 @@ Intrinsic::getIntrinsicFor{}Builtin(StringRef TargetPrefix,
759758
}
760759

761760
if (!Table.empty()) {
762-
Table.EmitStringLiteralDef(OS, "static constexpr char BuiltinNames[]");
761+
Table.EmitStringTableDef(OS, "BuiltinNames");
763762

764763
OS << R"(
765764
struct BuiltinEntry {
766765
ID IntrinsicID;
767766
unsigned StrTabOffset;
768-
const char *getName() const { return &BuiltinNames[StrTabOffset]; }
767+
const char *getName() const { return BuiltinNames[StrTabOffset].data(); }
769768
bool operator<(StringRef RHS) const {
770769
return strncmp(getName(), RHS.data(), RHS.size()) < 0;
771770
}

llvm/utils/TableGen/OptionParserEmitter.cpp

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -287,10 +287,6 @@ static void emitOptionParser(const RecordKeeper &Records, raw_ostream &OS) {
287287
array_pod_sort(PrefixesUnion.begin(), PrefixesUnion.end());
288288

289289
llvm::StringToOffsetTable Table;
290-
// Make sure the empty string is the zero-th one in the table. This both makes
291-
// it easy to check for empty strings (zero offset == empty) and makes
292-
// initialization cheaper for empty strings.
293-
Table.GetOrAddStringOffset("");
294290
// We can add all the prefixes via the union.
295291
for (const auto &Prefix : PrefixesUnion)
296292
Table.GetOrAddStringOffset(Prefix);
@@ -303,9 +299,7 @@ static void emitOptionParser(const RecordKeeper &Records, raw_ostream &OS) {
303299
OS << "/////////\n";
304300
OS << "// String table\n\n";
305301
OS << "#ifdef OPTTABLE_STR_TABLE_CODE\n";
306-
Table.EmitStringLiteralDef(
307-
OS, "static constexpr llvm::StringTable OptionStrTable",
308-
/*Indent=*/"");
302+
Table.EmitStringTableDef(OS, "OptionStrTable", /*Indent=*/"");
309303
OS << "#endif // OPTTABLE_STR_TABLE_CODE\n\n";
310304

311305
// Dump prefixes.

0 commit comments

Comments
 (0)