Skip to content

Conversation

@VitaNuo
Copy link
Contributor

@VitaNuo VitaNuo commented Oct 7, 2024

…led names produced by clang can be demangled by LLVM demangler.

Introduce the above diagnostic behind the -fno-demangling-failures flag to prevent unintended breakages.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Viktoriia Bakalova (VitaNuo)

Changes

…led names produced by clang can be demangled by LLVM demangler.

Introduce the above assertion behind the -fno-demangling-failures flag to prevent unintended breakages.


Full diff: https://github.com/llvm/llvm-project/pull/111391.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/CodeGenOptions.def (+3)
  • (modified) clang/include/clang/Driver/Options.td (+4)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+7)
  • (added) clang/test/CodeGenCXX/assert-demangle.cpp (+14)
  • (modified) llvm/include/llvm/Demangle/Demangle.h (+7)
  • (modified) llvm/lib/Demangle/Demangle.cpp (+10)
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index eac831278ee20d..d3e150fe53f804 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -462,6 +462,9 @@ ENUM_CODEGENOPT(ZeroCallUsedRegs, llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind,
 /// non-deleting destructors. (No effect on Microsoft ABI.)
 CODEGENOPT(CtorDtorReturnThis, 1, 0)
 
+/// Whether to validate if a produced mangled name can be demangled with LLVM demangler.
+CODEGENOPT(NoDemanglingFailures, 1, 0)
+
 /// FIXME: Make DebugOptions its own top-level .def file.
 #include "DebugOptions.def"
 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 607ff47a857b8f..a6a8cfe21676c0 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1967,6 +1967,10 @@ def fclang_abi_compat_EQ : Joined<["-"], "fclang-abi-compat=">, Group<f_clang_Gr
   Visibility<[ClangOption, CC1Option]>,
   MetaVarName<"<version>">, Values<"<major>.<minor>,latest">,
   HelpText<"Attempt to match the ABI of Clang <version>">;
+def fno_demangling_failures: Flag<["-"], "fno-demangling-failures">, Group<f_clang_Group>,
+  Visibility<[ClangOption, CC1Option]>,
+  HelpText<"Assert that clang can demangle all the mangled names it generates">,
+  MarshallingInfoFlag<CodeGenOpts<"NoDemanglingFailures">>;
 def fclasspath_EQ : Joined<["-"], "fclasspath=">, Group<f_Group>;
 def fcolor_diagnostics : Flag<["-"], "fcolor-diagnostics">, Group<f_Group>,
 
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 25c1c496a4f27f..6afbfa9ae45ed9 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -48,12 +48,14 @@
 #include "clang/Basic/Version.h"
 #include "clang/CodeGen/BackendUtil.h"
 #include "clang/CodeGen/ConstantInitBuilder.h"
+#include "clang/Driver/Driver.h"
 #include "clang/Frontend/FrontendDiagnostic.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
 #include "llvm/IR/AttributeMask.h"
 #include "llvm/IR/CallingConv.h"
@@ -75,6 +77,7 @@
 #include "llvm/TargetParser/Triple.h"
 #include "llvm/TargetParser/X86TargetParser.h"
 #include "llvm/Transforms/Utils/BuildLibCalls.h"
+#include <cassert>
 #include <optional>
 
 using namespace clang;
@@ -2044,6 +2047,10 @@ StringRef CodeGenModule::getMangledName(GlobalDecl GD) {
                  GD.getWithKernelReferenceKind(KernelReferenceKind::Kernel),
                  ND));
 
+  if (getCodeGenOpts().NoDemanglingFailures) 
+    assert((!llvm::isMangledName(MangledName) || llvm::demangle(MangledName) !=
+           MangledName) && "clang must demangle a mangled name it generates!");
+  
   auto Result = Manglings.insert(std::make_pair(MangledName, GD));
   return MangledDeclNames[CanonicalGD] = Result.first->first();
 }
diff --git a/clang/test/CodeGenCXX/assert-demangle.cpp b/clang/test/CodeGenCXX/assert-demangle.cpp
new file mode 100644
index 00000000000000..86a7686f72d929
--- /dev/null
+++ b/clang/test/CodeGenCXX/assert-demangle.cpp
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -emit-llvm -fno-demangling-failures -triple %itanium_abi_triple -o - %s | FileCheck %s
+
+// CHECK: @_ZN6foobar3barEv
+// CHECK: @_ZN6foobar1A3fooEi
+namespace foobar {
+struct A {
+  void foo (int) {
+  }
+};
+
+void bar() {
+  A().foo(0);
+}
+} // namespace foobar
diff --git a/llvm/include/llvm/Demangle/Demangle.h b/llvm/include/llvm/Demangle/Demangle.h
index fe129603c0785d..910f0410d62db3 100644
--- a/llvm/include/llvm/Demangle/Demangle.h
+++ b/llvm/include/llvm/Demangle/Demangle.h
@@ -67,6 +67,13 @@ char *dlangDemangle(std::string_view MangledName);
 /// demangling occurred.
 std::string demangle(std::string_view MangledName);
 
+/// Determines if the argument string is a valid mangled name known to the
+/// demangler.
+/// \param Name - reference to a string that is potentially a mangled name.
+/// \returns - true if the argument represents a valid mangled name, false
+/// otherwise.
+bool isMangledName(std::string_view Name);
+
 bool nonMicrosoftDemangle(std::string_view MangledName, std::string &Result,
                           bool CanHaveLeadingDot = true,
                           bool ParseParams = true);
diff --git a/llvm/lib/Demangle/Demangle.cpp b/llvm/lib/Demangle/Demangle.cpp
index f0f7eacac98e64..6b40cbb56cf28d 100644
--- a/llvm/lib/Demangle/Demangle.cpp
+++ b/llvm/lib/Demangle/Demangle.cpp
@@ -47,6 +47,16 @@ static bool isRustEncoding(std::string_view S) { return starts_with(S, "_R"); }
 
 static bool isDLangEncoding(std::string_view S) { return starts_with(S, "_D"); }
 
+static bool isMicrosoftEncoding(std::string_view S) {
+  return starts_with(S, '?');
+}
+
+bool llvm::isMangledName(std::string_view Name) {
+  return starts_with(Name, '.') || isItaniumEncoding(Name) ||
+         isRustEncoding(Name) || isDLangEncoding(Name) ||
+         isMicrosoftEncoding(Name);
+}
+
 bool llvm::nonMicrosoftDemangle(std::string_view MangledName,
                                 std::string &Result, bool CanHaveLeadingDot,
                                 bool ParseParams) {

@github-actions
Copy link

github-actions bot commented Oct 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is being done as an assert, this is something that really only applies during debug mode. I dont think this compiler flag makes sense because of it.

IMO, this should either be a preprocessor macro to enable (so it is a build config), or a diagnostic that happens all the time (perhaps whose default changes?).

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 8, 2024

Since this is being done as an assert, this is something that really only applies during debug mode. I dont think this compiler flag makes sense because of it.

The reason I am adding a flag on top of an assert is that there are unfortunately still many mangled names produced by clang that cannot be demangled. E.g., many of the existing codegen lit tests (clang/test/CodeGenCXX/*) fail in assertion-enabled builds without the flag. This is potentially disruptive to development workflows. The flag can be removed once the LLVM demangler can handle all the mangled names clang produces.

@ilya-biryukov
Copy link
Contributor

ilya-biryukov commented Oct 8, 2024

Sorry for jumping in late.

I think Erich is on point that having a flag that controls an assertion is a bit of a red flag as we are mixing build configuration and runtime configuration.

It is at least unusual and may cause confusion.

After thinking about this a bit more, should we maybe go all-in on one of the approaches?

  • either use build configuration: add a new build flag that controls this assertion. Only assert when assertions are enabled and the new build flag is defined.
  • or use runtime configuration always, e.g. add diagnostics for names that can't be demangled. It should make finding those issues much easier (one can run the compiler to produce a list of names that can't be demangled with locations pointing at code that helps to identify where those names come from).

I would probably vouch for the second approach. The only downside that I see is that we have to expose the warning flag to users (right?) and this is something that should be internal (-cc1).

What do people think of that proposal?

@erichkeane
Copy link
Collaborator

Sorry for jumping in late.

I think Erich is on point that having a flag that controls an assertion is a bit of a red flag as we are mixing build configuration and runtime configuration.

It is at least unusual and may cause confusion.

After thinking about this a bit more, should we maybe go all-in on one of the approaches?

* either use build configuration: add a new build flag that controls this assertion. Only assert when assertions are enabled and the new build flag is defined.

* or use runtime configuration always, e.g. add diagnostics for names that can't be demangled. It should make finding those issues much easier (one can run the compiler to produce a list of names that can't be demangled with locations pointing at code that helps to identify where those names come from).

I would probably vouch for the second approach. The only downside that I see is that we have to expose the warning flag to users (right?) and this is something that should be internal (-cc1).

What do people think of that proposal?

I agree completely. I have a preference for the second thing as well. We can have this be a warning-as-default-error type thing, which allows us to disable this with a flag, but is still an error. I MIGHT suggest using the functionality to see if that diagnostic is on before doing the test (due to how much additional work there is here), but IMO, that is a much better way forward.

@ilya-biryukov
Copy link
Contributor

ilya-biryukov commented Oct 10, 2024

I agree completely. I have a preference for the second thing as well. We can have this be a warning-as-default-error type thing, which allows us to disable this with a flag, but is still an error. I MIGHT suggest using the functionality to see if that diagnostic is on before doing the test (due to how much additional work there is here), but IMO, that is a much better way forward.

👍 I guess that's a good signal that this is the way to go. AFAIK, @VitaNuo has already started exploring this direction.
Wrt to enabling this warning, I think we should probably start by having this warning disabled because there seems to be too many failures, but we could gradually roll it out:

  • start with a warning that's disabled by default,
  • collect issues that we hit in LLVM tests (there's plenty AFAIU) and also from real projects (we can run with the warning internally over Google's codebase to get a representative sample),
  • fix all demangling issues present in the tests and the most common ones from other projects,
  • enable the flag by default.

@erichkeane how do you feel about having it enabled given that this does not really block users, but merely points at demangling issues in LLVM's demangler (that they don't necessarily even use). It feels like this should be internal, but then we won't get any bug reports. I feel like we should at least fix the majority of the issues before enabling it (hence the rough roadmap above).

@erichkeane
Copy link
Collaborator

I agree completely. I have a preference for the second thing as well. We can have this be a warning-as-default-error type thing, which allows us to disable this with a flag, but is still an error. I MIGHT suggest using the functionality to see if that diagnostic is on before doing the test (due to how much additional work there is here), but IMO, that is a much better way forward.

👍 I guess that's a good signal that this is the way to go. AFAIK, @VitaNuo has already started exploring this direction. Wrt to enabling this warning, I think we should probably start by having this warning disabled because there seems to be too many failures, but we could gradually roll it out:

* start with a warning that's disabled by default,

* collect issues that we hit in LLVM tests (there's plenty AFAIU) and also from real projects (we can run with the warning internally over Google's codebase to get a representative sample),

* fix all demangling issues present in the tests and the most common ones from other projects,

* enable the flag by default.

@erichkeane how do you feel about having it enabled given that this does not really block users, but merely points at demangling issues in LLVM's demangler (that they don't necessarily even use). It feels like this should be internal, but then we won't get any bug reports. I feel like we should at least fix the majority of the issues before enabling it (hence the rough roadmap above).

I definitely have some heartburn about the enabling-of-flag by default, I would expect these diagnostics to be something actionable by our users, but I also definitely see the wish to get as many people testing this to improve the llvm demangler as possible. I pinged a few others to comment here, but I'd have to be reasonably sure this is a 'no one will ever see this' kind of thing, and if they DO, have a good idea of what to do.

I'm on the fence, but if we were to, I have some thoughts:
1- Can we test this on the top X github C++ projects as well? Perhaps at LEAST Boost.
2- Can we ensure that anyone who searches for the diagnostic has a way of VERY QUICKLY seeing how to report it? So it has to be VERY well documented.
3- Have the diagnostic make it clear that this isn't a "something wrong with your code, but with our demangler"?
4- Instead of being a warning, how about making it a 'remark'?

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 10, 2024

Totally agree to 2 to 4.

Regarding the testing and => default enablement, I think we'll have to first see how many failures we'll see on LLVM + Google internal codebase. It's hard to tell at this point how many of those there will be, and so how feasible it is (or how soon it is feasible) to enable the warning by default.

@AaronBallman
Copy link
Collaborator

I agree completely. I have a preference for the second thing as well. We can have this be a warning-as-default-error type thing, which allows us to disable this with a flag, but is still an error. I MIGHT suggest using the functionality to see if that diagnostic is on before doing the test (due to how much additional work there is here), but IMO, that is a much better way forward.

👍 I guess that's a good signal that this is the way to go. AFAIK, @VitaNuo has already started exploring this direction. Wrt to enabling this warning, I think we should probably start by having this warning disabled because there seems to be too many failures, but we could gradually roll it out:

* start with a warning that's disabled by default,

* collect issues that we hit in LLVM tests (there's plenty AFAIU) and also from real projects (we can run with the warning internally over Google's codebase to get a representative sample),

* fix all demangling issues present in the tests and the most common ones from other projects,

* enable the flag by default.

@erichkeane how do you feel about having it enabled given that this does not really block users, but merely points at demangling issues in LLVM's demangler (that they don't necessarily even use). It feels like this should be internal, but then we won't get any bug reports. I feel like we should at least fix the majority of the issues before enabling it (hence the rough roadmap above).

I am not super comfortable with enabling the diagnostic by default because it's not something the user can do anything about aside from disable the diagnostic (which means they'll report one issue and probably never report another again because they disabled the warning due to chattiness) and I don't think we want to train users to ignore diagnostics or force them to turn off low-value ones. However, what about this as a compromise: we enable the diagnostic without a diagnostic group (so it cannot be disabled) in all builds up to the release candidate, and then we fully disable it for the actual releases? This way, early adopters can help us find issues, but we're not inflicting any pain on the general public once we release.

(FWIW, I think this is something that should have an RFC for wider community buy-in given the effects.)

…led names produced by `clang` can be demangled by LLVM demangler.

Introduce the above warning behind the `-fdemangling-failures` flag, since:

a. the diagnostic cannot be fixed by an application developer.
b. the diagnostic is expected to be quite chatty.
@ilya-biryukov
Copy link
Contributor

I am not super comfortable with enabling the diagnostic by default because it's not something the user can do anything about aside from disable the diagnostic (which means they'll report one issue and probably never report another again because they disabled the warning due to chattiness) and I don't think we want to train users to ignore diagnostics or force them to turn off low-value ones. However, what about this as a compromise: we enable the diagnostic without a diagnostic group (so it cannot be disabled) in all builds up to the release candidate, and then we fully disable it for the actual releases? This way, early adopters can help us find issues, but we're not inflicting any pain on the general public once we release.

(FWIW, I think this is something that should have an RFC for wider community buy-in given the effects.)

We would need a way to enable or disable this diagnostic internally, ideally via a runtime switch, so that we don't need to ship a new compiler every time a user hits this warning. I think the same also holds for users that use RC: we want them to report the bug, but they also need to make progress on their testing of RC/whatever else they're doing with it.
The only downside of having a diagnostic group that I see is that it's public and shows up in documentation everywhere. However, given the amount of warnings we have, maybe that's not even a big deal.

I think the way out of it is to enable this diagnostic only after we make sure it is not chatty. So we do a gradual rollout:

  • add a warning, disabled by default,
  • collect cases where we cannot demangle in important projects by enabling the diagnostic for a single run,
  • fix those problems,
  • enable the warning by default up to RC builds as proposed,
  • collect and fix any problems that happen from RC,
  • enable the warning by default in release because it's not chatty anymore.
  • (if the demangler is really stable) remove the group so that the diagnostic cannot be disabled.

Hopefully by the last step, we will be discovering almost all names that we cannot demangle during Clang development and users will almost never see it.
It it's rare, I hope this would motivate the users to disable it less (although some still will). If we really want to fight that, we can actually remove the group and force it to be enabled by default.

+1 to having this discussion in the RFC

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 15, 2024

I am not super comfortable with enabling the diagnostic by default because it's not something the user can do anything about aside from disable the diagnostic (which means they'll report one issue and probably never report another again because they disabled the warning due to chattiness) and I don't think we want to train users to ignore diagnostics or force them to turn off low-value ones. However, what about this as a compromise: we enable the diagnostic without a diagnostic group (so it cannot be disabled) in all builds up to the release candidate, and then we fully disable it for the actual releases? This way, early adopters can help us find issues, but we're not inflicting any pain on the general public once we release.

(FWIW, I think this is something that should have an RFC for wider community buy-in given the effects.)

Sounds good, I'll try to summarize this discussion as an RFC.

collect issues that we hit in LLVM tests (there's plenty AFAIU) and also from real projects (we can run with the warning internally over Google's codebase to get a representative sample).

It might be a good idea to do this first, in order to collect more data for an RFC and the corresponding discussion.

@VitaNuo VitaNuo changed the title [clang][LLVM Demangler] Add an assertion that validates that all mang… [clang][LLVM Demangler] Add a diagnostic that validates that all mang… Oct 24, 2024
@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Oct 24, 2024
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close for me, jsut a couple of small requests.

Thanks for your patience on this response, I've been at LLVMDev all week.

CODEGENOPT(CtorDtorReturnThis, 1, 0)

/// Whether to issue a diagnostic if a produced mangled name can not be demangled with the LLVM demangler.
CODEGENOPT(DemanglingFailures, 1, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CODEGENOPT(DemanglingFailures, 1, 0)
CODEGENOPT(DiagDemanglerFailures, 1, 0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, mentioning diagnostics makes sense.


def warn_name_cannot_be_demangled : Warning<
"cannot demangle the name '%0'">,
InGroup<CXX20Compat>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Group is wrong here, we likely want to give it its own group for the diagnostic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new Demangler group.

Visibility<[ClangOption, CC1Option]>,
MetaVarName<"<version>">, Values<"<major>.<minor>,latest">,
HelpText<"Attempt to match the ABI of Clang <version>">;
def fdemangling_failures: Flag<["-"], "fdemangling-failures">, Group<f_clang_Group>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def fdemangling_failures: Flag<["-"], "fdemangling-failures">, Group<f_clang_Group>,
def fdiag-demangler_failures: Flag<["-"], "fdiag-demangler-failures">, Group<f_clang_Group>,

Same as above... I'm not attached to the demangling vs demangler, but mentioning explicitly that this is for 'diag' everywhere is necessary for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, good point. I've looked through the existing flags, there are multiple ones that start with fdiagnostics_XXX, so I have renamed the flag to fdiagnostics_demangler_failures.
Let me know if this works.

… controls the demangler diagnostic. Reduce the new include count in CodeGenModule.cpp. Add a new diagnostic group for demangler failures.
@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 28, 2024

Close for me, jsut a couple of small requests.

Thanks for your patience on this response, I've been at LLVMDev all week.

Thanks Erich for the approval.
I will publish an RFC for this in the coming days as well, still gathering some data from test runs.

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 30, 2024

@erichkeane
Copy link
Collaborator

The RFC is published https://discourse.llvm.org/t/rfc-clang-diagnostic-for-demangling-failures/82835.

Thanks! I thought i commented elsewhere, but I think diagnose vs diagnostics for the flags/etc makes more sense:
fdiagnostics_demangler_failures should be -fdiagnose_demangler_failures, it is the correct language-tense.

Else, the patch is good delta what happens with the RFC. Please make sure to wait until Aaron calls the RFC.

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Oct 31, 2024

Thanks! I thought i commented elsewhere, but I think diagnose vs diagnostics for the flags/etc makes more sense: fdiagnostics_demangler_failures should be -fdiagnose_demangler_failures, it is the correct language-tense.

Thanks! I think it was demangling vs. demangler, and now it's diagnostics vs. diagnose :) I'll incorporate that.

Else, the patch is good delta what happens with the RFC. Please make sure to wait until Aaron calls the RFC.

Ofc. I am also OOO next week.

@VitaNuo
Copy link
Contributor Author

VitaNuo commented Nov 22, 2024

After some internal discussions, we have decided to put this work on hold.
It still remains an important problem. I have posted some stats for various open source projects in the RFC. Internally, we have around 177k violations (mangled names that can’t be demangled by the LLVM demangler).

In case someone in the community would like to pick up the effort, this PR has the code for the diagnostic. I will also add a comment to the codegen module mentioning the invariant that should hold in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang][ItaniumMangle] clang should validate that the LLVM demangler can handle all the mangled names it produces.

5 participants