Skip to content

Conversation

@Michael137
Copy link
Member

This is a reduced test case from a crash we've observed in the past. The assertion that this test triggers is:

Assertion failed: ((Pos == ImportedDecls.end() || Pos->second == To) && "Try to import an already imported Decl"), function MapImported, file ASTImporter.cpp, line 10494.

In a non-asserts build we crash later on in the ASTImporter. The root cause is, as the assertion above points out, that we erroneously replace an existing From->To decl mapping with a To decl that isn't complete. Then we try to complete it but it has no definition and we dereference a nullptr.

The reason this happens is basically what's been described in https://reviews.llvm.org/D67803?id=220956#1676588

The dylib contains a definition of Service which is different to the one in the main executable. When we start dumping the children of the variable we're printing, we start completing it's members, ASTImporting fields in the process. When the ASTImporter realizes there's been a name conflict (i.e., a structural mismatch on the Service type) it would usually report back an error. However, LLDB uses ODRHandlingType::Liberal, which means we create a new decl for the ODR'd type instead of re-using the previously mapped decl. Eventually this leads us to crash.

Ideally we'd be using ODRHandlingType::Conservative and warn/error, though LLDB relies on this in some cases (particularly for distinguishing template specializations, though maybe there's better a way to deal with those).

We should really warn the user when this happens and not crash. To avoid the crash we'd need to know to not create a decl for the ODR violation, and instead re-use the definition we've previously seen. Though I'm not yet sure that's viable for all of LLDB's use-cases (where ODR violations might legimiately occur in a program, e.g., with opaque definitions, etc.).

This is a reduced test case from a crash we've observed in the past.
The assertion that this test triggers is:
```
Assertion failed: ((Pos == ImportedDecls.end() || Pos->second == To) && "Try to import an already imported Decl"), function MapImported, file ASTImporter.cpp, line 10494.
```

In a non-asserts build we crash later on in the ASTImporter. The
root cause is, as the assertion above points out, that we erroneously
replace an existing `From->To` decl mapping with a `To` decl that
isn't complete. Then we try to complete it but it has no definition
and we dereference a nullptr.

The reason this happens is basically what's been described in https://reviews.llvm.org/D67803?id=220956#1676588

The dylib contains a definition of `Service` which is different to the
one in the main executable. When we start dumping the children of the
variable we're printing, we start completing it's members, `ASTImport`ing
fields in the process. When the ASTImporter realizes there's been a name
conflict (i.e., a structural mismatch on the `Service` type) it would
usually report back an error. However, LLDB uses `ODRHandlingType::Liberal`,
which means we create a new decl for the ODR'd type instead of re-using
the previously mapped decl. Eventually this leads us to crash.

Ideally we'd be using `ODRHandlingType::Conservative` and warn/error, though
LLDB relies on this in some cases (particularly for distinguishing template
specializations, though maybe there's better a way to deal with those).

We should really warn the user when this happens and not crash. To avoid
the crash we'd need to know to not create a decl for the ODR violation,
and instead re-use the definition we've previously seen. Though I'm not
yet sure that's viable for all of LLDB's use-cases (where ODR violations
might legimiately occur in a program, e.g., with opaque definitions,
etc.).
@llvmbot
Copy link
Member

llvmbot commented Oct 16, 2024

@llvm/pr-subscribers-lldb

Author: Michael Buch (Michael137)

Changes

This is a reduced test case from a crash we've observed in the past. The assertion that this test triggers is:

Assertion failed: ((Pos == ImportedDecls.end() || Pos->second == To) && "Try to import an already imported Decl"), function MapImported, file ASTImporter.cpp, line 10494.

In a non-asserts build we crash later on in the ASTImporter. The root cause is, as the assertion above points out, that we erroneously replace an existing From->To decl mapping with a To decl that isn't complete. Then we try to complete it but it has no definition and we dereference a nullptr.

The reason this happens is basically what's been described in https://reviews.llvm.org/D67803?id=220956#1676588

The dylib contains a definition of Service which is different to the one in the main executable. When we start dumping the children of the variable we're printing, we start completing it's members, ASTImporting fields in the process. When the ASTImporter realizes there's been a name conflict (i.e., a structural mismatch on the Service type) it would usually report back an error. However, LLDB uses ODRHandlingType::Liberal, which means we create a new decl for the ODR'd type instead of re-using the previously mapped decl. Eventually this leads us to crash.

Ideally we'd be using ODRHandlingType::Conservative and warn/error, though LLDB relies on this in some cases (particularly for distinguishing template specializations, though maybe there's better a way to deal with those).

We should really warn the user when this happens and not crash. To avoid the crash we'd need to know to not create a decl for the ODR violation, and instead re-use the definition we've previously seen. Though I'm not yet sure that's viable for all of LLDB's use-cases (where ODR violations might legimiately occur in a program, e.g., with opaque definitions, etc.).


Full diff: https://github.com/llvm/llvm-project/pull/112566.diff

6 Files Affected:

  • (added) lldb/test/Shell/Expr/Inputs/name-conflict-test/main.cpp (+21)
  • (added) lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.cpp (+15)
  • (added) lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.h (+10)
  • (added) lldb/test/Shell/Expr/Inputs/name-conflict-test/service.cpp (+15)
  • (added) lldb/test/Shell/Expr/Inputs/name-conflict-test/service.h (+20)
  • (added) lldb/test/Shell/Expr/TestODRHandlingWithDylib.test (+17)
diff --git a/lldb/test/Shell/Expr/Inputs/name-conflict-test/main.cpp b/lldb/test/Shell/Expr/Inputs/name-conflict-test/main.cpp
new file mode 100644
index 00000000000000..74d40d04ed4e20
--- /dev/null
+++ b/lldb/test/Shell/Expr/Inputs/name-conflict-test/main.cpp
@@ -0,0 +1,21 @@
+#include "service.h"
+#include <cassert>
+#include <dlfcn.h>
+
+#ifndef PLUGIN_PATH
+#error "Expected PLUGIN_PATH to be defined"
+#endif // !PLUGIN_PATH
+
+int main() {
+  void *handle = dlopen(PLUGIN_PATH, RTLD_NOW);
+  assert(handle != nullptr);
+  void (*plugin_init)(void) = (void (*)(void))dlsym(handle, "plugin_init");
+  assert(plugin_init != nullptr);
+  void (*plugin_entry)(void) = (void (*)(void))dlsym(handle, "plugin_entry");
+  assert(plugin_entry != nullptr);
+
+  exported();
+  plugin_init();
+  plugin_entry();
+  return 0;
+}
diff --git a/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.cpp b/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.cpp
new file mode 100644
index 00000000000000..8f8e98ed5c4b01
--- /dev/null
+++ b/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.cpp
@@ -0,0 +1,15 @@
+#include "service.h"
+#include "plugin.h"
+
+struct Proxy : public Service {
+  State *proxyState;
+};
+
+Proxy *gProxyThis = 0;
+
+extern "C" {
+void plugin_init() { gProxyThis = new Proxy; }
+
+void plugin_entry() {}
+}
+
diff --git a/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.h b/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.h
new file mode 100644
index 00000000000000..a04b0974165e71
--- /dev/null
+++ b/lldb/test/Shell/Expr/Inputs/name-conflict-test/plugin.h
@@ -0,0 +1,10 @@
+#ifndef PLUGIN_H_IN
+#define PLUGIN_H_IN
+
+extern "C" {
+void plugin_entry(void);
+void plugin_init(void);
+}
+
+#endif // _H_IN
+
diff --git a/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.cpp b/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.cpp
new file mode 100644
index 00000000000000..5fd008c2aa8b80
--- /dev/null
+++ b/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.cpp
@@ -0,0 +1,15 @@
+#include "service.h"
+
+struct ServiceAux {
+  Service *Owner;
+};
+
+struct Service::State {};
+
+void exported() {
+  // Make sure debug-info for definition of Service is
+  // emitted in this CU.
+  Service service;
+  service.start(0);
+}
+
diff --git a/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.h b/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.h
new file mode 100644
index 00000000000000..53762a6e00357a
--- /dev/null
+++ b/lldb/test/Shell/Expr/Inputs/name-conflict-test/service.h
@@ -0,0 +1,20 @@
+#ifndef SERVICE_H_IN
+#define SERVICE_H_IN
+
+struct ServiceAux;
+
+struct Service {
+  struct State;
+  bool start(State *) { return true; }
+
+#if HIDE_FROM_PLUGIN
+  int __resv1;
+#endif // !HIDE_FROM_PLUGIN
+
+  Service *__owner;
+  ServiceAux *aux;
+};
+
+void exported();
+
+#endif
diff --git a/lldb/test/Shell/Expr/TestODRHandlingWithDylib.test b/lldb/test/Shell/Expr/TestODRHandlingWithDylib.test
new file mode 100644
index 00000000000000..7ee34ff4431d8b
--- /dev/null
+++ b/lldb/test/Shell/Expr/TestODRHandlingWithDylib.test
@@ -0,0 +1,17 @@
+# REQUIRES: system-darwin
+
+# RUN: %clangxx_host %p/Inputs/name-conflict-test/plugin.cpp -shared -gdwarf -O0 -o %t.plugin.dylib
+#
+# RUN: %clangxx_host %p/Inputs/name-conflict-test/main.cpp \
+# RUN:               %p/Inputs/name-conflict-test/service.cpp \
+# RUN:               -DPLUGIN_PATH=\"%t.plugin.dylib\" \
+# RUN:               -DHIDE_FROM_PLUGIN \
+# RUN:               -gdwarf -O0 -o %t.a.out
+#
+# RUN: %lldb %t.a.out \
+# RUN:     -o "b plugin_entry" \
+# RUN:     -o run \
+# RUN:     -o "expr *gProxyThis" \
+# RUN:     -o exit | FileCheck  %s
+
+# CHECK: (lldb) expr *gProxyThis

@Michael137
Copy link
Member Author

Not sure how to best XFAIL this atm (really want to avoid just skipping it). This currently crashes LLDB. Maybe we could make use of the AST dump, or the expression log..

@github-actions
Copy link

github-actions bot commented Oct 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is one of those tests that should really be an API test. As far as I can tell, there's nothing darwin-specific about this bug -- except for the fact that building shared libraries in a cross-platform manner is hard. But the API tests have wrappers that let us do that, and they even support windows. (I had a feeling our docs already stated that shared libraries is a reason to make something an API test, but I can't find that now.)

I understand the desire to not skip a test (and that making it an API test will make that even harder, as a crash will bring down the entire test), but I'm not sure if that's so useful as to override this. I also think that making a test which checks that something fails in a particular way can be fragile (as a random refactor could make it fail in a different way).

That said, thank you for reducing this. Having this around somewhere will be very useful for fixing this (eventually). From the looks of things, it may be possible to get rid of the dlopen/dlsym thing as well (replace it with hard shared library dependencies) -- it seems the important part is that the two modules have different definitions of one type, and the way that the shared library is loaded is beside the point.

@Michael137
Copy link
Member Author

I think this is one of those tests that should really be an API test. As far as I can tell, there's nothing darwin-specific about this bug -- except for the fact that building shared libraries in a cross-platform manner is hard. But the API tests have wrappers that let us do that, and they even support windows. (I had a feeling our docs already stated that shared libraries is a reason to make something an API test, but I can't find that now.)

Yea that makes sense. Adding a system-darwin requirement did feel off.

I understand the desire to not skip a test (and that making it an API test will make that even harder, as a crash will bring down the entire test), but I'm not sure if that's so useful as to override this. I also think that making a test which checks that something fails in a particular way can be fragile (as a random refactor could make it fail in a different way).

True, I've seen this be an issue in the past, especially with ASTImporter tests. I was considering adding a warning (or log message at the very least) when we try to overwrite and already mapped decl. We could technically XFAIL on that. But yea that's probably too fragile. Ideally we'd just test that we successfully run the expression.

From the looks of things, it may be possible to get rid of the dlopen/dlsym thing as well (replace it with hard shared library dependencies) -- it seems the important part is that the two modules have different definitions of one type, and the way that the shared library is loaded is beside the point.

Yup, that should work!

@Michael137
Copy link
Member Author

Skipped for now

@github-actions
Copy link

github-actions bot commented Oct 17, 2024

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

struct State;
bool start(State *) { return true; }

#ifdef HIDE_FROM_PLUGIN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is really how the original code looked like. This is a particularly nasty form of ODR violation as it messes with the offsets of every field that comes after it (including of the non-plugin Proxy class which inherits from it). It doesn't really matter, as I still think it lldb should be able to handle this kind of a situation, but I'm surprised that code like this could ever work. Could it be that the original code used a milder form of this, like declaring a field with a different name/type (but the same size&alignment)?

Copy link
Member Author

@Michael137 Michael137 Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea unfortunately this is what the internal users are doing. I reached out to them to understand how this is not causing issues for them. They compile with non-standards conforming extensions, so that might be part of the story

Co-authored-by: Pavel Labath <[email protected]>
@Michael137 Michael137 merged commit 3bc765d into llvm:main Oct 18, 2024
7 checks passed
@Michael137 Michael137 deleted the lldb/name-conflict-errors branch October 18, 2024 13:19
Michael137 added a commit that referenced this pull request Oct 19, 2024
…orted invariant (#112748)

This patch emits a warning into the expression log when we call
`MapImported` on a decl which has already been imported, but with a new
`to` destination decl. In asserts builds this would lead to triggering
this [ASTImporter::MapImported
assertion](https://github.com/llvm/llvm-project/blob/6d7712a70c163d2ae9e1dc928db31fcb45d9e404/clang/lib/AST/ASTImporter.cpp#L10493-L10494).
In no-asserts builds we will likely crash, in potentially non-obvious
ways. The hope is that the log message will help in diagnosing this type
of issue in the field.

The underlying issue is discussed in more detail in:
#112566.

In a non-asserts build, the last few expression log entries would look
as follows:
```
     CompleteTagDecl on (ASTContext*)scratch ASTContext Completing (TagDecl*)0x00000001132d31d0 named Foo
       CTD Before:
CXXRecordDecl 0x1132d31d0 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct Foo

 [ClangASTImporter] WARNING: overwriting an already imported decl '0x000000014378fd80' ('Foo') from '0x0000000143790c00' with 0x00000001132d31d0. Likely due to a name conflict when importing 'Foo'.
     [ClangASTImporter] Imported (FieldDecl*)0x0000000143790220, named service (from (Decl*)0x0000000143791270), metadata 271
     [ClangASTImporter] Decl has no origin information in (ASTContext*)0x00000001132c8c00
 FindExternalLexicalDecls on (ASTContext*)0x0000000143c1f600 'scratch ASTContext' in 'Foo' (CXXRecordDecl*)0x000000014378FD80
   FELD Original decl (ASTContext*)0x00000001132c8c00 (Decl*)0x0000000143790c00:
CXXRecordDecl 0x143790c00 <<invalid sloc>> <invalid sloc> struct Foo definition
|-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal
| |-DefaultConstructor exists trivial needs_implicit
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
|-FieldDecl 0x143791270 <<invalid sloc>> <invalid sloc> service 'Service *'
`-FieldDecl 0x1437912c8 <<invalid sloc>> <invalid sloc> mach_endpoint 'int'

   FELD Adding [to CXXRecordDecl Foo] lexical FieldDecl FieldDecl 0x143791270 <<invalid sloc>> <invalid sloc> service 'Service *'

   FELD Adding [to CXXRecordDecl Foo] lexical FieldDecl FieldDecl 0x1437912c8 <<invalid sloc>> <invalid sloc> mach_endpoint 'int'

     [ClangASTImporter] Imported (FieldDecl*)0x0000000143790278, named mach_endpoint (from (Decl*)0x00000001437912c8), metadata 280
     [ClangASTImporter] Decl has no origin information in (ASTContext*)0x00000001132c8c00
```
Note how we start "completing" `Foo`. Then emit our new `WARNING`.
Shortly after, we crash, and the log abruptly ends.

rdar://135551810
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Oct 21, 2024
…orted invariant (llvm#112748)

This patch emits a warning into the expression log when we call
`MapImported` on a decl which has already been imported, but with a new
`to` destination decl. In asserts builds this would lead to triggering
this [ASTImporter::MapImported
assertion](https://github.com/llvm/llvm-project/blob/6d7712a70c163d2ae9e1dc928db31fcb45d9e404/clang/lib/AST/ASTImporter.cpp#L10493-L10494).
In no-asserts builds we will likely crash, in potentially non-obvious
ways. The hope is that the log message will help in diagnosing this type
of issue in the field.

The underlying issue is discussed in more detail in:
llvm#112566.

In a non-asserts build, the last few expression log entries would look
as follows:
```
     CompleteTagDecl on (ASTContext*)scratch ASTContext Completing (TagDecl*)0x00000001132d31d0 named Foo
       CTD Before:
CXXRecordDecl 0x1132d31d0 <<invalid sloc>> <invalid sloc> <undeserialized declarations> struct Foo

 [ClangASTImporter] WARNING: overwriting an already imported decl '0x000000014378fd80' ('Foo') from '0x0000000143790c00' with 0x00000001132d31d0. Likely due to a name conflict when importing 'Foo'.
     [ClangASTImporter] Imported (FieldDecl*)0x0000000143790220, named service (from (Decl*)0x0000000143791270), metadata 271
     [ClangASTImporter] Decl has no origin information in (ASTContext*)0x00000001132c8c00
 FindExternalLexicalDecls on (ASTContext*)0x0000000143c1f600 'scratch ASTContext' in 'Foo' (CXXRecordDecl*)0x000000014378FD80
   FELD Original decl (ASTContext*)0x00000001132c8c00 (Decl*)0x0000000143790c00:
CXXRecordDecl 0x143790c00 <<invalid sloc>> <invalid sloc> struct Foo definition
|-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal
| |-DefaultConstructor exists trivial needs_implicit
| |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveConstructor exists simple trivial needs_implicit
| |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
| |-MoveAssignment exists simple trivial needs_implicit
| `-Destructor simple irrelevant trivial needs_implicit
|-FieldDecl 0x143791270 <<invalid sloc>> <invalid sloc> service 'Service *'
`-FieldDecl 0x1437912c8 <<invalid sloc>> <invalid sloc> mach_endpoint 'int'

   FELD Adding [to CXXRecordDecl Foo] lexical FieldDecl FieldDecl 0x143791270 <<invalid sloc>> <invalid sloc> service 'Service *'

   FELD Adding [to CXXRecordDecl Foo] lexical FieldDecl FieldDecl 0x1437912c8 <<invalid sloc>> <invalid sloc> mach_endpoint 'int'

     [ClangASTImporter] Imported (FieldDecl*)0x0000000143790278, named mach_endpoint (from (Decl*)0x00000001437912c8), metadata 280
     [ClangASTImporter] Decl has no origin information in (ASTContext*)0x00000001132c8c00
```
Note how we start "completing" `Foo`. Then emit our new `WARNING`.
Shortly after, we crash, and the log abruptly ends.

rdar://135551810
(cherry picked from commit aa32060)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants