-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[WebAssembly] Generate a call to __wasm_apply_global_tls_relocs in __wasm_init_memory #149832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[WebAssembly] Generate a call to __wasm_apply_global_tls_relocs in __wasm_init_memory #149832
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-wasm Author: None (Arshia001) ChangesMotivationWe recently implemented the WebAssembly exception handling proposal in Wasmer 6.0. As a result, we can now take advantage of clang's support for compiling SjLj and C++ exceptions to WASM EH. This PR fixes a wasm-ld issue that breaks the use of C++ exception handling in WASI(X) modules. Note: I use WASI(X) to mean either wasi preview 1 or WASIX modules. Error detailsWhen compiling C++ code that uses exceptions, clang generates a TLS initialization happens in two separate places; for the "main thread", As it stands, It is important to note that exception handling code generated by the compiler uses This PR allows a call to But how does emscripten work if this is broken?Good question! Emscripten calls This is a workaround that we can use in WASIX as well. However, as far as I understand, the current behavior is wasm-ld is broken, since Full diff: https://github.com/llvm/llvm-project/pull/149832.diff 1 Files Affected:
diff --git a/lld/wasm/Writer.cpp b/lld/wasm/Writer.cpp
index b704677d36c93..3cd6a73fb1a31 100644
--- a/lld/wasm/Writer.cpp
+++ b/lld/wasm/Writer.cpp
@@ -1366,6 +1366,15 @@ void Writer::createInitMemoryFunction() {
writeUleb128(os, s->index, "segment index immediate");
writeU8(os, 0, "memory index immediate");
}
+
+ // After initializing the TLS segment, we also need to apply TLS
+ // relocations in the same way __wasm_init_tls does.
+ if (ctx.arg.sharedMemory && s->isTLS() &&
+ ctx.sym.applyGlobalTLSRelocs) {
+ writeU8(os, WASM_OPCODE_CALL, "CALL");
+ writeUleb128(os, ctx.sym.applyGlobalTLSRelocs->getFunctionIndex(),
+ "function index");
+ }
}
}
|
sbc100
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment for createApplyGlobalTLSRelocationsFunction it cannot be called during the start function: (
llvm-project/lld/wasm/Writer.cpp
Lines 1520 to 1523 in 6fa759b
| // Similar to createApplyGlobalRelocationsFunction but for | |
| // TLS symbols. This cannot be run during the start function | |
| // but must be delayed until __wasm_init_tls is called. | |
| void Writer::createApplyGlobalTLSRelocationsFunction() { |
I don't remember exactly why this is...
lld/wasm/Writer.cpp
Outdated
|
|
||
| // After initializing the TLS segment, we also need to apply TLS | ||
| // relocations in the same way __wasm_init_tls does. | ||
| if (ctx.arg.sharedMemory && s->isTLS() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctx.arg.sharedMemory is probably redundant here since without it applyGlobalTLSRelocs would never be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
@sbc100 thanks for the review!
I can't think of anything, except the fact that it needs In the meantime, do you have other suggestions on how to fix this? I suppose making |
|
This is the change that introduced the comment: https://github.com/llvm/llvm-project/blob/ef8c9135efcb3847fc0e5bbdb55eae18751090df/lld/wasm/Writer.cpp Looking over the code, it seems that back then, llvm-project/lld/wasm/Writer.cpp Lines 1116 to 1138 in ef8c913
Around a year later, static allocation of the TLS section was added in: llvm-project/lld/wasm/Writer.cpp Lines 1190 to 1208 in 74f9841
But |
…nsFunction` * Remove redundant condition when generating call to `__wasm_apply_global_tls_relocs` in `lld::wasm::Writer::createInitMemoryFunction`
|
@sbc100 I believe we're at the one-week ping threshold :) |
|
I'm hoping to get some more time to look into this soon. My main concern is around the timing of application of relocations. The dynamic linking scenario its generally not safe to apply relocations until all libraries have been loaded (i.e. all symbols have been resolved). At last that is true for relocations in general. Perhaps its true that TLS relocations always resolve to internal locations? In which case this might be safe. But this is the reason way |
|
The dynamic linking aspect is very important to us as well, since we just so happen to have released a dynamic linker less than a month ago. Now I'm wondering whether our handling of TLS symbols is correct, at least when a symbol with the same name is exported from multiple side modules... Gonna have to look at it tomorrow. Another interesting case is the catching of C++ exceptions across module boundaries. Note, my entire description of the issue was based on a single module with local-exec TLS. I assume the behaviour will be different with global-dynamic. In the meantime, I'll wait for your review with a healthy dose of excitement. |
|
Taking another look at the code it seem like this should actually be safe since |
| writeU8(os, 0, "memory index immediate"); | ||
| } | ||
|
|
||
| // After initializing the TLS segment, we also need to apply TLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "After initializing the TLS segment and setting __tls_base we can call __wasm_apply_global_tls_relocs"
|
Note that you will still need to call I suppose we could have I assume you are calling |
Yes, this already happens in both wasix-libc and wasi-libc as part of the thread creation routine.
Edit: NVM, my mistake.
Edit: NVM, my mistake
Yes, at the very last stage when every module is already loaded in and instantiated, since we need everything to be resolved. I did a bit more digging, and the same problem also exists in DL modules; however, there the problem doesn't show itself (at least in anything we've compiled). Breakdown of the situation:
While this problem also exists in DL modules, I will again stress the fact that As long as |
|
Just to clarify did you mean to write |
|
I agree that any relocation functions that is guaranteed to only refer to DSO-local symbols are safe to be called from the start function (once Assuming that However, IIUC there is a difference between |
I did not, in the sense that I didn't know there are two variations. Let me go over the code one more time. |
|
Yes, I did mean |
|
Side note: since I didn't know about |
|
But this creates a second problem:
Edit 2: that doesn't work for side modules though. Better to do it in the linker. |
|
@sbc100 weekly ping! |
|
My understanding of the current situation is:
While its true if Is there some specific configuration where moving the call to |
|
Well, yes, the advantage is not having to initialize the TLS section twice on the main thread.
If this is the case, then |
|
@sbc100 another ping! |
|
@sbc100 one more ping |
|
@sbc100 yet another ping! |
What exactly is happening twice if you call
Does this mean that any program that contains TLS relocations (non-empty |
For example this program: When (compiled as This generates the following |
|
Regarding I now see how |
How would But, more importantly, changing the existing behavior in |
Yes makes sense. However it seems like there are two actual bugs here, neither of which seem to be fixed by this PR, but both of which probably deserve fixing:
I think there are several ways we can solve (1) here in wasm-ld, but we will also need to address (2) downstream. |
This is only a bug if you require that I don't know of any cases where Indeed, this is how it's done today in wasix-libc and our fork of clang, and we have not run into any issues with unintiailized globals. However, you mentioned:
Is there something I'm missing? I don't know of any such relocations. |
I feel like we've been over this already above. What you are saying may be true of
Perhaps none of the programs are you running have any relocation in Perhaps we need to construct construct such as program so show that |
|
I created a simple C program that contains TLS relocations that should fail on a runtime that does not call In this case the relocations reference foo and bar symbols which are defined in a separate DSO: Disassembly the resulting program we see __wasm_init_tls calling two subroutines to apply relocations: __wasm_apply_tls_relocs references the external foo and bar symbols which are defined by imported globals: Here you can see references to the imported GOT.mem.foo and GOT.mem.bar whose values might not be set at instantiation time. |
OK, now I see where the confusion came from.
For worker threads, the flow is mostly the same. However, since worker threads don't have a statically-allocated TLS area from the compiler, we need to allocate that first within the module. We use a small guest-side function to allocate and initialize TLS ( code here ). Everything else remains mostly the same. In this way, the main thread doesn't need to call |
|
I'll try running your sample program and report back the results. However, from just looking at it, I believe we link it correctly. |
|
Well, turns out we're linking your sample module incorrectly. Need to investigate our implementation in depth now: // side.c
int foo = 1;
int bar = 2;// main.c
#include <stdio.h>
extern int foo;
extern int bar;
typedef struct
{
int *a;
int *b;
} my_struct;
_Thread_local my_struct s = {&foo, &bar};
int main()
{
printf("foo: %d %d, bar: %d %d, s: %p\n", *s.a, foo, *s.b, bar, &s);
return 0;
} |
|
One probably the steps you Delibes above is that it depends on IIRC |
I believe we also did a patch that made I just went over your code. The issue is with a bad implementation in the linker; the steps I outlined above are sound (when implemented correctly) AFAICT. I know the flow is rather complex, but it enables the linker implementation to do things exactly as it needs to. For example, in our implementation, we go the emscripten way of exporting all libc functions from the main module, so we want the main to be initialized as much as possible before running ctors in side modules. The current setup lets us do:
A different implementation may want to do things differently, or be smart about which modules are initialized in which order. I also wonder about Anyway, an alternate implementation that sticks to the rule "every thread should call
This is also compatible with what we want to do in the linker. However, this is a breaking change for both the linker and the libc code, so I'd like to avoid it if at all possible. |
|
@sbc100 ping :) |
We already have have the exported I sorry, I don't understand your reasoning for not wanting to explictly call Assuming we do enable the export of the currently-internal |
Motivation
We recently implemented the WebAssembly exception handling proposal in Wasmer 6.0. As a result, we can now take advantage of clang's support for compiling SjLj and C++ exceptions to WASM EH. This PR fixes a wasm-ld issue that breaks the use of C++ exception handling in WASI(X) modules.
Note: I use WASI(X) to mean either wasi preview 1 or WASIX modules.
Error details
When compiling C++ code that uses exceptions, clang generates a
GOT.data.internal.__wasm_lpad_contextglobal, which points to the wasm landing pad context that's shared between compiler code and libunwind. This global is initialized in the__wasm_apply_global_tls_relocsfunction.TLS initialization happens in two separate places; for the "main thread",
__wasm_init_memoryruns as the(start)function of the WASM module, initializing all memory segments (including TLS), while also initializing the main thread's__tls_baseto the space reserved for it by the compiler, and signalling this fact to other threads via an atomic. Other threads need to run__wasm_init_tlsafter getting their respective__tls_baseglobal initialized externally.As it stands,
__wasm_apply_global_tls_relocsis only called through__wasm_init_tls, meaning if code doesn't call__wasm_init_tls, any globals that are initialized in__wasm_apply_global_tls_relocsdo not get initialized. This is the case for the main thread.It is important to note that exception handling code generated by the compiler uses
GOT.data.internal.__wasm_lpad_context, while the code in_Unwind_CallPersonalitygoes through__tls_base + offsetdirectly. BecauseGOT.data.internal.__wasm_lpad_contextis not initialized in the main thread, the compiler and_Unwind_CallPersonalitydo not agree on where the landing pad context is stored. This results inscan_eh_tabnot getting the correct LSDA pointer. Exception handling is then completely broken; the catch-all block runs for every exception due to a lack of any type information at runtime.This PR allows a call to
__wasm_apply_global_tls_relocsto be generated in__wasm_init_memoryif needed, which should fix the value ofGOT.data.internal.__wasm_lpad_contextin modules' main threads. Interestingly, through all of our recent work on dynamic linking and PIC modules, we never encountered__wasm_apply_global_tls_relocs, and I don't know if it's used for anything besidesGOT.data.internal.__wasm_lpad_context.But how does emscripten work if this is broken?
Good question! Emscripten calls
__wasm_init_tlsredundantly for main threads, and thus initializes the TLS area twice. This has no observable effect besides being slower, and does indeed fix C++ exception handling.This is a workaround that we can use in WASIX as well. However, as far as I understand, the current behavior is wasm-ld is broken, since
__wasm_init_memoryand__wasm_init_tlsshould behave similarly with respect to TLS initialization, but feel free to disagree with me here.