@Arshia001

Motivation

We recently implemented the WebAssembly exception handling proposal in Wasmer 6.0. As a result, we can now take advantage of clang's support for compiling SjLj and C++ exceptions to WASM EH. This PR fixes a wasm-ld issue that breaks the use of C++ exception handling in WASI(X) modules.

Note: I use WASI(X) to mean either wasi preview 1 or WASIX modules.

Error details

When compiling C++ code that uses exceptions, clang generates a GOT.data.internal.__wasm_lpad_context global, which points to the wasm landing pad context that's shared between compiler code and libunwind. This global is initialized in the __wasm_apply_global_tls_relocs function.

TLS initialization happens in two separate places; for the "main thread", __wasm_init_memory runs as the (start) function of the WASM module, initializing all memory segments (including TLS), while also initializing the main thread's __tls_base to the space reserved for it by the compiler, and signalling this fact to other threads via an atomic. Other threads need to run __wasm_init_tls after getting their respective __tls_base global initialized externally.

As it stands, __wasm_apply_global_tls_relocs is only called through __wasm_init_tls, meaning if code doesn't call __wasm_init_tls, any globals that are initialized in __wasm_apply_global_tls_relocs do not get initialized. This is the case for the main thread.

It is important to note that exception handling code generated by the compiler uses GOT.data.internal.__wasm_lpad_context, while the code in _Unwind_CallPersonality goes through __tls_base + offset directly. Because GOT.data.internal.__wasm_lpad_context is not initialized in the main thread, the compiler and _Unwind_CallPersonality do not agree on where the landing pad context is stored. This results in scan_eh_tab not getting the correct LSDA pointer. Exception handling is then completely broken; the catch-all block runs for every exception due to a lack of any type information at runtime.

This PR allows a call to __wasm_apply_global_tls_relocs to be generated in __wasm_init_memory if needed, which should fix the value of GOT.data.internal.__wasm_lpad_context in modules' main threads. Interestingly, through all of our recent work on dynamic linking and PIC modules, we never encountered __wasm_apply_global_tls_relocs, and I don't know if it's used for anything besides GOT.data.internal.__wasm_lpad_context.
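
To make the mismatch concrete, here is a minimal C++ model of the two views of the landing pad context. It is only a sketch, not lld or libunwind code: the struct, the function names and the offset value 16 are made up for illustration, and it assumes the GOT global holds just the TLS-relative offset until __wasm_apply_global_tls_relocs runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical offset of __wasm_lpad_context inside the TLS segment.
constexpr uint32_t kLpadOffset = 16;

// Simplified per-instance state; field names are illustrative only.
struct ModuleState {
  uint32_t tlsBase = 0;         // models the __tls_base global
  uint32_t gotLpadContext = 0;  // models GOT.data.internal.__wasm_lpad_context
};

// Models what the main thread sees after __wasm_init_memory today:
// __tls_base is set, but the GOT global has not been relocated (assumption).
inline ModuleState instantiate(uint32_t mainThreadTlsBase) {
  ModuleState m;
  m.tlsBase = mainThreadTlsBase;
  m.gotLpadContext = kLpadOffset;
  return m;
}

// Models __wasm_apply_global_tls_relocs: rebase the GOT global onto __tls_base.
inline void applyGlobalTlsRelocs(ModuleState &m) {
  m.gotLpadContext = m.tlsBase + kLpadOffset;
}

// Address used by compiler-generated exception handling code.
inline uint32_t compilerView(const ModuleState &m) { return m.gotLpadContext; }

// Address used by _Unwind_CallPersonality (__tls_base + offset).
inline uint32_t unwinderView(const ModuleState &m) {
  return m.tlsBase + kLpadOffset;
}
```

With a non-zero __tls_base the two views differ until the relocation function runs; with a __tls_base of 0 they coincide even without it.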

But how does emscripten work if this is broken?

Good question! Emscripten calls __wasm_init_tls redundantly for main threads, and thus initializes the TLS area twice. This has no observable effect besides being slower, and does indeed fix C++ exception handling.

This is a workaround that we can use in WASIX as well. However, as far as I understand, the current behavior of wasm-ld is broken, since __wasm_init_memory and __wasm_init_tls should behave similarly with respect to TLS initialization, but feel free to disagree with me here.

@llvmbot
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-wasm

Author: None (Arshia001)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149832.diff

1 Files Affected:

  • (modified) lld/wasm/Writer.cpp (+9)
diff --git a/lld/wasm/Writer.cpp b/lld/wasm/Writer.cpp
index b704677d36c93..3cd6a73fb1a31 100644
--- a/lld/wasm/Writer.cpp
+++ b/lld/wasm/Writer.cpp
@@ -1366,6 +1366,15 @@ void Writer::createInitMemoryFunction() {
           writeUleb128(os, s->index, "segment index immediate");
           writeU8(os, 0, "memory index immediate");
         }
+
+        // After initializing the TLS segment, we also need to apply TLS
+        // relocations in the same way __wasm_init_tls does.
+        if (ctx.arg.sharedMemory && s->isTLS() &&
+            ctx.sym.applyGlobalTLSRelocs) {
+          writeU8(os, WASM_OPCODE_CALL, "CALL");
+          writeUleb128(os, ctx.sym.applyGlobalTLSRelocs->getFunctionIndex(),
+                      "function index");
+        }
       }
     }
 
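
For readers unfamiliar with the binary encoding, the two writer calls added by the diff emit a call instruction: the opcode byte 0x10 followed by the callee's function index as unsigned LEB128. A minimal standalone sketch of that encoding follows; the helper names here are hypothetical and are not lld's writeU8/writeUleb128.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// `call` opcode in the WebAssembly binary format.
constexpr uint8_t kOpcodeCall = 0x10;

// Appends `value` to `out` as unsigned LEB128: 7 bits per byte, low bits
// first, with the high bit set on every byte except the last.
inline void appendUleb128(std::vector<uint8_t> &out, uint32_t value) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Encodes `call <functionIndex>` the way the added lines would emit it.
inline std::vector<uint8_t> encodeCall(uint32_t functionIndex) {
  std::vector<uint8_t> out;
  out.push_back(kOpcodeCall);
  appendUleb128(out, functionIndex);
  return out;
}
```

For a function index of 34 this yields the two bytes 10 22; indices of 128 and above need more than one LEB128 byte.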

Collaborator

@sbc100 sbc100 left a comment

According to the comment for createApplyGlobalTLSRelocationsFunction it cannot be called during the start function: (

// Similar to createApplyGlobalRelocationsFunction but for
// TLS symbols. This cannot be run during the start function
// but must be delayed until __wasm_init_tls is called.
void Writer::createApplyGlobalTLSRelocationsFunction() {
.

I don't remember exactly why this is...


// After initializing the TLS segment, we also need to apply TLS
// relocations in the same way __wasm_init_tls does.
if (ctx.arg.sharedMemory && s->isTLS() &&
Collaborator

ctx.arg.sharedMemory is probably redundant here since without it applyGlobalTLSRelocs would never be created.

Author

Fixed.

@Arshia001
Author

Arshia001 commented Jul 22, 2025

@sbc100 thanks for the review!

it cannot be called during the start function:

I can't think of anything, except the fact that it needs __tls_base to be initialized before it can work. Maybe, at some point, __tls_base wasn't initialized in __wasm_init_memory, and nobody updated that comment after this behavior changed? I'll dig into the commit history to see if I can figure this out.

In the meantime, do you have other suggestions on how to fix this? I suppose making __wasm_apply_global_tls_relocs public would at least allow it to be called from the WASIX module's _start function.

@Arshia001
Author

This is the change that introduced the comment: https://github.com/llvm/llvm-project/blob/ef8c9135efcb3847fc0e5bbdb55eae18751090df/lld/wasm/Writer.cpp

Looking over the code, it seems that back then, __tls_base wasn't being initialized during __wasm_init_memory:

writeU8(os, WASM_OPCODE_END, "end $init");
for (const OutputSegment *s : segments) {
  if (needsPassiveInitialization(s)) {
    // destination address
    writePtrConst(os, s->startVA, is64, "destination address");
    if (config->isPic) {
      writeU8(os, WASM_OPCODE_GLOBAL_GET, "GLOBAL_GET");
      writeUleb128(os, WasmSym::memoryBase->getGlobalIndex(),
                   "memory_base");
      writeU8(os, is64 ? WASM_OPCODE_I64_ADD : WASM_OPCODE_I32_ADD,
              "i32.add");
    }
    // source segment offset
    writeI32Const(os, 0, "segment offset");
    // memory region size
    writeI32Const(os, s->size, "memory region size");
    // memory.init instruction
    writeU8(os, WASM_OPCODE_MISC_PREFIX, "bulk-memory prefix");
    writeUleb128(os, WASM_OPCODE_MEMORY_INIT, "memory.init");
    writeUleb128(os, s->index, "segment index immediate");
    writeU8(os, 0, "memory index immediate");
  }
}

Around a year later, static allocation of the TLS section was added in:

// When we initialize the TLS segment we also set the `__tls_base`
// global. This allows the runtime to use this static copy of the
// TLS data for the first/main thread.
if (config->sharedMemory && s->isTLS()) {
  if (config->isPic) {
    // Cache the result of the addionion in local 0
    writeU8(os, WASM_OPCODE_LOCAL_TEE, "local.tee");
    writeUleb128(os, 1, "local 1");
  } else {
    writePtrConst(os, s->startVA, is64, "destination address");
  }
  writeU8(os, WASM_OPCODE_GLOBAL_SET, "GLOBAL_SET");
  writeUleb128(os, WasmSym::tlsBase->getGlobalIndex(),
               "__tls_base");
  if (config->isPic) {
    writeU8(os, WASM_OPCODE_LOCAL_GET, "local.tee");
    writeUleb128(os, 1, "local 1");
  }
}

But __wasm_apply_global_tls_relocs probably flew under the radar and the comment was never removed. I assume the correct thing to do here would be to remove that comment as well. What do you think, @sbc100?

…nsFunction`

* Remove redundant condition when generating call to `__wasm_apply_global_tls_relocs` in `lld::wasm::Writer::createInitMemoryFunction`
@Arshia001
Author

@sbc100 I believe we're at the one-week ping threshold :)

@sbc100
Collaborator

sbc100 commented Jul 28, 2025

I'm hoping to get some more time to look into this soon.

My main concern is around the timing of application of relocations. In the dynamic linking scenario it's generally not safe to apply relocations until all libraries have been loaded (i.e. all symbols have been resolved). At least that is true for relocations in general. Perhaps it's true that TLS relocations always resolve to internal locations? In which case this might be safe.

But this is the reason why __wasm_apply_data_relocs is never called from the wasm start function: symbol resolution is not necessarily complete when the start function runs (i.e. when a given module is loaded).

@Arshia001
Author

Arshia001 commented Jul 28, 2025

The dynamic linking aspect is very important to us as well, since we just so happen to have released a dynamic linker less than a month ago. Now I'm wondering whether our handling of TLS symbols is correct, at least when a symbol with the same name is exported from multiple side modules... Gonna have to look at it tomorrow.

Another interesting case is the catching of C++ exceptions across module boundaries. Note, my entire description of the issue was based on a single module with local-exec TLS. I assume the behaviour will be different with global-dynamic.

In the meantime, I'll wait for your review with a healthy dose of excitement.

@sbc100
Collaborator

sbc100 commented Aug 1, 2025

Taking another look at the code, it seems like this should actually be safe, since __wasm_apply_global_tls_relocs only contains relocations for internalGotSymbols, which are symbols that resolve to DSO-local addresses.

writeU8(os, 0, "memory index immediate");
}

// After initializing the TLS segment, we also need to apply TLS
Collaborator

How about "After initializing the TLS segment and setting __tls_base we can call __wasm_apply_global_tls_relocs"

@sbc100
Collaborator

sbc100 commented Aug 1, 2025

Note that you will still need to call __wasm_init_tls on all threads, including the main thread because this function also calls __wasm_apply_tls_relocs. This would need to be called on all threads including the main thread, but it cannot be part of the start function and (like __wasm_apply_data_relocs) can only be called once all symbols are resolved.

I suppose we could have __wasm_apply_data_relocs call __wasm_apply_tls_relocs on the assumption that __tls_base has been set by then... but that would require a bunch of refactoring. I'm not sure it's worth it.

I assume you are calling __wasm_apply_data_relocs somewhere in your dynamic linker?

@Arshia001
Author

Arshia001 commented Aug 1, 2025

you will still need to call __wasm_init_tls on all threads

Yes, this already happens in both wasix-libc and wasi-libc as part of the thread creation routine.

including the main thread because this function also calls __wasm_apply_tls_relocs

Not quite sure why this would be the case, considering the statically-allocated TLS section exists and is initialized by __wasm_init_memory? Neither of the wasi(x)-libc's make this call today. Note that __wasm_init_memory initializes __tls_base for the main thread as well, so the call to __wasm_apply_tls_relocs is safe in that regard.

Edit: NVM, my mistake.

but it cannot be part of the start function and (like __wasm_apply_data_relocs) can only be called once all symbols are resolved.

If there are external symbols, then yes. However, I have only ever seen (and you pointed this out as well) that __wasm_apply_tls_relocs only initializes DSO-local symbols, which should be safe to call within the start function. Unless I'm missing something here?

Edit: NVM, my mistake

I assume you are calling __wasm_apply_data_relocs somewhere in your dynamic linker?

Yes, at the very last stage when every module is already loaded in and instantiated, since we need everything to be resolved.

I did a bit more digging, and the same problem also exists in DL modules; however, there the problem doesn't show itself (at least in anything we've compiled). Breakdown of the situation:

  • Non-DL module: the pre-allocated TLS area for the main thread lives at offset 1024. Hence, the global needs to be relocated to account for a __tls_base of 1024. We get an error.
  • DL module, running on the Wasmer linker: the dynamic linker puts __memory_base for the main module at offset 0, and our compile settings put the TSD area before the stack, so __memory_base = __tls_base = 0 for the main module, on the main thread. Hence, the initial values of the $GOT.data.internal TLS symbols are correct by chance, and we don't see any issues.

While this problem also exists in DL modules, I will again stress the fact that __wasm_apply_global_tls_relocs is also generated for non-DL modules, where no dynamic linker comes into play at any point, so there's no "link finalization phase" from which to call __wasm_apply_global_tls_relocs.

As long as __wasm_apply_global_tls_relocs only relocates DSO-local symbols, I don't see why it should be unsafe to be called from the start function.

@sbc100
Collaborator

sbc100 commented Aug 5, 2025

Just to clarify did you mean to write __wasm_apply_global_tls_relocs rather than __wasm_apply_tls_relocs in that last comment? (if so maybe update it and I'll delete this comment).

@sbc100
Collaborator

sbc100 commented Aug 5, 2025

I agree that any relocation function that is guaranteed to only refer to DSO-local symbols is safe to call from the start function (once __tls_base is set).

Assuming that __wasm_apply_global_tls_relocs only refers to DSO-local symbols then this change would be safe. From my reading of the code I believe that you could be correct about that.

However, IIUC there is a difference between __wasm_apply_tls_relocs and __wasm_apply_global_tls_relocs in that regard. It seems likely that the latter will only ever refer to DSO-local symbols, but the former could contain references to any symbol at all.

@Arshia001
Author

Just to clarify did you mean to write __wasm_apply_global_tls_relocs rather than __wasm_apply_tls_relocs

I did not, in the sense that I didn't know there are two variations. Let me go over the code one more time.

@Arshia001
Author

Yes, I did mean __wasm_apply_global_tls_relocs. I'll edit my previous comment. I didn't know of __wasm_apply_tls_relocs, so I made little sense there.

@Arshia001
Author

Side note: since I didn't know about __wasm_apply_tls_relocs, we weren't accounting for it in our linker implementation, so thanks for pointing that out!

@Arshia001
Author

Arshia001 commented Aug 5, 2025

But this creates a second problem: __wasm_apply_tls_relocs is a hidden symbol, and can't be exported via --export AFAIK. I assume the fix there would be to make the symbol WASM_SYMBOL_VISIBILITY_DEFAULT | WASM_SYMBOL_EXPORTED, so it can be exported and called by linkers? That's working for me locally, so I can push it if you think it's correct in principle.


Edit: __wasm_apply_tls_relocs can also be called during startup from wasix-libc. But that also requires the same visibility change to the symbol.

Edit 2: that doesn't work for side modules though. Better to do it in the linker.

@Arshia001
Author

@sbc100 weekly ping!

@sbc100
Collaborator

sbc100 commented Aug 14, 2025

My understanding of the current situation is:

  1. Every thread (including the main thread) needs to call __wasm_init_tls before it can run
  2. Given that, it seems like both relocation functions (__wasm_apply_global_tls_relocs and __wasm_apply_tls_relocs) might as well both get called from there.

While it's true that, if __wasm_apply_global_tls_relocs only references DSO-local symbols, it could run during __wasm_init_memory on the main thread, I'm not sure I see any advantage.

Is there some specific configuration where moving the call to __wasm_apply_global_tls_relocs is advantageous for you?

@Arshia001
Author

Arshia001 commented Aug 14, 2025

Well, yes, the advantage is not having to initialize the TLS section twice on the main thread.

Every thread (including the main thread) needs to call __wasm_init_tls before it can run

If this is the case, then __wasm_init_memory shouldn't initialize the TLS section. As it stands, __wasm_init_memory initializes TLS, so a call to __wasm_init_tls would lead to initializing TLS twice. As I mentioned earlier, neither wasix-libc nor (afaik, but I'm pretty certain) wasi-libc call __wasm_init_tls on the main thread.

@Arshia001
Author

@sbc100 another ping!

@Arshia001
Author

@sbc100 one more ping

@Arshia001
Author

@sbc100 yet another ping!

@sbc100
Collaborator

sbc100 commented Oct 8, 2025

If this is the case, then __wasm_init_memory shouldn't initialize the TLS section. As it stands, __wasm_init_memory initializes TLS, so a call to __wasm_init_tls would lead to initializing TLS twice.

What exactly is happening twice if you call __wasm_init_tls on the main thread today? Do you mean the memory.init call? Or some kind of relocation application?

As I mentioned earlier, neither wasix-libc nor (afaik, but I'm pretty certain) wasi-libc call __wasm_init_tls on the main thread.

Does this mean that any program that contains TLS relocations (non-empty __wasm_apply_tls_relocs function) would not work? Since the main thread TLS data segment would not have these relocations applied?

For example this program:

#include <stdio.h>

int sym1 = 42;
int sym2 = 43;

_Thread_local int* tls_data[] = { &sym1, &sym2 };

int main() {
  printf("in main: %p %p\n", tls_data[0], tls_data[1]);
}

When compiled as a -pie program, this will need two relocations in the TLS segment.

$ emcc -sMAIN_MODULE=2 --profiling-funcs test.c -pthread 

This generates the following __wasm_apply_tls_relocs function:

000b81 func[35] <__wasm_apply_tls_relocs>:
 000b82: 41 00                      | i32.const 0
 000b84: 23 07                      | global.get 7 <__tls_base>
 000b86: 6a                         | i32.add
 000b87: 23 01                      | global.get 1 <__memory_base>
 000b89: 41 e8 25                   | i32.const 4840
 000b8c: 6a                         | i32.add
 000b8d: 36 02 00                   | i32.store 2 0
 000b90: 41 04                      | i32.const 4
 000b92: 23 07                      | global.get 7 <__tls_base>
 000b94: 6a                         | i32.add
 000b95: 23 01                      | global.get 1 <__memory_base>
 000b97: 41 ec 25                   | i32.const 4844
 000b9a: 6a                         | i32.add
 000b9b: 36 02 00                   | i32.store 2 0
 000b9e: 41 10                      | i32.const 16
 000ba0: 23 07                      | global.get 7 <__tls_base>
 000ba2: 6a                         | i32.add
 000ba3: 23 01                      | global.get 1 <__memory_base>
 000ba5: 41 c4 27                   | i32.const 5060
 000ba8: 6a                         | i32.add
 000ba9: 36 02 00                   | i32.store 2 0
 000bac: 0b                         | end
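
Read as data, the listing above is three (TLS offset, addend) pairs, each followed by a 32-bit store of __memory_base + addend to __tls_base + offset. The sketch below replays that in plain C++ over a byte vector standing in for linear memory. The table values are copied from the listing; the type and function names are hypothetical, and memcpy uses host byte order, so this only models the stores rather than reproducing wasm's little-endian layout on every host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// One entry per i32.store in the listing: where to write (relative to
// __tls_base) and the constant that is added to __memory_base.
struct TlsReloc {
  uint32_t tlsOffset;
  uint32_t addend;
};

constexpr TlsReloc kRelocs[] = {{0, 4840}, {4, 4844}, {16, 5060}};

// Models __wasm_apply_tls_relocs for the listing: for every entry, store
// memoryBase + addend at tlsBase + tlsOffset.
inline void applyTlsRelocs(std::vector<uint8_t> &memory, uint32_t tlsBase,
                           uint32_t memoryBase) {
  for (const TlsReloc &r : kRelocs) {
    uint32_t value = memoryBase + r.addend;
    std::memcpy(&memory[tlsBase + r.tlsOffset], &value, sizeof value);
  }
}

// Reads back a 32-bit value written by applyTlsRelocs.
inline uint32_t load32(const std::vector<uint8_t> &memory, uint32_t addr) {
  uint32_t value;
  std::memcpy(&value, &memory[addr], sizeof value);
  return value;
}
```

Because every stored value depends on both __tls_base and __memory_base, the function has to run once per thread, after that thread's __tls_base is set.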

@sbc100
Collaborator

sbc100 commented Oct 8, 2025

Regarding "then __wasm_init_memory shouldn't initialize the TLS section": I think you are probably correct, yes.

I now see how __wasm_init_tls and __wasm_init_memory are both doing this, but we should probably have only __wasm_init_tls do it.

@Arshia001
Author

Arshia001 commented Oct 15, 2025

but we should probably have only __wasm_init_tls do it.

How would __tls_base for the main thread be initialized? I understand __wasm_init_memory initializes it to the statically-allocated TLS area, but __wasm_init_tls expects it to be initialized already before it's called.

But, more importantly, changing the existing behavior in __wasm_init_memory would be a breaking change for wasix-libc and wasi-libc. One can do lots of tricks in wasix-libc to support all compiler versions, but if you have an (outdated) version of wasix-libc installed already and decide to update your clang installation, you'll suddenly start getting broken modules. I'd rather avoid that scenario if possible, since it's quite difficult to debug for users.

@sbc100
Collaborator

sbc100 commented Oct 16, 2025

but we should probably have only __wasm_init_tls do it.

How would __tls_base for the main thread be initialized? I understand __wasm_init_memory initializes it to the statically-allocated TLS area, but __wasm_init_tls expects it to be initialized already before it's called.
But, more importantly, changing the existing behavior in __wasm_init_memory would be a breaking change for wasix-libc and wasi-libc. One can do lots of tricks in wasix-libc to support all compiler versions, but if you have an (outdated) version of wasix-libc installed already and decide to update your clang installation, you'll suddenly start getting broken modules. I'd rather avoid that scenario if possible, since it's quite difficult to debug for users.

Yes makes sense.

However it seems like there are two actual bugs here, neither of which seem to be fixed by this PR, but both of which probably deserve fixing:

  1. Bug in lld: memory.init is called twice for the TLS region of the main thread.
  2. Bug in wasi-libc: __wasm_init_tls is not called on the main thread, even though it should be. This PR doesn't fix this in the general case because __wasm_init_tls does more than just call __wasm_apply_global_tls_relocs, and in particular applies relocations which must happen after the start function.

I think there are several ways we can solve (1) here in wasm-ld, but we will also need to address (2) downstream.

@Arshia001
Author

Bug in lld: memory.init is called twice for the TLS region of the main thread.

This is only a bug if you require that __wasm_init_tls be called on the main thread; does that have to be the case?

I don't know of any cases where __wasm_apply_global_tls_relocs accesses globals pointing to non-DSO-local variables. If it is the case that only globals corresponding to DSO-local variables are accessed/initialized, then all that's needed for __wasm_apply_global_tls_relocs to work is __tls_base, which can be initialized before instantiation. As I understand, relocations involving symbols from other DL modules are performed entirely by the linker.

Indeed, this is how it's done today in wasix-libc and our fork of clang, and we have not run into any issues with uninitialized globals. However, you mentioned:

in particular applies relocations which must happen after the start function.

Is there something I'm missing? I don't know of any such relocations.

@sbc100
Collaborator

sbc100 commented Oct 21, 2025

Bug in lld: memory.init is called twice for the TLS region of the main thread.

This is only a bug if you require that __wasm_init_tls be called on the main thread; does that have to be the case?

I don't know of any cases where __wasm_apply_global_tls_relocs accesses globals pointing to non-DSO-local variables. If it is the case that only globals corresponding to DSO-local variables are accessed/initialized, then all that's needed for __wasm_apply_global_tls_relocs to work is __tls_base, which can be initialized before instantiation. As I understand, relocations involving symbols from other DL modules are performed entirely by the linker.

I feel like we've been over this already above.

What you are saying may be true of __wasm_apply_global_tls_relocs but it is not true of __wasm_apply_tls_relocs. __wasm_init_tls needs to be called after instantiation because of this latter function, even if technically the former could run during instantiation.

Indeed, this is how it's done today in wasix-libc and our fork of clang, and we have not run into any issues with uninitialized globals. However, you mentioned:

in particular applies relocations which must happen after the start function.

Is there something I'm missing? I don't know of any such relocations.

Perhaps none of the programs you are running have any relocations in __wasm_apply_tls_relocs?

Perhaps we need to construct such a program to show that __wasm_init_tls does need to be called after initialization on the main thread? Presumably any such program would fail on your runtime today?

@sbc100
Collaborator

sbc100 commented Oct 21, 2025

I created a simple C program that contains TLS relocations that should fail on a runtime that does not call __wasm_init_tls on the main thread.

In this case the relocations reference the foo and bar symbols, which are defined in a separate DSO:

extern int foo;
extern int bar;

typedef struct {
  int* a;
  int* b;
} my_struct;

_Thread_local my_struct s = { &foo, &bar };

int main() {
  return (int)&s;
}

Disassembling the resulting program, we see __wasm_init_tls calling two subroutines to apply relocations:

000a0f func[31] <__wasm_init_tls>:
 000a10: 20 00                      | local.get 0
 000a12: 24 09                      | global.set 9 <__tls_base>
 000a14: 20 00                      | local.get 0
 000a16: 41 00                      | i32.const 0
 000a18: 41 20                      | i32.const 32
 000a1a: fc 08 00 00                | memory.init 0 0 <.tdata>
 000a1e: 10 23                      | call 35 <__wasm_apply_tls_relocs>
 000a20: 10 22                      | call 34 <__wasm_apply_global_tls_relocs>
 000a22: 0b                         | end

__wasm_apply_tls_relocs references the external foo and bar symbols which are defined by imported globals:

000ba0 func[35] <__wasm_apply_tls_relocs>:
 000ba1: 41 00                      | i32.const 0
 000ba3: 23 09                      | global.get 9 <__tls_base>
 000ba5: 6a                         | i32.add
 000ba6: 23 03                      | global.get 3 <foo>
 000ba8: 36 02 00                   | i32.store 2 0
 000bab: 41 04                      | i32.const 4
 000bad: 23 09                      | global.get 9 <__tls_base>
 000baf: 6a                         | i32.add
 000bb0: 23 04                      | global.get 4 <bar>
 000bb2: 36 02 00                   | i32.store 2 0
 000bb5: 41 10                      | i32.const 16
 000bb7: 23 09                      | global.get 9 <__tls_base>
 000bb9: 6a                         | i32.add
 000bba: 23 01                      | global.get 1 <__memory_base>
 000bbc: 41 98 26                   | i32.const 4888
 000bbf: 6a                         | i32.add
 000bc0: 36 02 00                   | i32.store 2 0
 000bc3: 0b                         | end

Here you can see references to the imported GOT.mem.foo and GOT.mem.bar whose values might not be set at instantiation time.
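
The timing problem can be shown with a very small C++ model. This is only a sketch under stated assumptions: the struct and function names are invented, the addresses are arbitrary, and the imported GOT globals are assumed to hold a placeholder 0 until the other DSO has been loaded and resolved.

```cpp
#include <cassert>
#include <cstdint>

// Models the imported, mutable GOT.mem.foo / GOT.mem.bar globals.
// 0 stands for "not resolved yet" (assumption for this sketch).
struct ImportedGots {
  uint32_t foo = 0;
  uint32_t bar = 0;
};

// Models the thread-local `s` from the C program: two pointer-sized fields.
struct MyStructTls {
  uint32_t a = 0;
  uint32_t b = 0;
};

// Models the first two stores of __wasm_apply_tls_relocs: copy whatever the
// GOT globals currently hold into the TLS block.
inline void applyTlsRelocs(MyStructTls &tls, const ImportedGots &got) {
  tls.a = got.foo;
  tls.b = got.bar;
}
```

If applyTlsRelocs runs while the globals still hold the placeholder, s.a and s.b end up null; running it after resolution stores the real addresses.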

@Arshia001
Author

What you are saying may be true of __wasm_apply_global_tls_relocs but it is not true of __wasm_apply_tls_relocs. __wasm_init_tls needs to be called after instantiation because of this latter function, even if technically the former could run during instantiation.

OK, now I see where the confusion came from. __wasm_apply_tls_relocs does need to be called after linking steps are performed, but in our linker implementation, the flow for the main thread is:

  1. Allocate memory and table space for each module. This gives us a __memory_base and a __tls_base.
  2. Construct the imports object. At this stage, we may not have access to every import from every other module anyway; some modules won't be instantiated at all since circular dependencies between modules are possible. What we do is, for GOT.mem and GOT.func imports, we just give the global a 0 value and keep track of them, whereas for function imports in env, we generate stubs that will resolve the correct function and call it later.
  3. Instantiate each module; this runs the start function and, as a consequence, calls __wasm_init_memory which in turn calls __wasm_apply_global_tls_relocs.
  4. After every module is instantiated, we go over all modules again, setting the pending globals from step 2 to the correct value. See code here: https://github.com/wasmerio/wasmer/blob/09ce060643a7761d3532d5c1f4c0aabc30855cc6/lib/wasix/src/state/linker.rs#L1657-L1700
  5. We then call all the functions that couldn't be called before everything else was initialized; these are __wasm_apply_data_relocs and __wasm_apply_tls_relocs.
  6. Once every module has the correct linking steps performed, the final step is to call __wasm_call_ctors; this runs arbitrary user code, and there are no guarantees about what it may or may not do.
  • One problem we still have (and I believe cannot be solved) is, if two modules interdepend on each other's ctors having run before their own, one module will not work correctly.
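The deferred-import part of this flow (steps 2, 4, and 5) can be sketched in C as below. This is an illustrative model only, not the code in linker.rs: the `got_import` and `module` types, the function names, and the addresses returned by `demo_resolve` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of one GOT.mem import: the host creates the global
   with a 0 value and remembers that it still needs patching. */
typedef struct {
    const char *symbol;
    unsigned    value;    /* 0 until the symbol is resolved */
    int         pending;
} got_import;

typedef struct {
    got_import got[2];
    unsigned   tls_slot[2];   /* TLS words that should end up holding &foo and &bar */
} module;

/* Step 2: build the imports with placeholder values. */
static void build_imports(module *m) {
    m->got[0] = (got_import){ "foo", 0, 1 };
    m->got[1] = (got_import){ "bar", 0, 1 };
}

/* Step 4: once every module is instantiated, patch the pending globals.
   Returns how many globals were patched. */
static size_t patch_pending(module *m, unsigned (*resolve)(const char *)) {
    size_t patched = 0;
    for (size_t i = 0; i < 2; i++) {
        if (m->got[i].pending) {
            m->got[i].value = resolve(m->got[i].symbol);
            m->got[i].pending = 0;
            patched++;
        }
    }
    return patched;
}

/* Step 5: stands in for __wasm_apply_tls_relocs, which copies the
   current GOT values into the TLS block. */
static void run_tls_relocs(module *m) {
    for (size_t i = 0; i < 2; i++) {
        m->tls_slot[i] = m->got[i].value;
    }
}

/* Hypothetical symbol lookup with made-up addresses. */
static unsigned demo_resolve(const char *symbol) {
    return symbol[0] == 'f' ? 0x1000u : 0x1004u;
}
```

In this model, calling `run_tls_relocs` before `patch_pending` leaves zeros in the TLS slots, while calling it afterwards stores the resolved addresses. The ordering between steps 4 and 5 is the part that matters.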

For worker threads, the flow is mostly the same. However, since worker threads don't have a statically-allocated TLS area from the compiler, we need to allocate that first within the module. We use a small guest-side function to allocate and initialize TLS (code here). Everything else remains mostly the same.

In this way, the main thread doesn't need to call __wasm_init_tls anyway. I do agree this is not the cleanest thing ever, but it's sound AFAICT.

@Arshia001
Author

I'll try running your sample program and report back the results. However, from just looking at it, I believe we link it correctly.

@Arshia001
Author

Well, turns out we're linking your sample module incorrectly. Need to investigate our implementation in depth now:

// side.c
int foo = 1;
int bar = 2;

// main.c
#include <stdio.h>

extern int foo;
extern int bar;

typedef struct
{
    int *a;
    int *b;
} my_struct;

_Thread_local my_struct s = {&foo, &bar};

int main()
{
    printf("foo: %d %d, bar: %d %d, s: %p\n", *s.a, foo, *s.b, bar, &s);
    return 0;
}

Output:

foo: 0 1, bar: 0 2, s: 0x240

@sbc100
Collaborator

sbc100 commented Oct 22, 2025

One problem with the steps you describe above is that they depend on __wasm_apply_tls_relocs being exported, so that the host can call it. I thought you mentioned above that this does not work?

IIRC __wasm_init_tls was designed to be the runtime-called symbol, and __wasm_apply_tls_relocs was designed to be an internal subroutine that is only called by __wasm_init_tls.

@Arshia001
Author

depends on __wasm_apply_tls_relocs being exported

I believe we also did a patch that made __wasm_apply_tls_relocs public so it can be exported.

I just went over your code. The issue is with a bad implementation in the linker; the steps I outlined above are sound (when implemented correctly) AFAICT. I know the flow is rather complex, but it enables the linker implementation to do things exactly as it needs to.

For example, in our implementation, we go the emscripten way of exporting all libc functions from the main module, so we want the main to be initialized as much as possible before running ctors in side modules. The current setup lets us do:

  • call relocation functions on the main
  • call relocation functions on the sides
  • call ctors on the sides
  • call the _start function in the main - at this point we can't make any more calls to the sides, so they have to be set up already

A different implementation may want to do things differently, or be smart about which modules are initialized in which order.

I also wonder about __wasm_apply_tls_relocs being private... __wasm_apply_data_relocs is already public and I understand linkers are supposed to call it. What's special about __wasm_apply_tls_relocs that it gets a different treatment?

Anyway, an alternate implementation that sticks to the rule "every thread should call __wasm_init_tls" would be:

  • __wasm_init_memory sets __tls_base to the area allocated by the compiler, but does not initialize it
  • __wasm_init_tls can later take the __tls_base value and do its initialization, including calling __wasm_apply_global_tls_relocs
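A minimal C sketch of this alternative split is shown below. It only models the division of responsibilities between the two functions. The `tls_model` struct, its field names, and the 32-byte block size are hypothetical, and the `global_relocs_applied` flag merely stands in for the call to __wasm_apply_global_tls_relocs.

```c
#include <stdint.h>
#include <string.h>

enum { TLS_SIZE = 32 };  /* illustrative size of the TLS block */

typedef struct {
    uint8_t  tdata_template[TLS_SIZE]; /* contents of the .tdata segment */
    uint8_t  static_tls[TLS_SIZE];     /* area reserved by the compiler for the main thread */
    uint8_t *tls_base;                 /* models the __tls_base global */
    int      global_relocs_applied;    /* models __wasm_apply_global_tls_relocs having run */
} tls_model;

/* Proposed __wasm_init_memory: point __tls_base at the reserved area
   and leave the block itself untouched. */
static void model_init_memory(tls_model *m) {
    m->tls_base = m->static_tls;
}

/* Proposed __wasm_init_tls: called on every thread, main thread included.
   The main thread passes the __tls_base chosen by model_init_memory;
   worker threads pass a freshly allocated block. */
static void model_init_tls(tls_model *m, uint8_t *block) {
    m->tls_base = block;
    memcpy(m->tls_base, m->tdata_template, TLS_SIZE);
    m->global_relocs_applied = 1;
}
```

Under this split, the main thread would call `model_init_tls(&m, m.tls_base)` once linking is complete, and a worker would call it with its own block, so both take the same initialization path.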

This is also compatible with what we want to do in the linker. However, this is a breaking change for both the linker and the libc code, so I'd like to avoid it if at all possible.

@Arshia001
Author

@sbc100 ping :)

@sbc100
Collaborator

sbc100 commented Nov 18, 2025

I also wonder about __wasm_apply_tls_relocs being private... __wasm_apply_data_relocs is already public and I understand linkers are supposed to call it. What's special about __wasm_apply_tls_relocs that it gets a different treatment?

We already have the exported __wasm_init_tls function, which takes care of calling __wasm_apply_tls_relocs. I don't see why we would want to complicate things by also exporting __wasm_apply_tls_relocs separately.

I'm sorry, I don't understand your reasoning for not wanting to explicitly call __wasm_init_tls on the main thread. What is the downside to doing that?

Assuming we do enable the export of the currently-internal __wasm_apply_tls_relocs, wouldn't you need to call that function in the exact same place that I claim you should be calling __wasm_init_tls? That is, on every thread, including the main thread, after all modules have been loaded and all symbols resolved.
