Skip to content

Conversation

@eloparco
Copy link
Contributor

Without the change in this PR, upon receiving the proc_exit exception propagation while on an atomic.atomic or poll_oneoff, the next instruction is fetched instead of stopping the executing.

That results in the exception being overridden (by an unreachable exception) when using the tests in https://github.com/WebAssembly/wasi-threads/tree/main/test/testsuite, causing the exit code to be wrong.
Continuation of #1929 #1951.

@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch from e56337b to 58801f9 Compare February 20, 2023 23:41
@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch 3 times, most recently from e0eddae to 66eeb4e Compare February 21, 2023 13:53
Copy link
Contributor

@wenyongh wenyongh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eloparco
Copy link
Contributor Author

eloparco commented Feb 22, 2023

LGTM

Let me complete it before merging, it's not done yet since I need to fix AOT (and JIT as well probably).

Added a commit for that, but it doesn't work. Any suggestion for debugging AOT in general? Apart from adding some LLVMBuildCall only to pass parameters and print them at runtime.

This part is missing.

@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch from f61ddc2 to 16c51a6 Compare February 23, 2023 09:20
@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch from e12147a to 56ee840 Compare February 23, 2023 11:19
else if (exec_env_tls->exce_check_guard_page <= (uint8 *)sig_addr
&& (uint8 *)sig_addr
< exec_env_tls->exce_check_guard_page + page_size) {
bh_assert(wasm_get_exception(module_inst));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had better not remove and modify to:

    bh_assert(wasm_get_exception(module_inst)
#if WASM_ENABLE_THREAD_MGR != 0
              || wasm_cluster_is_thread_terminated(exec_env_tls));
#endif
);

And also change the similar place for Windows (this file, L253):
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/common/wasm_runtime_common.c#L253

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm debugging to understand why exec_env_tls->suspend_flags is not set properly once I get there.

Copy link
Contributor Author

@eloparco eloparco Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the problem was that we don't set the suspend flag for the thread that originally raises the exception. I pushed a fix but I noticed that it's breaking the spec tests. Any clue on what's breaking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the assert would be executed also on the thread that originally raises the exception and that thread doesn't have the suspend flag set. I tried setting the suspend flag on that initial thread too but it break things since it forces the initial thread to exit prematurely (after a CHECK_SUSPEND_FLAGS();) while we don't want that.

So I was thinking of just having a

#if WASM_ENABLE_THREAD_MGR == 0
    bh_assert(wasm_get_exception(module_inst));
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to reproduce the issue? The thread_terminate sample with TEST_TERMINATION_BY_TRAP == 1 and TEST_TERMINATION_IN_MAIN_THREAD == 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may merge this PR firstly if you are OK and we will submit another PR to fix that issue.

Let's do it, so that I can move to #1985 that hopefully should succeed after we merge this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eloparco I tried again, and think this is not a best way to resolve the issue. The root cause is that the main thread's "wasi proc exit" exception is cleared by the sub thread after the latter gets the spread exception from main thread and then clear exception for all threads.
It is strange to spread the "wasi proc exit" exception to other threads and then clear it in other threads. I think we can just ignore spreading of this exception and no need to clear it again. I uploaded PR #1988 to fix the issues, I tested the cases and didn't find the issues you mentioned (unreachable exception was thrown and bh_assert failure), could you help review and try? Thanks.

Copy link
Contributor Author

@eloparco eloparco Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this one and rebase #1988? So that it's easier to review. I'll then try #1988 in a few different cases to see if we don't have regressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference is we had better not merge a temporarily PR which doesn't fix the root cause, #1988 should be better and have resolved the issues mentioned in this PR. I tend to merge #1988 directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it makes sense. Let's close this one once #1988 gets merged

@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch from 165c571 to 636f259 Compare February 23, 2023 16:02
@eloparco eloparco force-pushed the eloparco/fix-exit-code-on-proc-exit branch from 636f259 to 6192719 Compare February 23, 2023 16:51
@eloparco
Copy link
Contributor Author

Closing since the changes here have been applied to #1988

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants