Stop abusing shared memory lock to protect exception #2509

yamt · 2023-08-28T09:54:01Z

Use a separate global lock instead. Fixes: bytecodealliance#2407
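
To illustrate the idea of a dedicated lock, here is a minimal sketch that uses a single process-wide pthread mutex to guard a per-instance exception buffer. Every name in it (`demo_instance`, `demo_exception_lock`, `demo_set_exception`, `demo_copy_exception`) is hypothetical and chosen for illustration only; it is not the code of this PR and does not reflect WAMR's real structures.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a module instance; not WAMR's real struct. */
typedef struct demo_instance {
    char cur_exception[128]; /* empty string means "no exception" */
} demo_instance;

/* Assumption: one global mutex dedicated to exception state,
   independent of any lock that protects shared memory. */
static pthread_mutex_t demo_exception_mutex = PTHREAD_MUTEX_INITIALIZER;

static void demo_exception_lock(void)
{
    pthread_mutex_lock(&demo_exception_mutex);
}

static void demo_exception_unlock(void)
{
    pthread_mutex_unlock(&demo_exception_mutex);
}

/* Set the exception, or clear it when msg is NULL, under the lock. */
static void demo_set_exception(demo_instance *inst, const char *msg)
{
    demo_exception_lock();
    if (msg != NULL)
        snprintf(inst->cur_exception, sizeof(inst->cur_exception),
                 "Exception: %s", msg);
    else
        inst->cur_exception[0] = '\0';
    demo_exception_unlock();
}

/* Copy the exception into a caller buffer (size must be > 0).
   Returns 1 if an exception is set, 0 otherwise. */
static int demo_copy_exception(demo_instance *inst, char *buf, size_t size)
{
    int has_exception;

    demo_exception_lock();
    has_exception = inst->cur_exception[0] != '\0';
    snprintf(buf, size, "%s", inst->cur_exception);
    demo_exception_unlock();
    return has_exception;
}
```

Readers copy the string out while holding the lock instead of returning a pointer into the buffer, so another thread cannot overwrite the text while it is being read.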


wenyongh · 2023-08-29T06:20:18Z

core/iwasm/libraries/thread-mgr/thread_manager.c

+        exception_lock(wasm_inst);
+        if (data->exception != NULL) {
+            snprintf(wasm_inst->cur_exception, sizeof(wasm_inst->cur_exception),
+                     "Exception: %s", data->exception);


The "Only spread non "wasi proc exit" exception" check is dropped here; see L1255 and L1260 of the original file.

it's intentional. is it a problem? do you have a test case?

The check was introduced in PR #1988.

Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception but works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so it ends normally without an exception being thrown. In the multi-threading case, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.

It may cause unexpected behavior if thread A spreads this exception to other threads: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, it may continue to run, so an "unreachable" exception is eventually thrown. (Note that in most cases the opcode following a wasi_proc_exit call is unreachable; the bytecode is generated by emsdk or wasi-sdk.)

I believe we found the issue while testing the wasi-thread related test cases and then fixed it; the issue occurred only occasionally. To reproduce it, we may try running the wasi-thread cases many times.
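
The spreading policy described in this comment can be sketched as follows. This is a single-threaded model under stated assumptions, not WAMR's implementation: `demo_thread`, `demo_is_proc_exit` and `demo_raise` are hypothetical names, and recognising proc exit by a marker string is an assumption made only for the sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_THREADS 4

/* Hypothetical per-thread state; not WAMR's real data structures. */
typedef struct demo_thread {
    char exception[64]; /* empty string means "no exception" */
    bool terminate;     /* asks the thread to stop running opcodes */
} demo_thread;

/* Assumption: proc exit is recognised by this marker string. */
static bool demo_is_proc_exit(const char *msg)
{
    return strstr(msg, "wasi proc exit") != NULL;
}

/* Thread `self` raised `msg`. The other threads are always asked to
   terminate, but the message is copied to them only for a real trap. */
static void demo_raise(demo_thread *threads, int count, int self,
                       const char *msg)
{
    snprintf(threads[self].exception, sizeof(threads[self].exception),
             "%s", msg);
    for (int i = 0; i < count; i++) {
        if (i == self)
            continue;
        threads[i].terminate = true;
        if (!demo_is_proc_exit(msg))
            snprintf(threads[i].exception, sizeof(threads[i].exception),
                     "%s", msg);
    }
}
```

With this policy a thread that exits via proc exit leaves the other threads' exception slots untouched, so they have nothing to handle or clear on its behalf.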

> The check was introduced in PR #1988.
>
> Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception but works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so it ends normally without an exception being thrown. In the multi-threading case, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.

while proc exit is not a real trap, what the runtime should do is almost the same as for real traps,
i.e. terminate all threads and return the exit/trap to the api user as the result of the whole "thread group".

> It may cause unexpected behavior if thread A spreads this exception to other threads: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, it may continue to run, so an "unreachable" exception is eventually thrown. (Note that in most cases the opcode following a wasi_proc_exit call is unreachable; the bytecode is generated by emsdk or wasi-sdk.)

if it's a problem, real traps have the same problems, don't they?

my impression is that much (all?) of the code clearing other threads' exceptions is just broken: #2481

> I believe we found the issue while testing the wasi-thread related test cases and then fixed it; the issue occurred only occasionally. To reproduce it, we may try running the wasi-thread cases many times.

i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

> > The check was introduced in PR #1988.
> >
> > Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception but works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so it ends normally without an exception being thrown. In the multi-threading case, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.

> while proc exit is not a real trap, what the runtime should do is almost the same as for real traps, i.e. terminate all threads and return the exit/trap to the api user as the result of the whole "thread group".

Yes, almost the same, except it doesn't spread the exception to other threads and it clears the exception before it ends.

> > It may cause unexpected behavior if thread A spreads this exception to other threads: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, it may continue to run, so an "unreachable" exception is eventually thrown. (Note that in most cases the opcode following a wasi_proc_exit call is unreachable; the bytecode is generated by emsdk or wasi-sdk.)

> if it's a problem, real traps have the same problems, don't they?

No, real traps are spread to other threads and terminate flags are also set for other threads, but the trap isn't cleared before the thread ends, so thread A's exception won't be cleared by thread B.
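
The difference at the end of a call, as described in this thread, can be modelled with a small sketch. `sketch_thread` and `sketch_finish_call` are hypothetical names, and detecting proc exit through a marker string is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-thread exception slot; empty string means none. */
typedef struct sketch_thread {
    char exception[64];
} sketch_thread;

/* Called when a thread finishes its call.
   Returns true when the call counts as a normal end. */
static bool sketch_finish_call(sketch_thread *t)
{
    if (t->exception[0] == '\0')
        return true; /* nothing was raised */

    if (strstr(t->exception, "wasi proc exit") != NULL) {
        t->exception[0] = '\0'; /* proc exit: clear it, end normally */
        return true;
    }

    return false; /* real trap: keep it so the caller can report it */
}
```

Only the calling thread's own slot is touched here; whether any code path should also clear other threads' slots is the open question raised in #2481.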

> my impression is that much (all?) of the code clearing other threads' exceptions is just broken: #2481

Do you mean unifying wasm_runtime_set_exception(inst, NULL) and wasm_runtime_clear_exception(inst), and removing some unneeded exception clearing?

> > I believe we found the issue while testing the wasi-thread related test cases and then fixed it; the issue occurred only occasionally. To reproduce it, we may try running the wasi-thread cases many times.

> i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

Yes, it's better to restore this and fix it in another PR if needed.

> i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

done

> > if it's a problem, real traps have the same problems, don't they?
>
> No, real traps are spread to other threads and terminate flags are also set for other threads, but the trap isn't cleared before the thread ends, so thread A's exception won't be cleared by thread B.

a real trap can misbehave in a similar way if the exception is suddenly cleared by the other thread.

> > my impression is that much (all?) of the code clearing other threads' exceptions is just broken: #2481
>
> Do you mean unifying wasm_runtime_set_exception(inst, NULL) and wasm_runtime_clear_exception(inst), and removing some unneeded exception clearing?

yes.
unifying the two apis is just cosmetic.
the other one is a bit cumbersome. i guess we need to investigate them one by one to see if each should clear other threads' exceptions. (i guess most of them need to clear only the local exception.)

OK, thanks, it really takes effort to investigate them one by one.

wenyongh

LGTM

xujuntwt95329 · 2023-08-31T09:04:33Z

LGTM


Stop abusing shared memory lock to protect exception

3004ee6

Use a separate global lock instead. Fixes: bytecodealliance#2407

yamt force-pushed the exc-lock branch 2 times, most recently from 5c89022 to 73ec790 Compare August 28, 2023 14:11

wasm_cluster_spread_exception: simplify logic and fix minor locking error

90cc664

yamt force-pushed the exc-lock branch from 73ec790 to 90cc664 Compare August 29, 2023 03:31

wenyongh reviewed Aug 29, 2023


restore "Only spread non "wasi proc exit" exception" behavior for now

f1349d1

wenyongh reviewed Aug 29, 2023


wenyongh merged commit 382d52f into bytecodealliance:main Aug 31, 2023

Zzzabiyaka mentioned this pull request Oct 28, 2023

TSAN fast interpreter failure #2680

Closed

vickiegpt pushed a commit to vickiegpt/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024

Stop abusing shared memory lock to protect exception (bytecodealliance#2509)

a3c5988

Use a separate global lock instead. Fixes: bytecodealliance#2407
