@yamt (Collaborator) commented Aug 28, 2023

Fixes: #2407

@yamt force-pushed the exc-lock branch 2 times, most recently from 5c89022 to 73ec790, August 28, 2023 14:11

    exception_lock(wasm_inst);
    if (data->exception != NULL) {
        snprintf(wasm_inst->cur_exception, sizeof(wasm_inst->cur_exception),
                 "Exception: %s", data->exception);

Contributor

The check that only spreads non-"wasi proc exit" exceptions has been dropped here; see L1255 and L1260 of the original file.
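
For readers without the original file at hand, the check under discussion can be pictured roughly as follows. This is a minimal single-threaded sketch, not the WAMR code: `demo_inst`, `demo_set_exception_local`, `demo_should_spread` and `demo_set_exception` are hypothetical names, the thread group is modelled as a plain array, and the locking that this PR is about is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_EXC_LEN 128

/* Hypothetical stand-in for a module instance; not the WAMR struct. */
typedef struct demo_inst {
    char cur_exception[DEMO_EXC_LEN];
} demo_inst;

/* Store "Exception: <msg>" in one instance, or clear it when msg is NULL. */
static void demo_set_exception_local(demo_inst *inst, const char *msg)
{
    assert(inst != NULL);
    if (msg != NULL)
        snprintf(inst->cur_exception, sizeof(inst->cur_exception),
                 "Exception: %s", msg);
    else
        inst->cur_exception[0] = '\0';
}

/* True when the message should be copied to the sibling threads. */
static bool demo_should_spread(const char *msg)
{
    return msg != NULL && strstr(msg, "wasi proc exit") == NULL;
}

/* Set the exception on group[self]; copy it to the siblings only when it is
 * not "wasi proc exit". Returns how many siblings received a copy. */
static size_t demo_set_exception(demo_inst *group, size_t n, size_t self,
                                 const char *msg)
{
    size_t spread = 0;

    assert(group != NULL && self < n);
    demo_set_exception_local(&group[self], msg);
    if (!demo_should_spread(msg))
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (i == self)
            continue;
        demo_set_exception_local(&group[i], msg);
        spread++;
    }
    return spread;
}
```

With a group of three, raising "wasi proc exit" on one instance leaves the other two untouched, while raising "unreachable" copies the message to both siblings.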

Collaborator Author

it's intentional. is it a problem? do you have a test case?

Contributor

The check was introduced in PR #1988.

Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception; it works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so the call ends normally without an exception being thrown. For multi-threading, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.
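
The flag-like behavior described above can be sketched as follows. This is only an illustration under simplifying assumptions: `demo_thread`, `demo_proc_exit` and `demo_finish_call` are hypothetical names, each thread is a plain struct, and no real threads or locks are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_EXC_LEN 128

/* Hypothetical per-thread state; not the real WAMR exec_env or instance. */
typedef struct demo_thread {
    char cur_exception[DEMO_EXC_LEN];
    bool terminate; /* set by another thread to ask this one to stop */
} demo_thread;

/* Thread `self` calls proc_exit: mark itself, flag the others, spread nothing. */
static void demo_proc_exit(demo_thread *threads, size_t n, size_t self)
{
    assert(threads != NULL && self < n);
    snprintf(threads[self].cur_exception, sizeof(threads[self].cur_exception),
             "Exception: %s", "wasi proc exit");
    for (size_t i = 0; i < n; i++) {
        if (i != self)
            threads[i].terminate = true;
    }
}

/* End of the call: returns true if the call counts as a normal exit. */
static bool demo_finish_call(demo_thread *t)
{
    assert(t != NULL);
    if (strstr(t->cur_exception, "wasi proc exit") != NULL)
        t->cur_exception[0] = '\0'; /* pseudo-exception: clear it */
    return t->cur_exception[0] == '\0';
}
```

In this model the calling thread finishes with an empty exception buffer, the other threads only see their `terminate` flag, and a real trap such as "unreachable" survives `demo_finish_call` unchanged.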

Spreading this exception from thread A to other threads may cause unexpected behavior: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, thread A may continue to run and throw an "unreachable" exception (in bytecode generated by emsdk or wasi-sdk, the opcode after a call to wasi_proc_exit is in most cases unreachable).
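
To make the interleaving concrete, here is a deterministic replay of exactly the order of events described above. It is a toy model, not a reproduction of the real race: `demo_thread` and `demo_replay` are hypothetical, the steps run sequentially in one thread, and the terminate flag is left out because it does not affect what thread A ends up reporting in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_EXC_LEN 64

/* Hypothetical per-thread exception buffer. */
typedef struct demo_thread {
    char exc[DEMO_EXC_LEN];
} demo_thread;

static void demo_set(demo_thread *t, const char *msg)
{
    snprintf(t->exc, sizeof(t->exc), "Exception: %s", msg);
}

static void demo_clear(demo_thread *t)
{
    t->exc[0] = '\0';
}

static bool demo_has_proc_exit(const demo_thread *t)
{
    return strstr(t->exc, "wasi proc exit") != NULL;
}

/* Replays one fixed interleaving and writes thread A's final exception
 * (empty string means normal exit) into `out`. */
static void demo_replay(bool spread_proc_exit, char *out, size_t out_len)
{
    demo_thread a = { "" }, b = { "" };

    assert(out != NULL && out_len > 0);

    /* Step 1: thread A calls proc_exit, optionally spreading it to B. */
    demo_set(&a, "wasi proc exit");
    if (spread_proc_exit)
        demo_set(&b, "wasi proc exit");

    /* Step 2: thread B stops first. If it sees "wasi proc exit" as its own
     * exception, it handles it and clears the exceptions of all threads. */
    if (demo_has_proc_exit(&b)) {
        demo_clear(&a);
        demo_clear(&b);
    }

    /* Step 3: thread A checks its exception before the next opcode. */
    if (a.exc[0] == '\0') {
        /* Nothing pending: A runs the opcode after the call, "unreachable". */
        demo_set(&a, "unreachable");
    } else if (demo_has_proc_exit(&a)) {
        /* A stops and clears its own pseudo-exception at the end of the call. */
        demo_clear(&a);
    }

    snprintf(out, out_len, "%s", a.exc);
}
```

Calling `demo_replay(false, ...)` leaves thread A with no exception, while `demo_replay(true, ...)` leaves it with the "unreachable" trap, which matches the failure mode described in the comment.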

I believe we found this issue when testing the wasi-thread related test cases and then fixed it; it occurred occasionally. To reproduce it, we may try running the wasi-thread cases many times.

Collaborator Author

> The check was introduced in PR #1988.
>
> Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception; it works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so the call ends normally without an exception being thrown. For multi-threading, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.

while proc exit is not a real trap, what the runtime should do is almost the same as for real traps,
i.e. terminate all threads and return the exit/trap to the api user as the result of the whole "thread group".

> Spreading this exception from thread A to other threads may cause unexpected behavior: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, thread A may continue to run and throw an "unreachable" exception (in bytecode generated by emsdk or wasi-sdk, the opcode after a call to wasi_proc_exit is in most cases unreachable).

if it's a problem, real traps have the same problems, don't they?

my impression is that many (all?) of the code paths clearing other threads' exceptions are just broken: #2481

> I believe we found this issue when testing the wasi-thread related test cases and then fixed it; it occurred occasionally. To reproduce it, we may try running the wasi-thread cases many times.

i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

Contributor

> The check was introduced in PR #1988.
> Raising the "wasi proc exit" exception is intentional runtime behavior. It is not an actual exception; it works more like a flag that makes the current thread stop running opcodes. After the thread stops, at the end of wasm_runtime_call_wasm, the thread clears this exception, so the call ends normally without an exception being thrown. For multi-threading, the thread doesn't spread the "wasi proc exit" exception; it just sets the terminate flags of the other threads so that they exit as well.

> while proc exit is not a real trap, what the runtime should do is almost the same as for real traps, i.e. terminate all threads and return the exit/trap to the api user as the result of the whole "thread group".

Yes, almost the same, except that the thread doesn't spread the exception to other threads and clears the exception before it ends.

> Spreading this exception from thread A to other threads may cause unexpected behavior: another thread (say thread B) may stop running opcodes first, then handle the "wasi proc exit" exception and clear the exceptions of the other threads, including thread A. Once thread A's exception is cleared, thread A may continue to run and throw an "unreachable" exception (in bytecode generated by emsdk or wasi-sdk, the opcode after a call to wasi_proc_exit is in most cases unreachable).

> if it's a problem, real traps have the same problems, don't they?

No, real traps are spread to other threads and terminate flags are also set for other threads, but the trap isn't cleared before the thread ends, so thread A's exception won't be cleared by thread B.

> my impression is that many (all?) of the code paths clearing other threads' exceptions are just broken: #2481

Do you mean unifying wasm_runtime_set_exception(inst, NULL) and wasm_runtime_clear_exception(inst), and removing some unneeded exception clears?

> I believe we found this issue when testing the wasi-thread related test cases and then fixed it; it occurred occasionally. To reproduce it, we may try running the wasi-thread cases many times.

> i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

Yes, it is better to restore this and fix it in another PR if needed.

Collaborator Author

> i guess i will restore this (IMO wrong) behavior for now because it isn't the main point of this PR.

done

Collaborator Author

> if it's a problem, real traps have the same problems, don't they?

> No, real traps are spread to other threads and terminate flags are also set for other threads, but the trap isn't cleared before the thread ends, so thread A's exception won't be cleared by thread B.

a real trap can misbehave in a similar way if the exception is suddenly cleared by the other thread.

> my impression is that many (all?) of the code paths clearing other threads' exceptions are just broken: #2481

> Do you mean unifying wasm_runtime_set_exception(inst, NULL) and wasm_runtime_clear_exception(inst), and removing some unneeded exception clears?

yes.
unifying the two apis is just cosmetic.
the other one is a bit cumbersome. i guess we need to investigate the call sites one-by-one to see if each should clear other threads' exceptions. (i guess most of them need to clear only the local exception.)
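
One possible shape for the two follow-ups mentioned here, shown purely as a sketch: a clear that is just "set to NULL", and a separate, explicitly named group-wide clear for the call sites that really need one. All names (`demo_inst`, `demo_set_exception`, `demo_clear_exception`, `demo_clear_exception_group`) are hypothetical and this is not what #2481 or any later WAMR change actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_EXC_LEN 128

/* Hypothetical stand-in for a module instance. */
typedef struct demo_inst {
    char cur_exception[DEMO_EXC_LEN];
} demo_inst;

/* Single entry point: a NULL message clears the local exception. */
static void demo_set_exception(demo_inst *inst, const char *msg)
{
    assert(inst != NULL);
    if (msg != NULL)
        snprintf(inst->cur_exception, sizeof(inst->cur_exception),
                 "Exception: %s", msg);
    else
        inst->cur_exception[0] = '\0';
}

/* Cosmetic unification: clearing is a thin wrapper over set(NULL) and
 * touches only the instance it is given. */
static void demo_clear_exception(demo_inst *inst)
{
    demo_set_exception(inst, NULL);
}

/* Group-wide clear kept separate so that each call site has to opt in. */
static void demo_clear_exception_group(demo_inst *group, size_t n)
{
    assert(group != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        demo_clear_exception(&group[i]);
}
```

Keeping the local clear as the default means a call site that only wants to reset its own state cannot wipe a sibling thread's pending trap by accident.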

Contributor

OK, thanks, it really takes effort to investigate them one by one.

Contributor

@wenyongh wenyongh left a comment

LGTM

@xujuntwt95329
Collaborator

LGTM

@wenyongh wenyongh merged commit 382d52f into bytecodealliance:main Aug 31, 2023
vickiegpt pushed a commit to vickiegpt/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
Development

Successfully merging this pull request may close these issues.

broken locking wrt exceptions

3 participants