fix(websocket): Fix the bug related to multi-threaded access to trans… (IDFGH-17155) #997
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The
closeoperation on theclient->transportresource is performed under the protection ofclient->lock.esp_websocket_client_send_with_exact_opcodechecks the state without holding the lock when determining whether the transport can be accessed. This can lead to some unexpected behaviors:If
tx_lockis not enabled, callingesp_transport_ws_send_rawwill be abnormally blocked if the transport is already closed.If
tx_lockis enabled,esp_transport_closeandesp_transport_ws_send_rawmay be called concurrently. Sincecloseinvolves releasing some resources, this may lead to unpredictable behavior.Description
To fix this issue, I adjusted the implementation of resource mutual exclusion protection.
1、tx_lock has been removed
The original purpose of introducing
tx_lockwas to allow for parallel processing of receiving and sending, improving transmission efficiency. However, this doesn't necessarily require introducing a newtx_lock. This is because the receiving direction is accessed only by theesp_websocket_client_task. The sending direction may be accessed concurrently by several user tasks and theesp_websocket_client_task. Therefore, theesp_websocket_client_taskdoes not need lock protection when receiving. Lock protection is only applied when theesp_websocket_client_taskneeds to send or when a user task sends, to check if the transport is available.2、Change
esp_websocket_client_abort_connectionto asynchronous close transport.Modify
esp_websocket_client_abort_connectionto only modify the state, preventing user tasks from directly executing close transport and conflicting withesp_websocket_client_task's receive transport.The APIs involved in resource access contention mainly include esp_websocket_client_send_bin, esp_websocket_client_send_bin_partial, esp_websocket_client_send_text, esp_websocket_client_send_text_partial, esp_websocket_client_send_cont_msg, esp_websocket_client_send_fin, esp_websocket_client_send_with_opcode, esp_websocket_client_close, and esp_websocket_client_close_with_code. Ultimately, all of these are involved in esp_websocket_client_send_with_exact_opcode. The main issue is contention involving transport sending, errormsg_buffer, and state when the state is WEBSOCKET_STATE_CONNECTED. After the modification, esp_websocket_client_send_with_exact_opcode accesses and modifies these resources under lock protection. In esp_websocket_client_task, when the state is WEBSOCKET_STATE_CONNECTED, any transport sending, changes to a connectionless state, or access to errormsg_buffer are all performed under lock protection. The close of the transport is uniformly performed by esp_websocket_client_task in a non-WEBSOCKET_STATE_CONNECTED state, ensuring that user task sending will not occur concurrently with the close of the transport, and esp_websocket_client_task does not require lock protection when receiving.
Related
Testing
Checklist
Before submitting a Pull Request, please ensure the following:
Note
Refactors websocket client concurrency and connection lifecycle handling.
tx_lockKconfig/options and all associated separate TX locking; uses a singleclient->lockto guard all transport sends/state/error buffer accessWEBSOCKET_STATE_WAIT_ABORT_CONNECTand changesesp_websocket_client_abort_connection()to be asynchronous (sets state, dispatches disconnect); actualesp_transport_close()now performed by the client taskclient->lockand re-checkstatebefore accessing transport; adds locked aborts on read/write failuresWritten by Cursor Bugbot for commit 3bbe73a. This will update automatically on new commits. Configure here.