@udoudou
@udoudou udoudou commented Jan 27, 2026

The client->transport resource is closed under the protection of client->lock, but esp_websocket_client_send_with_exact_opcode checks the state without holding that lock when deciding whether the transport can be accessed. This can lead to unexpected behavior:

  1. If tx_lock is not enabled, a call to esp_transport_ws_send_raw can block abnormally when the transport has already been closed.

  2. If tx_lock is enabled, esp_transport_close and esp_transport_ws_send_raw may be called concurrently. Because close releases some resources, this can lead to unpredictable behavior.
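
The problem above comes down to checking the state outside the lock that guards the close. A minimal host-side sketch of the intended pattern follows. It is not the esp_websocket_client code: the model_client_t type and the model_client_* functions are hypothetical, and a pthread mutex stands in for client->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the real esp_websocket_client types. */
typedef enum { MODEL_STATE_CONNECTED, MODEL_STATE_CLOSED } model_state_t;

typedef struct {
    pthread_mutex_t lock;   /* stands in for client->lock */
    model_state_t state;    /* guarded by lock */
    bool transport_open;    /* guarded by lock in this sketch */
    size_t bytes_sent;      /* guarded by lock */
} model_client_t;

void model_client_init(model_client_t *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->state = MODEL_STATE_CONNECTED;
    c->transport_open = true;
    c->bytes_sent = 0;
}

/* Close: the state change and the transport release share one critical section. */
void model_client_close(model_client_t *c)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    c->state = MODEL_STATE_CLOSED;
    c->transport_open = false;
    pthread_mutex_unlock(&c->lock);
}

/* Send: returns the number of bytes "sent", or -1 if not connected.
 * The state is checked only after the lock is held, so a close cannot
 * slip in between the check and the use of the transport. */
int model_client_send(model_client_t *c, size_t len)
{
    int ret = -1;
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    if (c->state == MODEL_STATE_CONNECTED && c->transport_open) {
        c->bytes_sent += len;   /* placeholder for the real transport write */
        ret = (int)len;
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}
```

In this model a send issued after model_client_close returns -1 instead of touching a released transport.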

Description

To fix this issue, I reworked how mutual exclusion protects these resources.
1. Remove tx_lock
tx_lock was originally introduced so that receiving and sending could proceed in parallel, improving transmission efficiency. That does not require a separate lock, however. The receive direction is accessed only by esp_websocket_client_task, whereas the send direction may be accessed concurrently by several user tasks and by esp_websocket_client_task. esp_websocket_client_task therefore needs no lock protection while receiving. The lock is taken only when esp_websocket_client_task or a user task sends, so that the sender can check whether the transport is available.

2. Make esp_websocket_client_abort_connection close the transport asynchronously
esp_websocket_client_abort_connection now only changes the state. This prevents a user task from closing the transport directly while esp_websocket_client_task is receiving on it.
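
A rough sketch of the asynchronous abort, under stated assumptions: the abort_model_* names and the ABORT_MODEL_DISCONNECTED state are hypothetical, and a pthread mutex stands in for client->lock. The abort request only records a state change, and one iteration of the modeled client task performs the close.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model states; only the "wait abort" idea comes from this PR. */
typedef enum {
    ABORT_MODEL_CONNECTED,
    ABORT_MODEL_WAIT_ABORT,    /* abort requested, transport not yet closed */
    ABORT_MODEL_DISCONNECTED   /* assumed follow-up state after the close */
} abort_model_state_t;

typedef struct {
    pthread_mutex_t lock;       /* stands in for client->lock */
    abort_model_state_t state;  /* guarded by lock */
    bool transport_open;        /* touched only by the task once not connected */
} abort_model_t;

void abort_model_init(abort_model_t *m)
{
    assert(m != NULL);
    pthread_mutex_init(&m->lock, NULL);
    m->state = ABORT_MODEL_CONNECTED;
    m->transport_open = true;
}

/* Callable from any task: records the request, never touches the transport. */
void abort_model_request(abort_model_t *m)
{
    assert(m != NULL);
    pthread_mutex_lock(&m->lock);
    if (m->state == ABORT_MODEL_CONNECTED) {
        m->state = ABORT_MODEL_WAIT_ABORT;
    }
    pthread_mutex_unlock(&m->lock);
}

/* One iteration of the modeled client task: the only place that closes. */
void abort_model_task_step(abort_model_t *m)
{
    bool abort_pending;
    assert(m != NULL);

    pthread_mutex_lock(&m->lock);
    abort_pending = (m->state == ABORT_MODEL_WAIT_ABORT);
    pthread_mutex_unlock(&m->lock);

    if (abort_pending) {
        /* Senders skip the transport once the state is not CONNECTED,
         * so the close itself needs no lock in this model. */
        m->transport_open = false;   /* placeholder for the real close */

        pthread_mutex_lock(&m->lock);
        m->state = ABORT_MODEL_DISCONNECTED;
        pthread_mutex_unlock(&m->lock);
    }
}
```

After abort_model_request the transport is still open; it is released only when abort_model_task_step runs, which mirrors the intent that user tasks never close the transport themselves.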

The APIs involved in this resource contention are mainly esp_websocket_client_send_bin, esp_websocket_client_send_bin_partial, esp_websocket_client_send_text, esp_websocket_client_send_text_partial, esp_websocket_client_send_cont_msg, esp_websocket_client_send_fin, esp_websocket_client_send_with_opcode, esp_websocket_client_close, and esp_websocket_client_close_with_code. All of them ultimately go through esp_websocket_client_send_with_exact_opcode. The contention concerns transport sending, errormsg_buffer, and state while the state is WEBSOCKET_STATE_CONNECTED.

After the modification, esp_websocket_client_send_with_exact_opcode accesses and modifies these resources under lock protection. In esp_websocket_client_task, while the state is WEBSOCKET_STATE_CONNECTED, every transport send, every transition to a non-connected state, and every access to errormsg_buffer is performed under lock protection. The transport is closed only by esp_websocket_client_task, and only in a state other than WEBSOCKET_STATE_CONNECTED. This ensures that a user-task send never runs concurrently with the close of the transport, and that esp_websocket_client_task does not require lock protection when receiving.
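
The invariant described here can be exercised with a small host-side stress sketch. Everything in it is an assumption made for illustration: the stress_* names are hypothetical, pthreads replace FreeRTOS tasks, and counters replace real transport calls. Several sender threads check the state under the lock while one task thread leaves the connected state under the lock and then closes the transport.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define STRESS_SENDERS 4
#define STRESS_SENDS_PER_THREAD 10000

typedef enum { STRESS_CONNECTED, STRESS_WAIT_ABORT, STRESS_DISCONNECTED } stress_state_t;

typedef struct {
    pthread_mutex_t lock;    /* stands in for client->lock */
    stress_state_t state;    /* guarded by lock */
    bool transport_open;     /* written by the task only after leaving CONNECTED */
    long sends_ok;           /* guarded by lock */
    long sends_on_closed;    /* guarded by lock; expected to stay 0 */
} stress_client_t;

/* User task: every send checks the state while holding the lock. */
void *stress_sender(void *arg)
{
    stress_client_t *c = arg;
    for (int i = 0; i < STRESS_SENDS_PER_THREAD; i++) {
        pthread_mutex_lock(&c->lock);
        if (c->state == STRESS_CONNECTED) {
            if (c->transport_open) {
                c->sends_ok++;          /* placeholder for a transport write */
            } else {
                c->sends_on_closed++;   /* would be a use-after-close */
            }
        }
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Client task: leaves CONNECTED under the lock, then closes the transport. */
void *stress_task(void *arg)
{
    stress_client_t *c = arg;

    pthread_mutex_lock(&c->lock);
    c->state = STRESS_WAIT_ABORT;
    pthread_mutex_unlock(&c->lock);

    /* No sender can be using the transport now, so no lock is held here. */
    c->transport_open = false;

    pthread_mutex_lock(&c->lock);
    c->state = STRESS_DISCONNECTED;
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Runs the senders and the task concurrently; returns sends seen on a closed transport. */
long stress_run(void)
{
    stress_client_t c = { .state = STRESS_CONNECTED, .transport_open = true };
    pthread_t senders[STRESS_SENDERS];
    pthread_t task;
    int rc;

    pthread_mutex_init(&c.lock, NULL);
    for (int i = 0; i < STRESS_SENDERS; i++) {
        rc = pthread_create(&senders[i], NULL, stress_sender, &c);
        assert(rc == 0);
    }
    rc = pthread_create(&task, NULL, stress_task, &c);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < STRESS_SENDERS; i++) {
        pthread_join(senders[i], NULL);
    }
    pthread_join(task, NULL);
    pthread_mutex_destroy(&c.lock);
    return c.sends_on_closed;
}
```

Because a sender reads transport_open only while it holds the lock and sees the connected state, and the task writes it only after leaving that state under the same lock, stress_run is expected to return 0 however the threads interleave.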

Related

Testing


Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

Note

Refactors websocket client concurrency and connection lifecycle handling.

  • Removes tx_lock Kconfig/options and all associated separate TX locking; uses a single client->lock to guard all transport sends/state/error buffer access
  • Introduces WEBSOCKET_STATE_WAIT_ABORT_CONNECT and changes esp_websocket_client_abort_connection() to be asynchronous (sets state, dispatches disconnect); actual esp_transport_close() now performed by the client task
  • Ensures PING/PONG/CLOSE and send paths acquire client->lock and re-check state before accessing transport; adds locked aborts on read/write failures
  • Adjusts main task state machine: splits abort handling, reconnect/wait logic, and server-initiated close flow; removes previous WAIT_TIMEOUT handling tied to immediate close
  • Cleans up Kconfig (drops tx lock options) and updates SPDX year

Written by Cursor Bugbot for commit 3bbe73a. This will update automatically on new commits.

@CLAassistant
CLAassistant commented Jan 27, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot changed the title fix(websocket): Fix the bug related to multi-threaded access to trans… fix(websocket): Fix the bug related to multi-threaded access to trans… (IDFGH-17155) Jan 27, 2026
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 27, 2026
@udoudou udoudou force-pushed the bugfix/esp_websocket_client_thread_safe branch from e01a525 to d379c8c Compare January 27, 2026 11:11
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@udoudou udoudou force-pushed the bugfix/esp_websocket_client_thread_safe branch from d379c8c to 3bbe73a Compare January 27, 2026 11:53