Skip to content

Conversation

@titusfortner
Copy link
Member

@titusfortner titusfortner commented Oct 22, 2025

User description

🔗 Related Issues

In the process of this PR - #16476 the (jruby) RBE keeps giving errors. These additions are to minimize IOError during shutdown.

💥 What does this PR do?

  • Changed the order of shutdown actions in bridge and bidi bridge
  • Adds more connection errors to ignore in socket
  • Adds mutex around all the places that may need synchronizing
  • Adds a @closing variable to stop doing things if we want to shut down
  • Joining threads instead of force exiting threads
  • Setting both #abort_on_exception and report_on_exception to false and specifying what we want in the rescue

🔧 Implementation Notes

ngl, I'm not certain we need all of this, but jruby is very picky.
I spent a lot of time consulting/arguing with several LLMs about this and how it fits into where I think we need to go, and learning a lot more about threads and sockets and async than I actually want to know. I'm much more confident about where this sits now.

The main issue is that propagating errors to the main thread isn't a good idea. If the main thread is in an unrelated ruby block when it is interrupted, that block could intercept the standard error unintentionally. There are ways we can manage propagating errors, but this way can cause issues.
Of note Playwright does not throw uncaught exceptions in Python or Java; (just JS where it seems like it's the only option).
This code checks if it is a bidi implementation and just logs the error, which I think is what we want the default to be right now.
I'm keeping existing devtools behavior as-is for backwards compatibility and flagging it as deprecated. Is that too much? Should we just move to the safer logging-only behavior right away?

We can do something safer when we move to a slightly different implementation that gives users more control over desired behavior.

🔄 Types of changes

  • Bug fix (backwards compatible)

PR Type

Bug fix


Description

  • Add synchronization with mutex around callback and message operations

  • Expand connection error handling to include EBADF, IOError, EOFError

  • Implement graceful shutdown with @closing flag and thread joining

  • Improve error logging and exception handling in socket listener

  • Reorder quit operations to close bidi before HTTP connection


Diagram Walkthrough

flowchart LR
  A["WebSocket Connection"] -->|add mutex| B["Thread-safe operations"]
  A -->|expand errors| C["CONNECTION_ERRORS list"]
  A -->|graceful shutdown| D["@closing flag + join threads"]
  B -->|protect| E["Callbacks & Messages"]
  C -->|handle| F["EBADF, IOError, EOFError"]
  D -->|replace| G["thread.exit calls"]
  H["Bridge quit"] -->|reorder| I["close bidi first"]
  I -->|then| J["close HTTP"]
Loading

File Walkthrough

Relevant files
Bug fix
websocket_connection.rb
Add mutex synchronization and graceful shutdown handling 

rb/lib/selenium/webdriver/common/websocket_connection.rb

  • Added Errno::EBADF, IOError, and EOFError to CONNECTION_ERRORS list
    for comprehensive error handling
  • Introduced @mtx Mutex and @closing flag for thread-safe
    synchronization
  • Implemented graceful shutdown using thread.join(0.5) instead of
    thread.exit
  • Protected callback and message operations with mutex synchronization
  • Enhanced error logging with debug messages for connection closures and
    callback failures
  • Improved socket.write error handling with try-catch for
    CONNECTION_ERRORS
  • Changed process_handshake to loop until ws.finished? for complete
    handshake
  • Modified attach_socket_listener to use @closing flag and loop instead
    of socket.eof?
  • Updated callback_thread to set abort_on_exception and
    report_on_exception to false with explicit error logging
+65/-35 
bidi_bridge.rb
Reorder quit to close bidi before parent quit                       

rb/lib/selenium/webdriver/remote/bidi_bridge.rb

  • Reordered quit method to close bidi connection before calling super
  • Added rescue clause for QUIT_ERRORS to handle connection errors
    gracefully
  • Changed from ensure block to explicit error handling pattern
+3/-2     
bridge.rb
Expand quit errors and improve close handling                       

rb/lib/selenium/webdriver/remote/bridge.rb

  • Expanded QUIT_ERRORS to include EOFError and WebSocket::Error
  • Changed quit method to use ensure block for http.close instead of
    rescue
  • Wrapped http.close in try-catch for QUIT_ERRORS handling
+7/-4     

@titusfortner titusfortner requested a review from p0deje October 22, 2025 07:29
@selenium-ci selenium-ci added the C-rb Ruby Bindings label Oct 22, 2025
@qodo-merge-pro
Copy link
Contributor

qodo-merge-pro bot commented Oct 22, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Potential hang

Description: The socket listener loop reads from the socket without a timeout or check for liveness
beyond @closing, which can block indefinitely and potentially hang shutdown if @closing is
never set due to a race; consider using non-blocking reads or timeouts.
websocket_connection.rb [121-143]

Referred Code
Thread.new do
  Thread.current.report_on_exception = false

  loop do
    break if @closing

    incoming_frame << socket.readpartial(1024)

    while (frame = incoming_frame.next)
      break if @closing

      message = process_frame(frame)
      next unless message['method']

      params = message['params']
      @mtx.synchronize { callbacks[message['method']].dup }.each do |callback|
        @callback_threads.add(callback_thread(params, &callback))
      end
    end
  end
rescue *CONNECTION_ERRORS, WebSocket::Error => e


 ... (clipped 2 lines)
Ticket Compliance
🟡
🎫 #1234
🔴 Provide a change that restores the alert/JS execution behavior when clicking such links.
Investigate and fix a regression where clicking a link with JavaScript in href no longer
triggers the JS in Selenium 2.48.x (worked in 2.47.1) on Firefox 42.
Validate behavior specifically in Firefox context labeled in the ticket.
🟡
🎫 #5678
🟢 Provide robust connection handling/logging to diagnose or prevent repeated connection
failures.
Address recurring "Error: ConnectFailure (Connection refused)" when instantiating multiple
ChromeDriver instances on Ubuntu 16.04 with Chrome 65 and ChromeDriver 2.35 using Selenium
3.9.0.
Ensure subsequent ChromeDriver instantiations do not fail with connection refused after
the first instance.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-merge-pro
Copy link
Contributor

qodo-merge-pro bot commented Oct 22, 2025

PR Code Suggestions ✨

Latest suggestions up to 627943c

CategorySuggestion                                                                                                                                    Impact
Possible issue
Set closing flag on listener exit

In attach_socket_listener, use an ensure block to set @closing = true when the
listener loop exits, including on error, to ensure a clean shutdown.

rb/lib/selenium/webdriver/common/websocket_connection.rb [128-152]

 def attach_socket_listener
   Thread.new do
     Thread.current.report_on_exception = false
 
-    loop do
-      break if @closing
-
-      incoming_frame << socket.readpartial(1024)
-
-      while (frame = incoming_frame.next)
+    begin
+      loop do
         break if @closing
 
-        message = process_frame(frame)
-        next unless message['method']
+        incoming_frame << socket.readpartial(1024)
 
-        params = message['params']
-        @messages_mtx.synchronize { callbacks[message['method']].dup }.each do |callback|
-          @callback_threads.add(callback_thread(params, &callback))
+        while (frame = incoming_frame.next)
+          break if @closing
+
+          message = process_frame(frame)
+          next unless message['method']
+
+          params = message['params']
+          @callbacks_mtx.synchronize { callbacks[message['method']].dup }.each do |callback|
+            @callback_threads.add(callback_thread(params, &callback))
+          end
         end
       end
+    rescue *CONNECTION_ERRORS, WebSocket::Error => e
+      WebDriver.logger.debug "WebSocket listener closed: #{e.class}: #{e.message}", id: :ws
+    ensure
+      @closing_mtx.synchronize { @closing = true }
     end
-  rescue *CONNECTION_ERRORS, WebSocket::Error => e
-    WebDriver.logger.debug "WebSocket listener closed: #{e.class}: #{e.message}", id: :ws
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that an error in the socket listener should set the @closing flag to signal other threads that the connection is dead, preventing them from waiting indefinitely.

Medium
Enforce safe response timeout

In send_cmd, replace wait.until with a custom wait loop that checks the @closing
flag to avoid waiting for a response when the connection is being terminated.

rb/lib/selenium/webdriver/common/websocket_connection.rb [99-115]

 def send_cmd(**payload)
   id = next_id
   data = payload.merge(id: id)
   WebDriver.logger.debug "WebSocket -> #{data}"[...MAX_LOG_MESSAGE_SIZE], id: :ws
   data = JSON.generate(data)
   out_frame = WebSocket::Frame::Outgoing::Client.new(version: ws.version, data: data, type: 'text')
 
   begin
     socket.write(out_frame.to_s)
   rescue *CONNECTION_ERRORS => e
-    raise e, "WebSocket is closed (#{e.class}: #{e.message})"
+    raise Error::WebDriverError, "WebSocket is closed (#{e.class}: #{e.message})"
   end
 
-  wait.until do
-    @messages_mtx.synchronize { messages.delete(id) }
+  deadline = Time.now + RESPONSE_WAIT_TIMEOUT
+  loop do
+    break if @closing
+    found = @messages_mtx.synchronize { messages.delete(id) }
+    return found if found
+    if Time.now > deadline
+      raise Error::WebDriverError, "Timed out waiting for WebSocket response to command id=#{id}"
+    end
+    sleep RESPONSE_WAIT_INTERVAL
   end
+
+  raise Error::WebDriverError, "Connection closing while waiting for response to command id=#{id}"
 end
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the wait.until block can wait for the full timeout even if the connection is closing, and proposes adding a check for the @closing flag to fail faster, which improves robustness.

Medium
General
Handle malformed frames safely

In process_frame, wrap JSON.parse in a begin/rescue block to handle
JSON::ParserError, preventing malformed data from crashing the socket listener
thread.

rb/lib/selenium/webdriver/common/websocket_connection.rb [158-169]

 def process_frame(frame)
-  message = frame.to_s
+  raw = frame.to_s
 
   # Firefox will periodically fail on unparsable empty frame
-  return {} if message.empty?
+  return {} if raw.nil? || raw.empty?
 
-  msg = JSON.parse(message)
-  @messages_mtx.synchronize { messages[msg['id']] = msg if msg.key?('id') }
+  begin
+    msg = JSON.parse(raw)
+  rescue JSON::ParserError => e
+    WebDriver.logger.debug "Ignoring unparsable WebSocket frame: #{e.class}: #{e.message}", id: :ws
+    return {}
+  end
+
+  if msg.key?('id')
+    @messages_mtx.synchronize { messages[msg['id']] = msg }
+  end
 
   WebDriver.logger.debug "WebSocket <- #{msg}"[...MAX_LOG_MESSAGE_SIZE], id: :ws
   msg
 end

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that a JSON::ParserError from a malformed frame would crash the listener thread, and proposes adding error handling to make the connection more resilient.

Medium
Learned
best practice
Guard message fields before use

Add defensive checks to ensure message['params'] is a Hash before use and safely
duplicate callbacks to prevent nil errors and races when handlers are removed
concurrently.

rb/lib/selenium/webdriver/common/websocket_connection.rb [137-147]

 while (frame = incoming_frame.next)
   break if @closing
 
   message = process_frame(frame)
-  next unless message['method']
+  method = message['method']
+  next unless method.is_a?(String)
 
   params = message['params']
-  @messages_mtx.synchronize { callbacks[message['method']].dup }.each do |callback|
+  next unless params.is_a?(Hash)
+
+  handlers = @callbacks_mtx.synchronize { callbacks[method].dup }
+  handlers.each do |callback|
     @callback_threads.add(callback_thread(params, &callback))
   end
 end
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Validate inputs and states early to avoid nil errors; guard message handling and callback spawning against missing fields.

Low
  • Update

Previous suggestions

✅ Suggestions up to commit 4292334
CategorySuggestion                                                                                                                                    Impact
High-level
Refactor to use multiple specialized mutexes

Replace the single, coarse-grained mutex with multiple, more granular mutexes,
one for each shared resource (@callbacks, @messages, @closing). This will reduce
lock contention and improve concurrency.

Examples:

rb/lib/selenium/webdriver/common/websocket_connection.rb [40]
        @mtx = Mutex.new
rb/lib/selenium/webdriver/common/websocket_connection.rb [77-80]
        @mtx.synchronize do
          callbacks[event] << block
          block.object_id
        end

Solution Walkthrough:

Before:

class WebSocketConnection
  def initialize(url:)
    @mtx = Mutex.new
    @closing = false
    @callbacks = {}
    @messages = {}
    # ...
  end

  def add_callback(event, &block)
    @mtx.synchronize do
      # ... access @callbacks
    end
  end

  def send_cmd(**payload)
    # ...
    wait.until do
      @mtx.synchronize { messages.delete(id) }
    end
  end

  def process_frame(frame)
    # ...
    @mtx.synchronize { messages[msg['id']] = msg }
    # ...
  end
end

After:

class WebSocketConnection
  def initialize(url:)
    @callbacks_mtx = Mutex.new
    @messages_mtx = Mutex.new
    @closing_mtx = Mutex.new
    @closing = false
    @callbacks = {}
    @messages = {}
    # ...
  end

  def add_callback(event, &block)
    @callbacks_mtx.synchronize do
      # ... access @callbacks
    end
  end

  def send_cmd(**payload)
    # ...
    wait.until do
      @messages_mtx.synchronize { messages.delete(id) }
    end
  end

  def process_frame(frame)
    # ...
    @messages_mtx.synchronize { messages[msg['id']] = msg }
    # ...
  end
end
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that a single coarse-grained mutex is used to protect multiple independent resources, which can cause unnecessary lock contention and performance issues. Adopting a more granular locking strategy is a significant architectural improvement for the concurrency model introduced in this PR.

Medium
Possible issue
Ensure session is always deleted
Suggestion Impact:The commit changed the quit method so that failures from bidi.close are rescued and super is guaranteed to run via an ensure block, ensuring session cleanup. Although it used an ensure block instead of wrapping only bidi.close in a begin/rescue, it achieved the same goal of always calling super even if bidi.close fails.

code diff:

         def quit
           bidi.close
-          super
         rescue *QUIT_ERRORS
           nil
+        ensure
+          super
         end

Modify the quit method to ensure super is always called, even if bidi.close
fails. Wrap only the bidi.close call in a begin/rescue block to prevent leaving
orphaned browser sessions.

rb/lib/selenium/webdriver/remote/bidi_bridge.rb [48-53]

 def quit
-  bidi.close
+  begin
+    bidi.close
+  rescue *QUIT_ERRORS
+    nil
+  end
   super
-rescue *QUIT_ERRORS
-  nil
 end

[Suggestion processed]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a critical flaw where a failure in bidi.close would prevent the super call from executing, potentially leaving browser sessions running. The proposed fix ensures the session is always cleaned up.

Medium
Fix race condition in callback removal
Suggestion Impact:The commit changed remove_callback to wrap the closing check, removal, and ID collection within a single @callbacks_mtx.synchronize block, addressing the race condition as suggested.

code diff:

       def remove_callback(event, id)
-        removed = @mtx.synchronize { callbacks[event].reject! { |cb| cb.object_id == id } }
-        return if removed || @closing
-
-        ids = @mtx.synchronize { callbacks[event]&.map(&:object_id) }
-        raise Error::WebDriverError, "Callback with ID #{id} does not exist for event #{event}: #{ids}"
+        @callbacks_mtx.synchronize do
+          return if @closing
+
+          callbacks_for_event = callbacks[event]
+          return if callbacks_for_event.reject! { |cb| cb.object_id == id }
+
+          ids = callbacks[event]&.map(&:object_id)
+          raise Error::WebDriverError, "Callback with ID #{id} does not exist for event #{event}: #{ids}"
+        end
       end

Refactor the remove_callback method to use a single synchronize block. This will
prevent a race condition by ensuring the callback removal and error message data
collection are performed atomically.

rb/lib/selenium/webdriver/common/websocket_connection.rb [84-88]

-removed = @mtx.synchronize { callbacks[event].reject! { |cb| cb.object_id == id } }
-return if removed || @closing
+@mtx.synchronize do
+  return if @closing
 
-ids = @mtx.synchronize { callbacks[event]&.map(&:object_id) }
-raise Error::WebDriverError, "Callback with ID #{id} does not exist for event #{event}: #{ids}"
+  callbacks_for_event = callbacks[event]
+  if callbacks_for_event.reject! { |cb| cb.object_id == id }
+    return
+  end
 
+  ids = callbacks_for_event.map(&:object_id)
+  raise Error::WebDriverError, "Callback with ID #{id} does not exist for event #{event}: #{ids}"
+end
+

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a race condition in the remove_callback method where two separate synchronize blocks could lead to inconsistent state, and proposes a valid fix by using a single atomic block.

Medium
Learned
best practice
Make handshake loop safe

Guard the handshake read loop against connection closure and IO errors so
resources don't hang if the socket closes mid-handshake.

rb/lib/selenium/webdriver/common/websocket_connection.rb [115-118]

 def process_handshake
   socket.print(ws.to_s)
-  ws << socket.readpartial(1024) until ws.finished?
+  begin
+    ws << socket.readpartial(1024) until ws.finished?
+  rescue *CONNECTION_ERRORS, WebSocket::Error, EOFError, IOError
+    # let caller handle a failed/closed handshake deterministically
+    raise Error::WebDriverError, 'WebSocket handshake failed or connection closed'
+  end
 end
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Enforce deterministic and safe resource handling by ensuring sockets/threads are always cleaned up on all code paths.

Low

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds thread safety and improved error handling to WebSocket connection management in the Ruby Selenium bindings to prevent IOError exceptions during JRuby shutdown. The changes focus on synchronizing concurrent access to shared resources, gracefully handling connection errors, and properly coordinating shutdown sequences.

Key changes:

  • Added mutex synchronization around all WebSocket callback and message operations
  • Expanded connection error handling to include EBADF, IOError, and EOFError
  • Implemented graceful shutdown using thread joining instead of forced exits

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
websocket_connection.rb Added mutex synchronization, graceful shutdown with @closing flag, and enhanced error handling for socket operations
bidi_bridge.rb Reordered quit sequence to close BiDi connection before parent HTTP connection
bridge.rb Expanded QUIT_ERRORS list and improved error handling in quit method
test_environment.rb Added --disable-dev-shm-usage flag for Chrome/Edge when running on RBE
driver_spec.rb Simplified test exclusion metadata

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants