FATAL crash in lustre WebSocket library when socket.select returns nil recv during connection loss #2146

@alexzeitgeist

Description

I discovered this issue while developing a LAN driver using websockets; I assume it affects any Edge driver using lustre WebSockets where the connected device may reboot abruptly.

The lustre WebSocket library has a bug that causes a FATAL crash when the underlying socket connection is abruptly terminated (e.g., when a connected device reboots). The error is raised in _handle_recvs, which lustre's internal _receive_loop calls with a nil recv value.

Steps to Reproduce

  1. Create a SmartThings Edge driver that uses lustre WebSocket for device communication
  2. Establish a WebSocket connection to a device using the standard pattern:
  local lustre = require("lustre")
  local ws = lustre.WebSocket.client(sock, path, config)
  ws:connect(host, port)
  3. While the WebSocket is connected and receiving messages, abruptly reboot the device
  4. The driver crashes with a FATAL error

Expected Behavior

The lustre library should handle connection losses gracefully, returning an error that the driver can catch and handle (e.g., trigger reconnection logic).

Actual Behavior

The driver crashes with the following FATAL error:

  FATAL Feller Wiser Gateway Driver  runtime error: [string "cosock.lua"]:250: [string "lustre/ws.lua"]:299: attempt to index a nil value (local 'recv')
  stack traceback:
      [string "lustre/ws.lua"]:299: in method '_handle_recvs'
      [string "lustre/ws.lua"]:286: in method '_receive_loop'
      [string "lustre/ws.lua"]:178: in function <[string "lustre/ws.lua"]:177>

Probable Root Cause

Looking at the lustre source code, the issue is in the _receive_loop function:

  local recv, _, err = socket.select(rs, nil, self.config._keep_alive)
  if not recv then
    if self:_handle_select_err(loop_state, err) then
      return
    end
  end
  if self:_handle_recvs(loop_state, recv, 1) then  -- recv can be nil here; per the traceback the index then fails at ws.lua:299 inside _handle_recvs
    break
  end

The bug occurs when:

  1. socket.select() returns recv = nil due to a socket error
  2. _handle_select_err() is called but returns nil for non-timeout errors
  3. Execution continues to _handle_recvs() which tries to access recv[1] when recv is nil
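
The three steps above can be reproduced outside the hub with plain Lua. This is only a sketch: fake_select, handle_select_err, and handle_recvs are hypothetical stand-ins written for this example, not lustre's real functions, and the assumption that the error handler returns nil for a non-timeout error comes from the reading of the source described above.

```lua
-- Hypothetical stand-ins; none of this is lustre's actual code.

local function fake_select()
  -- Mimics a failing select call: no ready list, plus an error string.
  return nil, nil, "closed"
end

local function handle_select_err(err)
  -- Assumption taken from the report: a non-timeout error yields nil.
  -- The timeout branch is invented here only to make the stub complete.
  if err == "timeout" then
    return true
  end
  return nil
end

local function handle_recvs(recv)
  -- Indexing recv raises a runtime error when recv is nil.
  return recv[1] ~= nil
end

local function one_iteration()
  local recv, _, err = fake_select()
  if not recv then
    if handle_select_err(err) then
      return "handled"
    end
  end
  -- Execution falls through with recv == nil, as in step 3.
  return handle_recvs(recv)
end

-- pcall captures the error that would otherwise abort the driver.
local ok, msg = pcall(one_iteration)
print(ok, msg)
```

Here pcall reports a failure with an "attempt to index a nil value" style message. The exact wording depends on the Lua version.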

Proposed Fix

Add proper control flow to prevent calling _handle_recvs when recv is nil:

  if not recv then
    if self:_handle_select_err(loop_state, err) then
      return
    end
  else  -- Only call _handle_recvs if recv is not nil
    if self:_handle_recvs(loop_state, recv, 1) then
      break
    end
  end
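
To sanity-check the guarded control flow, here is a minimal sketch that runs it in a loop against stubs. All names (fake_select, handle_select_err, handle_recvs, receive_loop) are hypothetical and written for this example. The surrounding while loop is an assumption inferred from the break in the snippet above, not copied from lustre.

```lua
-- Hypothetical stand-ins; not lustre's real implementation.

-- Scripted select results: first a non-timeout error, then one ready socket.
local results = {
  { nil, "closed" },
  { { "sock" }, nil },
}
local calls = 0

local function fake_select()
  calls = calls + 1
  local r = results[calls]
  return r[1], nil, r[2]
end

local function handle_select_err(err)
  -- Assumption from the report: non-timeout errors yield nil.
  return nil
end

local handled = {}

local function handle_recvs(recv)
  handled[#handled + 1] = recv[1]
  return true -- ask the loop to stop after the first delivery
end

local function receive_loop()
  while true do
    local recv, _, err = fake_select()
    if not recv then
      if handle_select_err(err) then
        return "stopped on error"
      end
    else
      -- Only reached when recv is a table, so indexing it is safe.
      if handle_recvs(recv) then
        break
      end
    end
  end
  return "stopped after recv"
end

local ok, outcome = pcall(receive_loop)
print(ok, outcome, calls, handled[1])
```

With the else branch the nil case no longer raises. One thing this sketch suggests: if the error handler keeps returning a falsy value for a persistent socket error, the loop simply calls select again, so the fix may also need to surface the error to the driver so it can reconnect.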

Environment

  • SmartThings Edge Hub 0.56.11
  • Built-in lustre library (not vendored)
  • Occurs when devices reboot
