Failure During Hardware Component Unconfiguration #1748

@sahand-ghaffari-ocado

Summary
Unconfiguring a hardware component through the ~/set_hardware_component_state service can crash the process. The on_cleanup() function releases resources while the read and write functions of the hardware component may still be accessing them. This happens because the primary state is set to UNCONFIGURED only after on_cleanup() has run, rather than before it is called.

Problem Description

  1. When the ~/set_hardware_component_state service is called, set_hardware_component_state_srv_cb() runs and calls set_component_state(). When the target state is PRIMARY_STATE_UNCONFIGURED, set_component_state() calls cleanup_hardware().
  2. cleanup_hardware() uses bind to call the cleanup() method of the SystemInterface, SensorInterface, or ActuatorInterface class, depending on the type of hardware component. cleanup(), in turn, calls the on_cleanup() function of the hardware component class. In this case, that is on_cleanup() of the URPositionHardwareInterface class, which is defined in the Universal_Robots_ROS2_Driver repository and inherits from hardware_interface::SystemInterface.
  3. on_cleanup() resets pointers and cleans up threads while the read and write functions of the ControllerManager class keep running. These call the corresponding read and write functions of the ResourceManager class, which then invoke the read and write functions of the hardware components through the SystemInterface, SensorInterface, or ActuatorInterface classes. Before executing the read or write operation on the hardware component, these functions check that the state is PRIMARY_STATE_INACTIVE or PRIMARY_STATE_ACTIVE.
  4. If on_cleanup() has already removed some resources but the component state has not yet been set to UNCONFIGURED, that check still passes and the read and write functions of the hardware component can still be called. They then access resources that have already been removed, which can crash the process.
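
The ordering problem in steps 3 and 4 can be reproduced in a small single-threaded model. This is only a sketch: MockComponent, State, and cleanup_state_last() are hypothetical names invented for illustration, not ros2_control types, and the `interleaved` callback stands in for a control-loop cycle that happens to run between the resource release and the state update.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-ins for the lifecycle states; not the ros2_control types.
enum class State { Unconfigured, Inactive, Active };

// Hypothetical mock of a hardware component that owns one resource.
struct MockComponent
{
  State state = State::Inactive;
  std::unique_ptr<int> resource = std::make_unique<int>(42);
  bool touched_released_resource = false;

  // Mirrors the guard from step 3: read only when INACTIVE or ACTIVE.
  void read()
  {
    if (state != State::Inactive && state != State::Active) {
      return;
    }
    if (!resource) {
      // A real driver would dereference the dangling pointer here.
      touched_released_resource = true;
      return;
    }
    (void)*resource;  // normal read path
  }

  // Ordering described in the issue: release resources first, update the state last.
  void cleanup_state_last(const std::function<void()> & interleaved)
  {
    resource.reset();             // what on_cleanup() does
    interleaved();                // a read cycle sneaks in here
    state = State::Unconfigured;  // state is updated too late
  }
};

// Returns true when the interleaved read reached the released resource.
bool current_ordering_hits_released_resource()
{
  MockComponent c;
  c.cleanup_state_last([&c] { c.read(); });
  return c.touched_released_resource;
}
```

In this model, current_ordering_hits_released_resource() returns true: the state guard in read() still passes because the state is INACTIVE at the moment the resource is already gone.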

Environment:

  • OS: Ubuntu 20.04
  • Version: Humble

Proposed Solution
To prevent these crashes, the state should be set to UNCONFIGURED before any resources are cleaned up. The read and write functions will then not be invoked after resources have been removed, which avoids access to invalid or dangling pointers. The suggested change to the cleanup() function in the SystemInterface, SensorInterface, or ActuatorInterface classes is as follows:

const rclcpp_lifecycle::State & System::cleanup()
{
  if (impl_->get_state().id() == lifecycle_msgs::msg::State::PRIMARY_STATE_INACTIVE)
  {
    // Mark the component UNCONFIGURED first so the state check in read/write fails
    // before on_cleanup() starts releasing resources.
    impl_->set_state(rclcpp_lifecycle::State(
      lifecycle_msgs::msg::State::PRIMARY_STATE_UNCONFIGURED,
      lifecycle_state_names::UNCONFIGURED));
    switch (impl_->on_cleanup(impl_->get_state()))
    {
      case CallbackReturn::SUCCESS:
        break;
      case CallbackReturn::FAILURE:
      case CallbackReturn::ERROR:
        // Cleanup did not succeed: move to the error state.
        impl_->set_state(error());
        break;
    }
  }
  return impl_->get_state();
}
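
The effect of the proposed ordering can be checked with a small single-threaded model. This is a sketch under stated assumptions: GuardedComponent, LifecycleState, and cleanup_state_first() are hypothetical names, not ros2_control types, and the `interleaved` callback stands in for a control-loop cycle that starts during cleanup. It does not cover a read that is already in progress when cleanup begins; that case would need additional synchronization.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-ins for the lifecycle states; not the ros2_control types.
enum class LifecycleState { Unconfigured, Inactive, Active };

// Hypothetical mock of a hardware component that owns one resource.
struct GuardedComponent
{
  LifecycleState state = LifecycleState::Inactive;
  std::unique_ptr<int> resource = std::make_unique<int>(42);
  bool touched_released_resource = false;

  // Same guard as the real read/write path: only run when INACTIVE or ACTIVE.
  void read()
  {
    if (state != LifecycleState::Inactive && state != LifecycleState::Active) {
      return;
    }
    if (!resource) {
      touched_released_resource = true;  // would be a dangling access in a real driver
      return;
    }
    (void)*resource;  // normal read path
  }

  // Proposed ordering: update the state first, then release resources.
  void cleanup_state_first(const std::function<void()> & interleaved)
  {
    if (state != LifecycleState::Inactive) {
      return;  // cleanup is only valid from INACTIVE
    }
    state = LifecycleState::Unconfigured;
    interleaved();     // a read cycle that starts now returns early
    resource.reset();  // what on_cleanup() does
  }
};

// Returns true when the interleaved read reached the released resource.
bool proposed_ordering_hits_released_resource()
{
  GuardedComponent c;
  c.cleanup_state_first([&c] { c.read(); });
  return c.touched_released_resource;
}
```

In this model, proposed_ordering_hits_released_resource() returns false, because the interleaved read sees UNCONFIGURED and returns before touching the resource.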
