Skip to content

Conversation

cyan-at
Copy link
Contributor

@cyan-at cyan-at commented Jul 1, 2025

Public API Changes
None

Description
We use this package in a systemctl
When we do systemctl ... stop <our.service>, a common thing blocking shutdown is this process
We can work around it, and locally do systemctl ... stop --force but this patch isn't global and not ideal

I tested some local changes to this code to make it cooperate, without any obvious regressions

I made no issues, but maybe one exists or I can make one for tracking

@cyan-at
Copy link
Contributor Author

cyan-at commented Jul 7, 2025

@bjsowa can you give this a look? thanks

@bjsowa
Copy link
Member

bjsowa commented Jul 7, 2025

Is switching to MultiThreadedExecutor necessary? I believe we are trying to avoid it due to some existing bugs that may lead to a deadlock. (@sea-bass or @MatthijsBurgh might know more).

I think I got cooperative shutdown to work well in #1040 but it is not passing tests due to bugs in rclpy which I am trying to fix but couldn't find time to finish.

@sea-bass
Copy link
Contributor

sea-bass commented Jul 7, 2025

MultiThreadedExecutor has a lot of issues and I would avoid it unless absolutely necessary. But two suggestions:

  1. This script in question already has a bunch of command-line args. So maybe one that allows switching executors isn't a bad idea. (Side question, this really shouldn't use raw sys.argv but rather something like argparse, though I understand it gets in the way of ros2)
  2. Could always try the new EventsExecutor which is meant to significantly help with CPU usage? But I believe it's just now an experimental Python executor in Kilted.

@cyan-at
Copy link
Contributor Author

cyan-at commented Jul 8, 2025

Thanks all for the feedback

MultiThreadExecutor, + ReentrantCallbackGroup might help with some of that deadlock (handwavy gestures at threading.RLock)

But that indicates changes inside the node wherever it has such a mutexed object

It does seem necessary, without it there is some circular dependency
Maybe I can try some different ordering or something else

But if this is going to be throwaway in light of #1040, then we can put this to bed in favor of that and it's other benefits

@MatthijsBurgh
Copy link
Contributor

See #981 and ros2/rclpy#1223. This is probably still an issue.
Using the MultiThreadedExecutor is the issue. It is not related to deadlocks. But performance related.

@MatthijsBurgh
Copy link
Contributor

@sea-bass There are no problems with using argparse as long as you combine it with the remove_ros_args function from rclpy, see https://docs.ros.org/en/iron/p/rclpy/rclpy.utilities.html#rclpy.utilities.remove_ros_args

@MatthijsBurgh
Copy link
Contributor

The EventExecutor looks interesting. But don't enforce it for now.

Though I think the entire executor change in this PR is not needed to fix the shutdown issues.

@sea-bass
Copy link
Contributor

sea-bass commented Jul 8, 2025

@sea-bass There are no problems with using argparse as long as you combine it with the remove_ros_args function from rclpy, see https://docs.ros.org/en/iron/p/rclpy/rclpy.utilities.html#rclpy.utilities.remove_ros_args

Whoa, I did not know about this. Thank you!

@cyan-at
Copy link
Contributor Author

cyan-at commented Jul 8, 2025

Hi @bjsowa @MatthijsBurgh @sea-bass , I got it to work with SingleThreadExecutor

If you think the ok() check after spin_once is redundant, I can remove it

@cyan-at cyan-at requested a review from MatthijsBurgh July 8, 2025 20:24
Copy link
Contributor

@MatthijsBurgh MatthijsBurgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be possible to create a test for your case? So we have a test for a robust shutdown.

@MatthijsBurgh
Copy link
Contributor

@cyan-at Would be possible to create a test for your case? So we have a test for a robust shutdown.

@cyan-at
Copy link
Contributor Author

cyan-at commented Jul 30, 2025

@MatthijsBurgh I was looking at the existing test suites

I saw a lot of make_client but didn't see a good way to exercise this kind of shutdown in the python unittest context

Do you have any suggestions?

@cyan-at
Copy link
Contributor Author

cyan-at commented Jul 30, 2025

@MatthijsBurgh I don't think this change warrants a test, and I worry adding a test might introduce other changes out of scope

I can provide a test report i.e. before vs after behavior in this PR and I hope that would be sufficient

@MatthijsBurgh MatthijsBurgh requested a review from bjsowa August 7, 2025 08:58
@MatthijsBurgh
Copy link
Contributor

Waiting for a final review of @bjsowa. You can merge it if you approve

@bjsowa bjsowa changed the title rosbridge_websocket cooperative shutdown necessary changes fix: rosbridge_websocket cooperative shutdown Aug 7, 2025
@bjsowa bjsowa enabled auto-merge (squash) August 7, 2025 18:21
@bjsowa bjsowa merged commit cb94ab9 into RobotWebTools:ros2 Aug 7, 2025
4 checks passed
@cyan-at
Copy link
Contributor Author

cyan-at commented Aug 7, 2025

Thanks all!

bjsowa pushed a commit that referenced this pull request Aug 13, 2025
* rosbridge_websocket cooperative shutdown necessary changes

* switch back to singlethread, check for ok BEFORE spinning also

* address MatthijsBurgh comments
bjsowa pushed a commit that referenced this pull request Aug 13, 2025
* rosbridge_websocket cooperative shutdown necessary changes

* switch back to singlethread, check for ok BEFORE spinning also

* address MatthijsBurgh comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants