Skip to content

Conversation

@n8hui
Copy link

@n8hui n8hui commented May 3, 2025

Backport of the EventsExecutor (#1391) from rolling to humble. Heavily influenced by Flova's Jazzy backport (#1435)

Requires ros2/rcl#1237 backport of rcl_timer_get_next_call_time method to humble.

bmartin427 and others added 2 commits April 2, 2025 10:03
* Introduce EventsExecutor implementation (ros2#1389)

Signed-off-by: Brad Martin <[email protected]>

* Fix explosion for reference count updates without GIL

The previous version worked on 22.04+Iron, but fails on 24.04+Jazzy or Rolling.  It
seems like the behavior of std::bind() may have changed when asked to bind a py::object
to a callback taking a py::handle.

Signed-off-by: Brad Martin <[email protected]>

* Fix ament linter complaints

Signed-off-by: Brad Martin <[email protected]>

* Switch to non-boost asio library

Signed-off-by: Brad Martin <[email protected]>

* Use internal _rclpy C++ types instead of jumping through Python hoops

Signed-off-by: Brad Martin <[email protected]>

* Reformat with clang-format, then uncrustify again

clang-format fixes things that uncrustify tends to leave alone, but then uncrustify seems slightly
unhappy with that result.  Also reflow comments at 100 columns.

This uses the .clang-format file from the ament-clang-format package, with the exception of
SortIncludes being set to false, because I didn't like what it tried to do with includes without
that override.

Signed-off-by: Brad Martin <[email protected]>

* Respect init signal handling options

Signed-off-by: Brad Martin <[email protected]>

* Reconcile signal handling differences with SingleThreadedExecutor

Signed-off-by: Brad Martin <[email protected]>

* test_executor.py: Add coverage for EventsExecutor

Tests that currently work are enabled, and those that don't are annotated what needs to be done before they can be enabled

Signed-off-by: Brad Martin <[email protected]>

* Make spin_once() only return after a user-visible callback

Signed-off-by: Brad Martin <[email protected]>

* Add get_nodes() method

Signed-off-by: Brad Martin <[email protected]>

* Add context management support

Signed-off-by: Brad Martin <[email protected]>

* Remove mutex protection on nodes_ member access

It was being used only inconsistently, and on reflection it wasn't adding any protection beyond what
the GIL already provides.

Signed-off-by: Brad Martin <[email protected]>

* Fix deadlock during shutdown()

Needed to release the GIL while blocking on another lock.

Signed-off-by: Brad Martin <[email protected]>

* A little further on using C++ _rclpy types instead of Python interface

Signed-off-by: Brad Martin <[email protected]>

* Fix issue with iterating through incomplete Tasks

Never bool-test a py::object::attr() return value without an explicit py::cast<bool>.

Signed-off-by: Brad Martin <[email protected]>

* Add support for coroutines to timer handling

Signed-off-by: Brad Martin <[email protected]>

* Fix test_not_lose_callback() test to destroy entities properly

EventsExecutor was exploding because the test was leaving destroyed entities in the node by using an
incorrect API to destroy them.

Signed-off-by: Brad Martin <[email protected]>

* Correct test that wasn't running at all, and remove EventsExecutor from it

* Test methods need to be prefixed with 'test_' in order to function.  This had been entirely dead
  code, and when I enabled it EventsExecutor deadlocked.

* On reflection, it seems like a deadlock is exactly what should be expected from what this test is
  doing.  Remove EventsExecutor from the test and document the problem.

Signed-off-by: Brad Martin <[email protected]>

* Warn on every timer added over the threshold, not just the first

Co-authored-by: Janosch Machowinski <[email protected]>
Signed-off-by: Brad Martin <[email protected]>

* Keep rcl_timer_call() tightly coupled with user callback dispatch

This both prevents an issue where user callbacks could potentially queue up if they didn't dispatch
fast enough, and ensures that a timer that has passed its expiration without being dispatched yet
can still be canceled without the user callback being delivered later.

This commit also makes use of the new rcl API for querying the absolute timer expiration time,
instead of the relative number of nanoseconds remaining until it expires.  This should both make
things more accurate, and potentially reduce overhead as we don't have to re-query the current time
for each outstanding timer.

Signed-off-by: Brad Martin <[email protected]>

* Protect against deferred method calls happening against a deleted ClockManager

Signed-off-by: Brad Martin <[email protected]>

* Add support for new TimerInfo data passed to timer handlers

Signed-off-by: Brad Martin <[email protected]>

* Simplify spin_once() implementation

This both reduces duplicate code now, and simplifies the asio interface used which would need
replacing.

Signed-off-by: Brad Martin <[email protected]>

* Fix stale Future done callbacks with spin_until_future_complete()

This method can't be allowed to leave its Future done callback outstanding in case the method is
returning for a reason other than the Future being done.

Signed-off-by: Brad Martin <[email protected]>

* Use existing rclpy signal handling instead of asio

Signed-off-by: Brad Martin <[email protected]>

* Replace asio timers with a dedicated timer wait thread

This is dumb on its own, but it helps me move towards eliminating asio.

Signed-off-by: Brad Martin <[email protected]>

* Correct busy-looping in async callback handling

This isn't ideal because there are some ways async callbacks could become unblocked which wouldn't
get noticed right away (if at all); however this seems to match the behavior of
SingleThreadedExecutor.

Signed-off-by: Brad Martin <[email protected]>

* Replace asio::io_context with a new EventsQueue object

Signed-off-by: Brad Martin <[email protected]>

* Add EventsExecutor to new test_executor test from merge

Signed-off-by: Brad Martin <[email protected]>

* Swap 'pragma once' for ifndef include guards

Signed-off-by: Brad Martin <[email protected]>

* Add test coverage for Node.destroy_subscription()

Signed-off-by: Brad Martin <[email protected]>

* Replace '|' type markup with typing.Optional and typing.Union

Python 3.9, still used by RHEL 9, doesn't seem to understand '|' syntax, and
Optional/Union seems to be what gets used throughout the rest of the codebase.

Additionally, fix a couple other mypy nits:
 * mypy is mad that I haven't explicitly annotated every __init__ as returning None
 * mypy wants generic arguments to service and action clients now

Signed-off-by: Brad Martin <[email protected]>

* Use 'auto' in place of explicit return type on hash

pybind11::hash() is documented as returning ssize_t, but this seems to be a lie because MSVC
doesn't understand that type; so, let's just return whatever we do get.

Signed-off-by: Brad Martin <[email protected]>

* Change initialization of struct members

MSVC didn't like the more concise method.

Signed-off-by: Brad Martin <[email protected]>

* Use subTest in test_executor to distinguish which executor type failed

Signed-off-by: Brad Martin <[email protected]>

* Use time.perf_counter() instead of time.monotonic() in executor test

time.monotonic() has a resolution of 16ms, which is way too coarse for the intervals
this test is trying to measure.

Signed-off-by: Brad Martin <[email protected]>

---------

Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Co-authored-by: Brad Martin <[email protected]>
Co-authored-by: Janosch Machowinski <[email protected]>
- Generics have been added to some parts of rclpy in a later version. Therefore, we can not use them, and they were removed from the affected type hints.
- The TimerInfo argument is not present in humble. Anything related to it is removed.
- The execution order of the SingleThreadedExecutor is not fifo in humble. This has been fixed in rolling, but the PR will not be backported, as this would be a breaking change. Therefore, a special case is added to one test that flips the order so we expect a fifo behavior for the events executor.
- Subscription CallbackType enum is not present in humble, references to it have been removed.
- The rcl function rcl_timer_set_on_reset_callback is not present in humble, removed timer reset callback from events executor.
- The rclpy event_handler submodule is not implemented in humble, removed event callback checks from events executor test.
- Replaced references to submodule rclpy clock_type with clock in events executor test.
- Manually wake executor in future done callback for spin_once_until_future_complete tests (method differs from Jazzy/Rolling).

Signed-off-by: Nathan Hui <[email protected]>
@alsora
Copy link

alsora commented May 16, 2025

Thank you @n8hui for this work!
We should make sure that these changes don't break API/ABI.

It shouldn't be a problem since it's mostly new files, but we should look at the signal handler code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants