Skip to content

Conversation

@pawelrutkaq
Copy link
Contributor

Notes for Reviewer

Pre-Review Checklist for the PR Author

  • Add sensible notes for the reviewer
  • PR title is short, expressive and meaningful
  • Consider switching the PR to a draft (Convert to draft)
    • as draft PR, the CI will be skipped for pushes
  • Relevant issues are linked in the References section
  • Every source code file has a copyright header with SPDX-License-Identifier: Apache-2.0 OR MIT
  • Branch follows the naming format (iox2-123-introduce-posix-ipc-example)
  • Commits messages are according to this guideline
  • Tests follow the best practice for testing
  • Changelog updated in the unreleased section including API breaking changes
  • Assign PR to reviewer
  • All checks have passed (except task-list-completed)

Checklist for the PR Reviewer

  • Commits are properly organized and messages are according to the guideline
  • Unit tests have been written for new behavior
  • Public API is documented
  • PR title describes the changes

Post-review Checklist for the PR Author

  • All open points are addressed and tracked via issues

References

Closes #1002

Signed-off-by: Pawel Rutka <[email protected]>
@pawelrutkaq pawelrutkaq force-pushed the iox2-1002-async-api-design branch 2 times, most recently from db32337 to ce027cb Compare September 3, 2025 07:30
@pawelrutkaq
Copy link
Contributor Author

@elBoberido @elfenpiff here is PR for async api we dicussed. Matrix is dead for now so I ping You here

@pawelrutkaq pawelrutkaq changed the title iox-1002: Async API design draft iox-1002: Async API design Sep 3, 2025
@pawelrutkaq pawelrutkaq changed the title iox-1002: Async API design iox-1002: Async API design draft Sep 3, 2025
@codecov
Copy link

codecov bot commented Sep 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.81%. Comparing base (1d3127a) to head (272f4d3).
⚠️ Report is 447 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (1d3127a) and HEAD (272f4d3). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (1d3127a) HEAD (272f4d3)
1 0
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1005      +/-   ##
==========================================
- Coverage   80.03%   74.81%   -5.22%     
==========================================
  Files         258      385     +127     
  Lines       31781    42299   +10518     
  Branches        0      360     +360     
==========================================
+ Hits        25437    31648    +6211     
- Misses       6344     7705    +1361     
- Partials        0     2946    +2946     
Flag Coverage Δ
Rust 74.81% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.
see 227 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@elfenpiff elfenpiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq Thanks for writing up the async design document - it was already requested from the community right from the beginning and we are happy that you started to tackle the challenge - thank you!

I already read over it once but now I have to start sadly my meeting marathon - I left some initial questions.

@pawelrutkaq pawelrutkaq force-pushed the iox2-1002-async-api-design branch from 2d51086 to e48b7ae Compare September 3, 2025 11:44
@pawelrutkaq
Copy link
Contributor Author

@elfenpiff requests are added

@pawelrutkaq
Copy link
Contributor Author

@elBoberido @orecham kind remainder ;)

Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq thanks for bringing this forward and sorry for taking so long to review.

For me it's still not clear what changes in iceroyx2 would be required to implement the design. Partly also because I'm not that familiar with async and therefore some things that might be obvious for you, are unclear to me.

From my current feeling, I'm not even sure we need a port factory with a create_async method. This is something that we won't need for the other language bindings. It might be sufficient to have an AsyncListener in the userland crates, which just consumes a Listener and offers the async API.

But in order to give a better response, I need a better picture of the proposed changes and where they need to be done in iceoryx2.

Service: service::Service,
<Service::Event as iceoryx2_cal::event::Event>::Listener: FileDescriptorBased,
{
pub(crate) fn from(listener: iceoryx2::port::listener::Listener<Service>) -> Result<Self, CommonErrors> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, would it make sense to implement the TryFrom trait here? If not, it might be a good idea to call this function try_from, since it has the same signature like the one from the trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of two you said makse sense. Since it's example I dont correct it for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq We have iceoryx2 users that use semaphores instead of unix domain sockets for signaling mechanism - since unix domain sockets are non-certifiable and not allowed in mission-critical contexts.

As you see here, the Listener implements only file_descriptor() when the underlying concept is file descriptor based: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/port/listener.rs#L128

When a user now uses the Semaphore variant, async cannot be implemented since Semaphores alone do not support event multiplexing.

Currently, we already have the issue, that not all iceoryx2 code compiles (waitset breaks) when the semaphore configuration is used. But we must avoid adding more code that has the same issue.

One way of addressing this would be to introduce another service variant called ipc_async_threadsafe and define the event: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/service/ipc.rs#L53 as socket based.

But this will wonderfully clash with the C, C++ and Python language bindings - so it would be a service variant that is not cross-language compatible.

@elBoberido what is your opinion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Semaphores are not events. So in case user have system where we dont have proper notification backend (hope we dont dicuss here QNX ;)) then the only thing user can do is polling or ? In that case, using impl for specific trait (Event - FileDescriptor or Event - Semaphore) we can implement different stategy. For semaphore it would be simply timeout based polling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Semaphores we could implement a WaitSet that works with them and for async, a separate thread could be spawned to wait on this WaitSet and the use e.g. EventFDs to wake the async runtime.

Regarding the additional service variant. I think we need a solution that scales better than what we have today. But that is not a simple task and needs some investment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makse no sense really. Waitset is doing sleep (or what have you meant?) and will add new thread. The runtimes can do the same with one liner.

Also the runtimes has own IO multiplexers, so adding new one in seperate thread witll normally cause multiple thread hops to process a change instead processing in place by runtime. So I would not do it as default solution.

Copy link
Contributor Author

@pawelrutkaq pawelrutkaq Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we dicussed today: We can write specialized handling of API when events do use Semaphore or so. And the other option is what @elBoberido mentioned, once custom waitset will be there. This would be up to Runtime to choose an integration way for this.


/// Async wait for a new [`EventId`]. On error it returns [`ListenerWaitError`] is returned which describes
/// the error in detail.
pub async fn wait_one(&self) -> Result<EventId, ListenerWaitError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: Do you expect that there is a one-to-one relation between sent samples and received events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Event part. so here we have one to one realtions. However second part of your questions is more into pub-sub. Here our idea is:

  • first implemetation - yes, each new produced sample fires event
  • later stage (optimization when rolled out) - one can limit number of events by implemeting special checks in pub-sub internals where we can fire event only if we know async subscriber as now awaiting new sample.

Since my approach is to first do not touch iceoryx2 implemetation details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq I just skimmed over the comments - so this comment might be misplaced.

first implemetation - yes, each new produced sample fires event

This is an implementation flaw of iceoryx2 that will be fixed. The issue is here: #925 and we are currently writing a design doc for it.

The event is in the and a switch that is triggered when a state changes. If the same state changes multiple times, the switch is only triggered once until the counterpart resets it by handling all events.

This means for the following sequence, that only one event is emitted.

  1. Publish sample
  2. Notify sample-published (emits event notification)
  3. Publish sample
  4. Notify sample-published (does nothing)
  5. ....

When the listener is activated and acquires all events then the event is reseted and the event notified once again.

The reason for this behavior change is, that it is essential in a mission-critical context that no notification is lost. This can be best realized with some form of bitset - not a queue with a limited buffer size.

Copy link
Contributor Author

@pawelrutkaq pawelrutkaq Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not matter I think. So we are doing notify for each sample produced. Whether this send event, or already have send event does not matter, and is even better since we would send less events. At the end, some event was send to listener which will wake up and process all until it has no more events, correct? ;)
Speaking in code it means AsyncSubscriber doing receive().await:

  • check if there is sample, if there is, its done, next cal does same.
  • if no samples, wait for notification
  • repeat

Still i wonder what You will do, counter for each event value ? At the end, it will mean less OS calls -> better preformance in both async and non async I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As dicussed: No issue

Comment on lines +250 to +254
#### AsyncSubscriber - `messaging-patter == pubsub`

Pure `async` implementation not using any specifics of `runtime` shall be
doable. In case some unexpected dependency will be needed it has to be exposed
over defined abstraction same as `AsyncListenerTrait`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. Further above, it is stated that pub-sub always needs to be combined with the event messaging pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but once You have iml AsyncListenerTrait the rest is pure async (no runtime dependent). So assuming we have somehwere type X: AsyncListenerTrait, X is becoming given and we dont depend on runtime in implementation (we only depend indirectly on concreet X type ensuring it has out trait implemented).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not getting the point. Do you propose to add an separate AsyncSubscriber which has an integrated Listener?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end yes, Subscriber needs event source so it would have Listener.,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed. This would be part of the userland crates

.open_or_create()
.unwrap();

let notifier = event.notifier_builder().create_async().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is not clear why there should be an async notifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be skipped at first line since blocking is unlikelly. But to be 101% accurate, notifier is a write syscall that can block ;)

Copy link
Contributor

@elfenpiff elfenpiff Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it is not. The underlying socket is configured with fcntl(fd, F_SETFL, O_NONBLOCK) and will return an errno when it would need to block.

Let's assume the notifier has 2 listeners and of one listener, the buffer is full, then the listener is skipped and not notified, the other listener will receive a notification. If the user want to handle this error, a degradation-callback can be registered which is called whenever a delivery problem is encountered.

This is also, as contract, verified in the concept test suite so you can therefore safely assume that this is always non-blocking - if its blocking it is a bug.

Copy link
Contributor Author

@pawelrutkaq pawelrutkaq Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EWOULDBLOCK is normally translated to .await doing async waiting. Here, this can happen too. So this looks like it still shall be async API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As dicussed: We may not need to do anything as per mentioned guarantess it will not block and it will do correct action (after reimplementation). I would leave still open for later Whether AsyncPublisher shall be created as proxy wrapper (or type alias) to Publisher to keep async API symetric only.

@pawelrutkaq pawelrutkaq force-pushed the iox2-1002-async-api-design branch from e48b7ae to a92615e Compare September 18, 2025 06:55
@pawelrutkaq
Copy link
Contributor Author

@pawelrutkaq thanks for bringing this forward and sorry for taking so long to review.

For me it's still not clear what changes in iceroyx2 would be required to implement the design. Partly also because I'm not that familiar with async and therefore some things that might be obvious for you, are unclear to me.

From my current feeling, I'm not even sure we need a port factory with a create_async method. This is something that we won't need for the other language bindings. It might be sufficient to have an AsyncListener in the userland crates, which just consumes a Listener and offers the async API.

But in order to give a better response, I need a better picture of the proposed changes and where they need to be done in iceoryx2.

My assumption is that first API can be created without changing the code, only extending is needed.
I think in case of async it may be impossible to always have same API across languages since EACH language do this differently.
At the end, I would disagree with doing all that in userland since this will limit further performance improvements and will not keep it consistent imho - and no one wants iceoryx2-tokio API and iceoryx2-other_runtime API looking similar, but actually not.

@pawelrutkaq pawelrutkaq force-pushed the iox2-1002-async-api-design branch from a92615e to c71ec25 Compare September 18, 2025 07:00
Copy link
Contributor

@elfenpiff elfenpiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should reduce the scope of this design document to the async listener and plan it. I added some more comments but I am optimistic that we can merge the document until the end of this week.

To add further async ports we need to discuss the iceoryx2-userland layer and how there data-flow and control-flow messaging patterns are merged together which would than naturally converge to an async API.

Service: service::Service,
<Service::Event as iceoryx2_cal::event::Event>::Listener: FileDescriptorBased,
{
pub(crate) fn from(listener: iceoryx2::port::listener::Listener<Service>) -> Result<Self, CommonErrors> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelrutkaq We have iceoryx2 users that use semaphores instead of unix domain sockets for signaling mechanism - since unix domain sockets are non-certifiable and not allowed in mission-critical contexts.

As you see here, the Listener implements only file_descriptor() when the underlying concept is file descriptor based: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/port/listener.rs#L128

When a user now uses the Semaphore variant, async cannot be implemented since Semaphores alone do not support event multiplexing.

Currently, we already have the issue, that not all iceoryx2 code compiles (waitset breaks) when the semaphore configuration is used. But we must avoid adding more code that has the same issue.

One way of addressing this would be to introduce another service variant called ipc_async_threadsafe and define the event: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/service/ipc.rs#L53 as socket based.

But this will wonderfully clash with the C, C++ and Python language bindings - so it would be a service variant that is not cross-language compatible.

@elBoberido what is your opinion here.

loop {
match self.listener.try_wait_one() {
Ok(event) if event.is_some() => return Ok(Ok(event.unwrap())),
Ok(_) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the assumption is that when try_wait_one() returns nothing that there was an error?

This assumption would be wrong. One-to-one relations between sample and event notification will soon no longer be possible and events will have switch semantics - otherwise we have no way of dealing with limited queue buffer sizes like every mechanism like unix domain sockets has.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what that mean, we need to dicuss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As dicussed: Not a problem, some notification will happen underneath and this is enought to get waked in async. The implemented need to take care before waiting that it consumed all events/samples etc.

Ok(event) if event.is_some() => return Ok(Ok(event.unwrap())),
Ok(_) => {
// This is None, so there was and error, probably EAGAIN or EWOULDBLOCK
if std::io::Error::last_os_error().kind() == std::io::ErrorKind::WouldBlock {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the whole error handling based on errno. This is broken since iceoryx2 resets the errno variable always to zero when it is handled - like the POSIX standard recommends.

Copy link
Contributor Author

@pawelrutkaq pawelrutkaq Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is esentially needed in FD case that we get EWouldBlock in any runtime. Otherwise we dont know when to wait. And You are not clearing this specific error ;) Anyway this information shall be preserved or we need side API.

Copy link
Contributor Author

@pawelrutkaq pawelrutkaq Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to stick to a fact that try_receive will return Ok(None) only whne there would be EWOULDBLOCK error from os call in FD case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed: We may not need to do anything as per mentioned guarantee it will not block and it will do correct action (after reimplementation). I would leave still open for later Whether AsyncPublisher shall be created as proxy wrapper (or type alias) to Publisher to keep async API symetric only.


There are two approaches that can be chosen for implementation:

#### 1. Use custom event for each pair (messaging patern, data type) based on `ServiceName`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I had more context last time, but I have currently a hard time understanding this. Let me translate this.

  1. Create corresponding event-service for each data-flow service. So when we have a service my_pub_sub_service we create also a my_pub_sub_service_events.
  2. Global event service, we use one event service like global_events and all data-flow services use that for notification.

If I am correct, could you please rewrite this a bit to make it more understandable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes
  2. No, but it's interesting what have you meant ;) Second is that we would use event in same service but this would mean that in that service You can only have single messaging pattern as async.
    Added examples.

beneficial to either expose some properties (of current listener) or `add` new
sync api with different signature.

#### AsyncSubscriber - `messaging-patter == pubsub`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend removing AsyncSubscriber from the design document for now. I agree that we need some kind of async subscriber, and we have this on our radar.

This would be more like a "meta-port" instead of a subscriber located in iceoryx2-userland and it combines control-flow pattern (event) with data-flow pattern (pubsub, request-response, blackboard). It would have an async API and would enable the user to wait on a wide variety of events of the data-flow messaging pattern. But on the iceoryx2 layer the async API makes, in my opinion, only sense for the listener.

Another point is that we have to always consider the full iceoryx2 API. Even when you only require an async subscriber, still an async publisher, async client, async server, async reader (blackboard), async writer (blackboard) might be missing. They might wait until there is a subscriber, or a corresponding other part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think without AsyncSubscriber, the Listener is anyway implemented outside iceoryx2, so we could only do some minor things here. AsyncSubscriber would pave us a way to others. Once we are happy with implementation, we can add missing part of this messaging pattern and make it officially supported for Pub-Sub. Then, we can continue with others incrementally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to do AsyncSubscriber in userland.

Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small typos

Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little bit of nitpicking. Looks good. Waiting for @elfenpiff for his final review.

* limit a service to only single messaging pattern as using event will cause no
way to use it again

Considering above, the continuation of the proposal is based on option 1. The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering option 1, one could also extend the Service Discovery with the service attributes. Please have a loot at https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/rust/service_attributes

* If unsuitable for zero-trust or real-time use, how do we prevent accidental
misuse?

PR: To me idea, we will build async API on top of existing non-blocking, non
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking: Just to have this a little bit more formal :)

Suggested change
PR: To me idea, we will build async API on top of existing non-blocking, non
PR: The author proposes to build the `async` API on top of existing non-blocking, non

open a way for other to implement a bridge to other runtime like `tokio` as
basic idea will be shown

### Milestone 2 – Implement PubSub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Milestone 2 – Implement PubSub
### Milestone 3 – Implement PubSub

@pawelrutkaq
Copy link
Contributor Author

Just a little bit of nitpicking. Looks good. Waiting for @elfenpiff for his final review.

Kk, will fix it once @elfenpiff review

@pawelrutkaq
Copy link
Contributor Author

@elfenpiff ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Async API

3 participants