-
Notifications
You must be signed in to change notification settings - Fork 98
iox-1002: Async API design draft #1005
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
iox-1002: Async API design draft #1005
Conversation
Signed-off-by: Pawel Rutka <[email protected]>
db32337 to
ce027cb
Compare
|
@elBoberido @elfenpiff here is PR for async api we dicussed. Matrix is dead for now so I ping You here |
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #1005 +/- ##
==========================================
- Coverage 80.03% 74.81% -5.22%
==========================================
Files 258 385 +127
Lines 31781 42299 +10518
Branches 0 360 +360
==========================================
+ Hits 25437 31648 +6211
- Misses 6344 7705 +1361
- Partials 0 2946 +2946
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
elfenpiff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawelrutkaq Thanks for writing up the async design document - it was already requested from the community right from the beginning and we are happy that you started to tackle the challenge - thank you!
I already read over it once but now I have to start sadly my meeting marathon - I left some initial questions.
2d51086 to
e48b7ae
Compare
|
@elfenpiff requests are added |
|
@elBoberido @orecham kind remainder ;) |
elBoberido
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawelrutkaq thanks for bringing this forward and sorry for taking so long to review.
For me it's still not clear what changes in iceroyx2 would be required to implement the design. Partly also because I'm not that familiar with async and therefore some things that might be obvious for you, are unclear to me.
From my current feeling, I'm not even sure we need a port factory with a create_async method. This is something that we won't need for the other language bindings. It might be sufficient to have an AsyncListener in the userland crates, which just consumes a Listener and offers the async API.
But in order to give a better response, I need a better picture of the proposed changes and where they need to be done in iceoryx2.
| Service: service::Service, | ||
| <Service::Event as iceoryx2_cal::event::Event>::Listener: FileDescriptorBased, | ||
| { | ||
| pub(crate) fn from(listener: iceoryx2::port::listener::Listener<Service>) -> Result<Self, CommonErrors> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, would it make sense to implement the TryFrom trait here? If not, it might be a good idea to call this function try_from, since it has the same signature like the one from the trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of two you said makse sense. Since it's example I dont correct it for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawelrutkaq We have iceoryx2 users that use semaphores instead of unix domain sockets for signaling mechanism - since unix domain sockets are non-certifiable and not allowed in mission-critical contexts.
As you see here, the Listener implements only file_descriptor() when the underlying concept is file descriptor based: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/port/listener.rs#L128
When a user now uses the Semaphore variant, async cannot be implemented since Semaphores alone do not support event multiplexing.
Currently, we already have the issue, that not all iceoryx2 code compiles (waitset breaks) when the semaphore configuration is used. But we must avoid adding more code that has the same issue.
One way of addressing this would be to introduce another service variant called ipc_async_threadsafe and define the event: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/service/ipc.rs#L53 as socket based.
But this will wonderfully clash with the C, C++ and Python language bindings - so it would be a service variant that is not cross-language compatible.
@elBoberido what is your opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Semaphores are not events. So in case user have system where we dont have proper notification backend (hope we dont dicuss here QNX ;)) then the only thing user can do is polling or ? In that case, using impl for specific trait (Event - FileDescriptor or Event - Semaphore) we can implement different stategy. For semaphore it would be simply timeout based polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Semaphores we could implement a WaitSet that works with them and for async, a separate thread could be spawned to wait on this WaitSet and the use e.g. EventFDs to wake the async runtime.
Regarding the additional service variant. I think we need a solution that scales better than what we have today. But that is not a simple task and needs some investment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makse no sense really. Waitset is doing sleep (or what have you meant?) and will add new thread. The runtimes can do the same with one liner.
Also the runtimes has own IO multiplexers, so adding new one in seperate thread witll normally cause multiple thread hops to process a change instead processing in place by runtime. So I would not do it as default solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we dicussed today: We can write specialized handling of API when events do use Semaphore or so. And the other option is what @elBoberido mentioned, once custom waitset will be there. This would be up to Runtime to choose an integration way for this.
|
|
||
| /// Async wait for a new [`EventId`]. On error it returns [`ListenerWaitError`] is returned which describes | ||
| /// the error in detail. | ||
| pub async fn wait_one(&self) -> Result<EventId, ListenerWaitError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification: Do you expect that there is a one-to-one relation between sent samples and received events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Event part. so here we have one to one realtions. However second part of your questions is more into pub-sub. Here our idea is:
- first implemetation - yes, each new produced sample fires event
- later stage (optimization when rolled out) - one can limit number of events by implemeting special checks in pub-sub internals where we can fire event only if we know async subscriber as now awaiting new sample.
Since my approach is to first do not touch iceoryx2 implemetation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawelrutkaq I just skimmed over the comments - so this comment might be misplaced.
first implemetation - yes, each new produced sample fires event
This is an implementation flaw of iceoryx2 that will be fixed. The issue is here: #925 and we are currently writing a design doc for it.
The event is in the and a switch that is triggered when a state changes. If the same state changes multiple times, the switch is only triggered once until the counterpart resets it by handling all events.
This means for the following sequence, that only one event is emitted.
- Publish sample
- Notify sample-published (emits event notification)
- Publish sample
- Notify sample-published (does nothing)
- ....
When the listener is activated and acquires all events then the event is reseted and the event notified once again.
The reason for this behavior change is, that it is essential in a mission-critical context that no notification is lost. This can be best realized with some form of bitset - not a queue with a limited buffer size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not matter I think. So we are doing notify for each sample produced. Whether this send event, or already have send event does not matter, and is even better since we would send less events. At the end, some event was send to listener which will wake up and process all until it has no more events, correct? ;)
Speaking in code it means AsyncSubscriber doing receive().await:
- check if there is sample, if there is, its done, next cal does same.
- if no samples, wait for notification
- repeat
Still i wonder what You will do, counter for each event value ? At the end, it will mean less OS calls -> better preformance in both async and non async I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As dicussed: No issue
| #### AsyncSubscriber - `messaging-patter == pubsub` | ||
|
|
||
| Pure `async` implementation not using any specifics of `runtime` shall be | ||
| doable. In case some unexpected dependency will be needed it has to be exposed | ||
| over defined abstraction same as `AsyncListenerTrait` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused. Further above, it is stated that pub-sub always needs to be combined with the event messaging pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but once You have iml AsyncListenerTrait the rest is pure async (no runtime dependent). So assuming we have somehwere type X: AsyncListenerTrait, X is becoming given and we dont depend on runtime in implementation (we only depend indirectly on concreet X type ensuring it has out trait implemented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not getting the point. Do you propose to add an separate AsyncSubscriber which has an integrated Listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end yes, Subscriber needs event source so it would have Listener.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed. This would be part of the userland crates
| .open_or_create() | ||
| .unwrap(); | ||
|
|
||
| let notifier = event.notifier_builder().create_async().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it is not clear why there should be an async notifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be skipped at first line since blocking is unlikelly. But to be 101% accurate, notifier is a write syscall that can block ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it is not. The underlying socket is configured with fcntl(fd, F_SETFL, O_NONBLOCK) and will return an errno when it would need to block.
Let's assume the notifier has 2 listeners and of one listener, the buffer is full, then the listener is skipped and not notified, the other listener will receive a notification. If the user want to handle this error, a degradation-callback can be registered which is called whenever a delivery problem is encountered.
This is also, as contract, verified in the concept test suite so you can therefore safely assume that this is always non-blocking - if its blocking it is a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EWOULDBLOCK is normally translated to .await doing async waiting. Here, this can happen too. So this looks like it still shall be async API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As dicussed: We may not need to do anything as per mentioned guarantess it will not block and it will do correct action (after reimplementation). I would leave still open for later Whether AsyncPublisher shall be created as proxy wrapper (or type alias) to Publisher to keep async API symetric only.
Signed-off-by: Pawel Rutka <[email protected]>
e48b7ae to
a92615e
Compare
My assumption is that first API can be created without changing the code, only extending is needed. |
a92615e to
c71ec25
Compare
elfenpiff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should reduce the scope of this design document to the async listener and plan it. I added some more comments but I am optimistic that we can merge the document until the end of this week.
To add further async ports we need to discuss the iceoryx2-userland layer and how there data-flow and control-flow messaging patterns are merged together which would than naturally converge to an async API.
| Service: service::Service, | ||
| <Service::Event as iceoryx2_cal::event::Event>::Listener: FileDescriptorBased, | ||
| { | ||
| pub(crate) fn from(listener: iceoryx2::port::listener::Listener<Service>) -> Result<Self, CommonErrors> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawelrutkaq We have iceoryx2 users that use semaphores instead of unix domain sockets for signaling mechanism - since unix domain sockets are non-certifiable and not allowed in mission-critical contexts.
As you see here, the Listener implements only file_descriptor() when the underlying concept is file descriptor based: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/port/listener.rs#L128
When a user now uses the Semaphore variant, async cannot be implemented since Semaphores alone do not support event multiplexing.
Currently, we already have the issue, that not all iceoryx2 code compiles (waitset breaks) when the semaphore configuration is used. But we must avoid adding more code that has the same issue.
One way of addressing this would be to introduce another service variant called ipc_async_threadsafe and define the event: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/iceoryx2/src/service/ipc.rs#L53 as socket based.
But this will wonderfully clash with the C, C++ and Python language bindings - so it would be a service variant that is not cross-language compatible.
@elBoberido what is your opinion here.
| loop { | ||
| match self.listener.try_wait_one() { | ||
| Ok(event) if event.is_some() => return Ok(Ok(event.unwrap())), | ||
| Ok(_) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the assumption is that when try_wait_one() returns nothing that there was an error?
This assumption would be wrong. One-to-one relations between sample and event notification will soon no longer be possible and events will have switch semantics - otherwise we have no way of dealing with limited queue buffer sizes like every mechanism like unix domain sockets has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what that mean, we need to dicuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As dicussed: Not a problem, some notification will happen underneath and this is enought to get waked in async. The implemented need to take care before waiting that it consumed all events/samples etc.
| Ok(event) if event.is_some() => return Ok(Ok(event.unwrap())), | ||
| Ok(_) => { | ||
| // This is None, so there was and error, probably EAGAIN or EWOULDBLOCK | ||
| if std::io::Error::last_os_error().kind() == std::io::ErrorKind::WouldBlock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the whole error handling based on errno. This is broken since iceoryx2 resets the errno variable always to zero when it is handled - like the POSIX standard recommends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is esentially needed in FD case that we get EWouldBlock in any runtime. Otherwise we dont know when to wait. And You are not clearing this specific error ;) Anyway this information shall be preserved or we need side API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to stick to a fact that try_receive will return Ok(None) only whne there would be EWOULDBLOCK error from os call in FD case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed: We may not need to do anything as per mentioned guarantee it will not block and it will do correct action (after reimplementation). I would leave still open for later Whether AsyncPublisher shall be created as proxy wrapper (or type alias) to Publisher to keep async API symetric only.
|
|
||
| There are two approaches that can be chosen for implementation: | ||
|
|
||
| #### 1. Use custom event for each pair (messaging patern, data type) based on `ServiceName` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I had more context last time, but I have currently a hard time understanding this. Let me translate this.
- Create corresponding event-service for each data-flow service. So when we have a service
my_pub_sub_servicewe create also amy_pub_sub_service_events. - Global event service, we use one event service like
global_eventsand all data-flow services use that for notification.
If I am correct, could you please rewrite this a bit to make it more understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes
- No, but it's interesting what have you meant ;) Second is that we would use event in same service but this would mean that in that service You can only have single messaging pattern as async.
Added examples.
| beneficial to either expose some properties (of current listener) or `add` new | ||
| sync api with different signature. | ||
|
|
||
| #### AsyncSubscriber - `messaging-patter == pubsub` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend removing AsyncSubscriber from the design document for now. I agree that we need some kind of async subscriber, and we have this on our radar.
This would be more like a "meta-port" instead of a subscriber located in iceoryx2-userland and it combines control-flow pattern (event) with data-flow pattern (pubsub, request-response, blackboard). It would have an async API and would enable the user to wait on a wide variety of events of the data-flow messaging pattern. But on the iceoryx2 layer the async API makes, in my opinion, only sense for the listener.
Another point is that we have to always consider the full iceoryx2 API. Even when you only require an async subscriber, still an async publisher, async client, async server, async reader (blackboard), async writer (blackboard) might be missing. They might wait until there is a subscriber, or a corresponding other part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think without AsyncSubscriber, the Listener is anyway implemented outside iceoryx2, so we could only do some minor things here. AsyncSubscriber would pave us a way to others. Once we are happy with implementation, we can add missing part of this messaging pattern and make it officially supported for Pub-Sub. Then, we can continue with others incrementally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to do AsyncSubscriber in userland.
elBoberido
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some small typos
elBoberido
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a little bit of nitpicking. Looks good. Waiting for @elfenpiff for his final review.
| * limit a service to only single messaging pattern as using event will cause no | ||
| way to use it again | ||
|
|
||
| Considering above, the continuation of the proposal is based on option 1. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering option 1, one could also extend the Service Discovery with the service attributes. Please have a loot at https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/rust/service_attributes
| * If unsuitable for zero-trust or real-time use, how do we prevent accidental | ||
| misuse? | ||
|
|
||
| PR: To me idea, we will build async API on top of existing non-blocking, non |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking: Just to have this a little bit more formal :)
| PR: To me idea, we will build async API on top of existing non-blocking, non | |
| PR: The author proposes to build the `async` API on top of existing non-blocking, non |
| open a way for other to implement a bridge to other runtime like `tokio` as | ||
| basic idea will be shown | ||
|
|
||
| ### Milestone 2 – Implement PubSub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ### Milestone 2 – Implement PubSub | |
| ### Milestone 3 – Implement PubSub |
Kk, will fix it once @elfenpiff review |
Notes for Reviewer
Pre-Review Checklist for the PR Author
Convert to draft)SPDX-License-Identifier: Apache-2.0 OR MITiox2-123-introduce-posix-ipc-example)[#123] Add posix ipc example)task-list-completed)Checklist for the PR Reviewer
Post-review Checklist for the PR Author
References
Closes #1002