-
Two important points:
Shared data structures
The following data structures are "shared", i.e. accessed from runtime-context, channel-context and async-context (physical actions etc.).
What about the network channels?
-
More thoughts on which concurrency primitives the platform must offer:
-
In our last meeting we discussed how to enable the use of both interrupts and mutexes. The problem is that we want to use OS mutexes from Zephyr, RIOT and POSIX when they are available, because that reduces the blocking between threads. If interrupts are disabled, no other thread can make progress. If you just lock a mutex, other threads can keep executing unless they try to acquire that same lock. However, such locks cannot be acquired from interrupt handlers. We discussed adding a parameter. Unfortunately, this would affect most of the mutexes we created in:
I am going to think a little more about this problem. Perhaps we need a user-configured compile definition that tells us whether the user will schedule any actions from interrupts.
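To make the compile-definition idea concrete, here is a minimal sketch of how one flag could select between the two implementations. Everything in it is an assumption for illustration: the flag `EXAMPLE_ASYNC_FROM_ISR`, the `example_*` function names, and the `irq_enabled` stand-in for real interrupt control are hypothetical and not part of our runtime. POSIX threads are used for the mutex variant.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical build flag: define EXAMPLE_ASYNC_FROM_ISR when any action
 * may be scheduled from an interrupt handler. */
#ifdef EXAMPLE_ASYNC_FROM_ISR
/* Stand-in for the platform's interrupt control; a real port would call its HAL here. */
static volatile bool irq_enabled = true;
static void example_enter_critical_section(void) { irq_enabled = false; }
static void example_leave_critical_section(void) { irq_enabled = true; }
#else
/* No ISR touches runtime state, so an OS mutex is enough and other threads keep running. */
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static void example_enter_critical_section(void) { pthread_mutex_lock(&cs_mutex); }
static void example_leave_critical_section(void) { pthread_mutex_unlock(&cs_mutex); }
#endif

/* Shared state that is only touched inside the critical section. */
static int pending_actions = 0;

/* Called from an async context (thread, or ISR when the flag is set). */
void example_schedule_action(void) {
  example_enter_critical_section();
  pending_actions++;
  example_leave_critical_section();
}

/* Called from the runtime context. */
int example_pending_actions(void) {
  example_enter_critical_section();
  int n = pending_actions;
  example_leave_critical_section();
  return n;
}
```

The callers are identical in both builds, so a flag like this would only change the platform layer. The sketch does not handle nested critical sections.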
-
We lack a defined and thought-through "concurrency model" for our runtime. By this, I mean clear guidelines for how different execution contexts interact in our runtime, where there should be critical sections, and whether certain parts of the code can only be called from a certain context or are always called inside or outside a critical section.
This problem surfaced because currently too much of the runtime execution is done inside a critical section, and with `logging: Debug` there is also a lot of printing (which is very slow on MCUs). This is only a problem on bare-metal, because there critical sections disable interrupts and prohibit any async context from executing. On OS platforms the critical section is a mutex, which allows async contexts to run as long as they don't try to enter a critical section.
I don't have a clear picture of this yet, so this document will serve as a tool for exploring and understanding the problem.
The asymmetry of critical sections
On bare-metal platforms critical sections are implemented by disabling all interrupts and thus blocking any async context (such as an ISR receiving bytes from UART) from executing. This suggests that critical sections should be used sparingly and only for short periods.
On OS platforms critical sections are less "expensive" as we use a mutex and don't necessarily block other threads.
However, the runtime itself is platform-agnostic and should not be concerned with whether it is running bare-metal on a Pico or on an OS. It also should not need to know whether a network channel is interrupt-driven or has a thread running in the background.
Because of this, we need to think about critical sections as if we were on bare-metal, i.e. we must assume that whatever the NetworkChannels are doing in the background is halted whenever we are in a critical section. This means we must keep critical sections sparse and as short as possible.
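As an illustration of "as short as possible", the sketch below copies the shared value inside the critical section and does the slow printing only after leaving it. The names `example_tag_t`, `example_set_next_tag`, `example_peek_next_tag` and `example_log_next_tag` are made up for this example, and a POSIX mutex stands in for whatever the platform uses as its critical section.

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tag type for this example. */
typedef struct {
  int64_t time;
  uint32_t microstep;
} example_tag_t;

/* Stand-in for the platform critical section. */
static pthread_mutex_t cs = PTHREAD_MUTEX_INITIALIZER;
/* Shared state, only read or written inside the critical section. */
static example_tag_t next_tag = {0, 0};

void example_set_next_tag(int64_t time, uint32_t microstep) {
  pthread_mutex_lock(&cs);
  next_tag.time = time;
  next_tag.microstep = microstep;
  pthread_mutex_unlock(&cs);
}

/* Only the copy happens inside the critical section. */
example_tag_t example_peek_next_tag(void) {
  pthread_mutex_lock(&cs);
  example_tag_t snapshot = next_tag;
  pthread_mutex_unlock(&cs);
  return snapshot;
}

/* The slow printf runs on the snapshot, outside the critical section. */
void example_log_next_tag(void) {
  example_tag_t t = example_peek_next_tag();
  printf("next tag: (%" PRId64 ", %" PRIu32 ")\n", t.time, t.microstep);
}
```

On bare-metal the same shape would keep interrupts disabled only for the duration of the struct copy, not for the UART output.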
Points of interaction between execution contexts
There are five main points of interaction between execution contexts:
1. Scheduling of physical actions
This is the obvious one. The user can create an async context (thread/ISR) that schedules a physical action. This reads the current time and modifies the event queue, both of which need a critical section. Reading the time needs one because we are doing clock sync.
2. Polling the connection status of a NetworkChannel
This is done from the runtime side. It might inspect the state of the NetworkChannel, and it might also send messages or do other things to advance the connection process of the NetworkChannel. This might need a critical section with respect to other threads/ISRs that also modify the NetworkChannel.
3. Sending data over the NetworkChannel
This is also done from the runtime side, through `chan->send_blocking()`. This should be independent of the receive path. However, certain implementations (such as UartChannel) will also call `chan->send_blocking()` asynchronously to reply to custom UART connection messages. This implies that we need a critical section around `send_blocking`. Unfortunately, in the CoapChannel we depend on `send_blocking` being called OUTSIDE a critical section, because it needs interrupts enabled to get the ACK.
4. Async context receiving data etc. within the NetworkChannel
Most likely, each NetworkChannel has some other execution context modifying it, either a thread or an ISR. Typically it copies received data into some buffer and possibly tries to decode this data. In some cases (e.g. UartChannel), it can also call `send_blocking` to implement its own protocol to ensure that we have a connection status. Finally, the async context will interact with the runtime in some way.
5. Received message callback
When a NetworkChannel has received an entire message, it will call `msg_received_cb`, which takes it into the runtime. This is a point of interaction where the async context is executing "within" the runtime with access to its data structures. As we have added the System Event Queue, this callback should only schedule an event or a system event and return. This must naturally happen within a critical section, so that the runtime is not modifying the event queue or system event queue at the same time. It will also update some fields of the input port structs (such as last_known_tag etc.).
I think (1) and (5) are clear. Each is a well-defined interaction, and it is clear that the async context must enter a critical section before modifying the data structures. It is also (mostly) clear where the runtime context must have critical sections: when interacting with the event queue and the system event queue, and when looking at the last_known_tags of federated input ports.
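The pattern in (1) and (5), where the async context only enqueues an event inside a critical section and returns, could look roughly like the sketch below. It is not our actual event queue: the `example_*` names, the fixed-size FIFO (a real event queue would be ordered by tag), and the POSIX mutex standing in for the critical section are all assumptions made for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_QUEUE_CAPACITY 8

/* Hypothetical event type for this example. */
typedef struct {
  int64_t tag_time;
  int payload;
} example_event_t;

/* Stand-in for the platform critical section. */
static pthread_mutex_t cs = PTHREAD_MUTEX_INITIALIZER;
/* Simplified FIFO ring buffer, shared between async and runtime context. */
static example_event_t items[EXAMPLE_QUEUE_CAPACITY];
static size_t head = 0;
static size_t count = 0;

/* Async context: enqueue one event and return. Returns false if the queue is full. */
bool example_msg_received_cb(int64_t tag_time, int payload) {
  bool ok = false;
  pthread_mutex_lock(&cs);
  if (count < EXAMPLE_QUEUE_CAPACITY) {
    size_t idx = (head + count) % EXAMPLE_QUEUE_CAPACITY;
    items[idx].tag_time = tag_time;
    items[idx].payload = payload;
    count++;
    ok = true;
  }
  pthread_mutex_unlock(&cs);
  return ok;
}

/* Runtime context: take the oldest event, if any. The handling itself
 * happens after this returns, outside the critical section. */
bool example_runtime_pop_event(example_event_t *out) {
  bool ok = false;
  pthread_mutex_lock(&cs);
  if (count > 0) {
    *out = items[head];
    head = (head + 1) % EXAMPLE_QUEUE_CAPACITY;
    count--;
    ok = true;
  }
  pthread_mutex_unlock(&cs);
  return ok;
}
```

Both sides hold the critical section only for a few assignments, which matches the requirement that the callback does nothing but schedule and return.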
It appears to me that (3), the runtime calling `send_blocking`, is perhaps the trickiest problem, because some channels (CoAP) require this to be done from outside a critical section, while other channels (UART) also want to do it from the async context behind the scenes. This function is not reentrant, and it might not be possible to require that it be (because it copies data into a buffer and possibly interacts with a HW peripheral).
This thread will be continued next week.