Refactor: Redesign API for Greater Control As Needed by SOTI & Testbed#54
* Cleaned up and renamed some things.
* Filter messages by recipient ID by default, disabling with CWM_DISABLE_FILTERING.
* Replaced "RTOS Mode" with CWM_DEFINE_RTOS_TASK symbol.
|
Made some edits. However, there are still some compiler errors from the calls to |
|
Thanks for the changes! If you do add the CAN handle as an argument for |
|
Hey, sorry for disappearing. I've actually been contemplating this line specifically. As you can see, it transmits ACKs for each message before processing it. Which means that if any foreseeable command we implement takes longer than the 3.6 millisecond timeout, we could be at risk of failing to acknowledge messages in time. Now, 3.6 milliseconds is a looong time for an 80 MHz CPU, but delays could happen. Take for example:
In fact, this one ISR in Payload might just be the evidence I needed to feel worried about not using an RTOS. It polls 32 individual ADCs in succession while sending 32 CAN messages. Just one CAN message would theoretically take 0.224 milliseconds with our current baud rate. And 0.224 x 32 = 7.168 ms, well over the timeout. And this is hooked up to a timer so it can happen at any time. The funny thing is I actually wrote this code myself, and I'm only now realizing how bad of an idea that is.

But even if we took it out of the ISR and implemented our own pseudo-scheduling, it would still need to get executed, and we'd probably have to sprinkle in calls for sending ACKs all over the code in an attempt to ensure delays don't get in the way. At that point, we're basically re-inventing an RTOS, and there's still no guarantee that we caught everything. Also, this ISR is for telemetry reports, which is something that every subsystem will have to implement.

So, the more I think about this, the more convinced I am that all subsystems should probably use an RTOS, and we should probably require an RTOS for managing ACKs. It's the only way to ensure with confidence that ACKs are returned in a timely manner. |
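For reference, the timing budget above can be checked with a few lines of C. This is only a sketch of the arithmetic using the figures quoted in the comment (3.6 ms timeout, roughly 0.224 ms per CAN message); the macro and function names are made up for illustration and are not part of the module.

```c
#include <stdint.h>

/* Figures quoted in the discussion; all names here are hypothetical. */
#define ACK_TIMEOUT_US     3600u  /* 3.6 ms ACK timeout */
#define CAN_FRAME_TIME_US   224u  /* ~0.224 ms per CAN message at the current baud rate */

/* Time spent sending `frames` CAN messages back to back, in microseconds. */
static uint32_t burst_time_us(uint32_t frames)
{
    return frames * CAN_FRAME_TIME_US;
}

/* Largest back-to-back burst that still fits inside the ACK timeout. */
static uint32_t max_frames_within_timeout(void)
{
    return ACK_TIMEOUT_US / CAN_FRAME_TIME_US;
}
```

With these figures, a 32-message burst takes 7168 µs, and anything beyond 16 back-to-back messages already exceeds the timeout, before counting the ADC polling itself.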
|
@Koloss0 That's a good point, thanks for laying it out. Mind explaining why ACKs can't be transmitted in the Rx ISR? Is it just to minimize the time spent in it? Either way, I see what you mean and it does seem like scheduling/RTOS has a lot of benefits. |
|
It's to minimize the time spent in it, yes. All the advice out there seems to suggest keeping ISRs as short as possible is good practice, so as to not interfere with other polling/IO operations. Also, I don't know what the likelihood of this happening is, but technically the ISR could fire again while it's in the middle of transmitting an ACK, which would cause the second ACK to not get transmitted. So, we're keeping it as short as possible by just loading messages into a queue for later processing. |
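The "load into a queue, process later" pattern described above can be sketched on a host machine with a small ring buffer. This is a stand-in for illustration only: the PR uses an RTOS message queue for this role, and the `RxFrame` and `RxQueue` types, the capacity, and the function names below are all hypothetical. Memory barriers and interrupt masking, which a real target may need, are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_QUEUE_CAPACITY 8u  /* hypothetical depth; one slot is kept free */

typedef struct {
    uint32_t id;
    uint8_t  body[8];
    uint8_t  body_size;
} RxFrame;  /* hypothetical stand-in for the real message type */

typedef struct {
    RxFrame           items[RX_QUEUE_CAPACITY];
    volatile uint32_t head;  /* next slot to write (ISR side) */
    volatile uint32_t tail;  /* next slot to read (task side) */
} RxQueue;

/* Called from the Rx ISR: copy the frame and return immediately. Never blocks. */
static bool rx_queue_push_from_isr(RxQueue *q, const RxFrame *frame)
{
    uint32_t next = (q->head + 1u) % RX_QUEUE_CAPACITY;
    if (next == q->tail) {
        return false;  /* full: drop the frame rather than wait inside the ISR */
    }
    q->items[q->head] = *frame;
    q->head = next;
    return true;
}

/* Called from task context: take one frame, if any, for ACKing and processing. */
static bool rx_queue_pop(RxQueue *q, RxFrame *out)
{
    if (q->tail == q->head) {
        return false;  /* empty */
    }
    *out = q->items[q->tail];
    q->tail = (q->tail + 1u) % RX_QUEUE_CAPACITY;
    return true;
}
```

The ISR side only copies and advances an index, so its run time stays short and bounded; transmitting the ACK and handling the command happen on the consumer side.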
Switched implementation to be RTOS-only based on the rationale from this comment: #54 (comment)
|
@ArnavG-it Okay, I made a new commit that addresses the points we raised. For simplicity, TUK will now only support RTOS. I think it's the right way to go. I tried to just get my ideas down for how the logic would work and didn't bother getting it to compile. Could you take over from here for me and get it working? It would be much appreciated. |
|
Awesome, I'll definitely take a look tomorrow. |
Also added some task attributes and an osMessageQueueNew wrapper.
|
Amazing, thank you! For the creation of tasks and queues, initially I was going to let the user create them with the settings they want and pass them into the module. CDH uses this screen for managing all their tasks and queues, so they could use it like normal and pass them to the module:
However, when I think about it, it's possible we may want to add even more queues to TUK (e.g. the error queue, or maybe txCache). If that's the case, we'd be introducing a lot of queues for which the user isn't really in a position to choose good values. And if we ever want to change them later, it could be slow and inefficient to go through each subsystem and change them manually. So maybe we should hide the creation of those things in the
So, I would say:
|
|
That makes sense. I was thinking about some similar stuff, hence why I didn't use the |
This means CANWrapper_Init will have to be called after osKernelInitialize in the subsystem code.
|
@ArnavG-it Hey again, wanted to update you. I've been a bit busy but I'm now taking this up again. I'm trying to find a solution that isn't too complex but gives us the control we need for doing all these low-level functions in the IO board and SOTI. It's taken a while but I think I'm getting somewhere. Also, fun fact:
This is incorrect. I found this out by looking into the Cortex-M4, which is the processor used in STM32s. If an interrupt occurs with the same priority as the one already being handled, it stays pending until the current handler completes, and the core then moves straight on to it (known as tail-chaining). Only a higher-priority interrupt can pre-empt the current one. Very interesting. |
|
Thanks for the update. I've been a bit busy too but I'm ready to work on this once you think of the next steps. |
Adds a separate API for advanced use cases. This maintains the simplicity of the old API while enabling greater control with the advanced API.

Other changes:
* Added significantly more documentation in the code.
* Added rx_callback & tx_callback in the advanced API.
* Renamed a few things.
* Added an 'Error Handler' thread.
* Replaced ErrorQueue ADT with an RTOS queue.
* Added CANWrapper_Transmit_Raw() in the advanced API.
* Added `body_size` field to CANMessage struct: This is a minor change which doesn't affect much. The benefits are that it feels just a bit more elegant in the code (entirely subjective), and it also gives advanced users the possibility of transmitting illegally sized message bodies for testing purposes (if that is ever desired). It could also be convenient to normal users if they want to copy the body contents to a buffer.

**DOES NOT COMPILE:** The commit was getting big the more I changed things, so I pre-emptively created this commit. However, I still want to make a few changes. Those changes will be (potentially):
* Removing hcan from CANWrapper_InitTypeDef and using CAN_Start() in both APIs. This will slightly reduce the need for some #ifdefs.
* Making it compile.
* Looking into `HAL_CAN_TxMailbox0CompleteCallback`: This is an ISR which would allow us to execute code whenever a message completes transmission. This would be ideal for tx_callback, because currently it only gets called when a TX request is made, not when the message is sent.
|
@ArnavG-it Alright, I pushed the current state of what I was working on. I have to stop for a few hours, but I think it's coming along very well. I wrote down some things I was going to work on next. Feel free to pick up where I left off with the first two bullets if you have time. It should be fairly straightforward and would introduce you to the changes. The third bullet should probably be a separate issue. And all good if you can't! Also footnote: For the first bullet, I didn't explicitly mention it but along with removing hcan from the init struct, I was also going to add hcan back to the callbacks and transmit function, thereby basically making support for multiple CAN peripherals part of the standard API. It probably won't get used by anyone except us, but it would reduce the #ifdefs a bit. |
|
This looks like it will block permanently if

```c
void Error_Handler_Thread(void *argument)
{
    CANWrapper_ErrorInfo item;
    // Infinite loop
    while (1)
    {
        Poll_Timeouts();
        // Wait for the next item in the queue.
        if (osMessageQueueGet(s_error_queue, &item, NULL, osWaitForever) == osOK)
        {
            // Let the user handle it.
            s_init_struct.error_callback(&item);
        }
    }
}
```

We could essentially set a polling rate by doing this:

```c
osMessageQueueGet(s_error_queue, &item, NULL, 10) // Timeout in kernel ticks (ms at a 1 kHz tick rate)
```

That's pretty simple and probably fine, but there might be a fun solution. Rather than using the hardware timer, we could use the RTOS kernel ticks for the message timestamps. Then the error handler task could use |
|
Ah, good catch! It would be worth looking into how accurate the ticks would likely be. The decision for using a timer I believe came from:
But if the accuracy is sufficient, then that would be a much simpler and more robust approach. |
Also removes the error queue. Still have to remove the htim peripheral references. Article about tick resolution: https://www.freertos.org/Documentation/02-Kernel/05-RTOS-implementation-tutorial/02-Building-blocks/11-Tick-Resolution
|
The Tx Cache is a bit problematic with RTOS. The

```c
if (rx_behaviour | RX_CLEAR_TX_STORE && item.msg.is_ack)
{
    // Clear the corresponding message in the TxCache if it exists.
    int index = TxCache_Find(&s_tx_cache, msg);
    if (index > 0)
    {
        TxCache_Erase(&s_tx_cache, index);
    }
}
```

And

```c
if (timestamp == s_tx_cache.items[0].timestamp)
{
    CANWrapper_ErrorInfo error;
    error.error = CAN_WRAPPER_ERROR_TIMEOUT;
    error.msg = s_tx_cache.items[0].msg;
    TxCache_Erase(&s_tx_cache, 0);
    s_init_struct.error_callback(&error);
}
```

This is true whether we use kernel ticks or the previous I'll look into semaphores, although letting the error handler run while the ISR is blocked isn't possible, so I'm not sure if there's a solution there. Maybe there's some way to make erasing and reading messages atomic. Maybe the removal of messages could be deferred from the ISR to a task, in which case semaphores would solve it. This probably isn't a priority but I figured I should take note of it. |
And make it compile. There is still one error in advanced mode as CANWrapper_Transmit references s_init_struct.node_id which only exists in normal mode.
|
@Koloss0 Let me know how you want to deal with |
Thanks for your help as well!
I was thinking
No, absolutely! This was at the back of my mind as well as I was working on it. Race conditions are incredibly common in threaded environments so we need to stay vigilant. Thanks for already doing research on it! |
|
@Koloss0 Here are some notes about the accuracy of ticks if you're curious. CAN messages are considered timed out after 3.6 milliseconds. The job of
Assuming case 1 should be avoided at all costs, the error handler should delay an extra tick. Then there's a tradeoff between minimizing the extra delay time in case 2 and the overhead from a higher tick frequency. Say a timer is started at tick 0. Here are two idealized examples:
Hopefully that's all correct and makes sense. Essentially, timeouts will always be reported late in order to not be reported early, and we can just control how late that might be. I haven't looked into the exact overhead yet, but let me know what you think about that extra delay time. We could always go back to the timer if it's a problem. |
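One way to read the "report late rather than early" rule is as a unit conversion with a guard tick. The sketch below assumes a message's timestamp can be up to one tick stale (the message was sent somewhere inside the tick it was stamped with), so the wait is rounded up and then extended by one tick. The function names and the example tick rates are assumptions for illustration, not values taken from the PR.

```c
#include <stdint.h>

/* Convert a timeout in microseconds to kernel ticks: round up, then add one
 * guard tick so the timeout can never be reported early. */
static uint32_t timeout_to_ticks(uint32_t timeout_us, uint32_t tick_hz)
{
    uint64_t scaled = (uint64_t)timeout_us * tick_hz;
    uint32_t ticks  = (uint32_t)((scaled + 999999u) / 1000000u);  /* ceiling division */
    return ticks + 1u;
}

/* Worst-case extra delay, in microseconds, before the timeout is reported. */
static uint32_t worst_case_late_us(uint32_t timeout_us, uint32_t tick_hz)
{
    uint32_t ticks     = timeout_to_ticks(timeout_us, tick_hz);
    uint64_t waited_us = (uint64_t)ticks * 1000000u / tick_hz;
    return (uint32_t)(waited_us - timeout_us);
}
```

Under these assumptions, a 3.6 ms timeout at a 1 kHz tick becomes 5 ticks and can be reported up to 1.4 ms late; at a hypothetical 10 kHz tick it becomes 37 ticks and is at most 0.1 ms late, at the cost of ten times as many tick interrupts.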
|
@ArnavG-it Yeah, the extra delay shouldn't be a problem. None of these values are set in stone, it's all stuff that can be re-evaluated at a later date if we need to. The only issue I'm spotting though, is that it's possible to have two messages with the same timestamp. That could lead to an unexpected edge case in the error handler thread. To fix it, there should be a loop to remove and handle items with matching timestamps until none are left. |
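A minimal sketch of the suggested loop, assuming the cache keeps its oldest entry at index 0 as in the snippet quoted earlier. The `Cache` and `CacheItem` types and the function names are simplified, hypothetical stand-ins for the real Tx cache, and the callback is optional only to keep the example easy to exercise.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_CAPACITY 8u  /* hypothetical size */

typedef struct {
    uint32_t timestamp;  /* tick at which the message was sent */
    uint32_t msg_id;     /* stand-in for the cached message */
} CacheItem;

typedef struct {
    CacheItem items[CACHE_CAPACITY];  /* oldest item at index 0 */
    size_t    size;
} Cache;

/* Remove the oldest item by shifting the rest down. Caller ensures size > 0. */
static void cache_erase_front(Cache *c)
{
    for (size_t i = 1; i < c->size; i++) {
        c->items[i - 1] = c->items[i];
    }
    c->size--;
}

/* Report every item stamped with `timestamp`, not only the first one.
 * Returns how many items were handled. `on_timeout` may be NULL. */
static size_t handle_timeouts(Cache *c, uint32_t timestamp,
                              void (*on_timeout)(uint32_t msg_id))
{
    size_t handled = 0;
    while (c->size > 0 && c->items[0].timestamp == timestamp) {
        uint32_t id = c->items[0].msg_id;
        cache_erase_front(c);  /* erase first, as in the quoted handler */
        if (on_timeout != NULL) {
            on_timeout(id);
        }
        handled++;
    }
    return handled;
}
```

With a single `if` instead of the `while`, a second message sharing the same timestamp would stay in the cache and never be reported as timed out.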
|
Actually, it appears it would have been fine. I thought |
Yeah, a benefit is we don't have to handle overflows. Thanks for double checking though. I do have to fix the timeout calculation in that thread, by the way, so ignore the fact that it makes no sense currently. |
Wouldn't it work? The data type is a |
I was just accidentally adding the timeout in microseconds to a tick value. The above commit should have fixed that if I did it correctly. |
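Both points in this exchange (overflow handling and mixed units) can be illustrated with one small helper. The sketch assumes a 32-bit unsigned tick counter and takes every argument in ticks; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least `timeout_ticks` have elapsed since `start_ticks`.
 * All three values are in kernel ticks. Unsigned subtraction wraps modulo
 * 2^32, so the result stays correct when the counter rolls over between
 * `start_ticks` and `now_ticks`. */
static bool has_timed_out(uint32_t now_ticks, uint32_t start_ticks,
                          uint32_t timeout_ticks)
{
    return (uint32_t)(now_ticks - start_ticks) >= timeout_ticks;
}
```

Comparing the elapsed difference, rather than computing `start + timeout` and comparing against `now`, is what avoids special-casing the rollover. Keeping every parameter in ticks also guards against the microseconds-plus-ticks mix-up mentioned above.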
Changes:
* Deleted CAN queue and error queue code.
* Now defines timeouts in terms of milliseconds for simplicity.
Slightly easier to comprehend.
You used the term "standard" and I thought that was a better name.
|
I think this is good enough for this PR. Regarding the TODO's,
|

Multiple API changes to support the test interfaces and improve timing:
Replaced Polling with RTOS
Support for multiple CAN lines
`CANWrapper_CAN_Start` to initialize their CAN peripheral, and pass the `hcan` pointer to the CAN Wrapper's functions.
Replaced Modes with Standard and Advanced APIs
Replaced Hardware Timer with RTOS Ticks
Other notes:
STM32L4. The IO board must define `STM32F7`.
`body_size` field to the `CANMessage` type.
TODO:
The changes are being tested on this testbed branch.
Closes #24, #52.
#53 should probably be closed as "not planned".