Realtime Interface (Joint and Cartesian)#449
Realtime Interface (Joint and Cartesian)#449ted-miller wants to merge 71 commits intoYaskawa-Global:mainfrom
Conversation
There was a problem hiding this comment.
I keep on getting the message Micro-ROS PC Agent Disconnected ~5 seconds after Starting UserLan link state monitor (port: 1). But then it automatically reconnects, and then it disconnects again 5 seconds later. This is after a lot of messing around with combinations of /start_traj_mode, /start_rt_mode, and stop_traj_mode. I may have gotten into some sort of weird state. I've attached a large debug log here, it was happening near the end (ignore the reset alarm messages, those are unrelated). Since it happens almost exactly 5 seconds after the link state monitor begins, it seems like it's just failing to ping my PC. I don't know why, but I'll look into it more.
I'm also seeing a memory leak. Probably related, but maybe not.
|
I'm not sure what the best thing to do here is, but I have some thoughts on the reply/feedback message: Right now we are using the robot feedback position ( The problem with including the command position, though, is that if you submit an increment and then within the same interpolation cycle read the command position (which is what we are doing with the reply), it does not change. You don't know the updated command position until the next interpolation cycle. You could assume that the updated command position will be the reported command position + the increment that you just submitted, but of course the FSU can mess with your motion so that is not a safe assumption. The client application could compare the "expected" command position (command position at T1 + new increment submitted at T1) with the newly reported command position (command position at T2) to determine if the FSU has modified the motion and react accordingly. Or to make it more obvious, we could include another field that is either:
This last element is not strictly necessary since a client app could track that/figure it out. But I would want to make it so client app programmers can't easily overlook it |
I'm not explicitly allocating any new memory on the heap. And AFAICT, I'm de-init'ing everything properly. I have reason to suspect that my libmicroros is outdated. For your testing, did you get that library from me? Or did you build yourself? |
|
Are we OK with mandating little-endianness? |
it would be against best practices for network protocols (I'm old-fashioned when it comes to network protocols), but I guess with 'everything' being and/or defaulting to little-endian, it'd probably be OK. Would also make (de)serialisation easier, as clients could just use |
|
Note to self: So now the question is whether I should read feedback position before or after... |
Whoops I missed this. I got the libmicroros from you. That's probably the problem. |
|
@jimmy-mcelwain and @gavanderhoorn @EricMarcil |
|
I have been testing with Jazzy on YRC1000, using this client app. It just reads a USB joystick. |
|
Should more of the robot status information be included with the feedback position? My assumption is that a user will maintain a separate subscription to robot_Status |
|
What if the user ignores my sequence timing and sends multiple commands in a single cycle. The OS will buffer the data and feed it to me on the next cycle. Should I purge the buffer after each cycle before sending the reply packet? |
|
I think that it is a little bit strange that |
|
How is the client application supposed to know if the RT session has ended? It looks like the controller exits the increment loop and doesn't send anything to the client no matter what the problem is. A client-side timeout in case of e-stop, timeout, missed packets, etc and tracking |
I think that if we had an end-session packet, that we would want to include some reason code. But IMO, that information could just as easily be obtained from I have added some text to the docs/comments that the client should use the same timeout as the server. |
|
Right now, the controller doesn't "do" anything if it misses some packets, but not enough to drop the connection. Like if 8 packets are dropped but it takes 10 to disconnect, but it doesn't notify the client or log anything. Now the client may notice that the controller is not responding or once it does respond it may notice the discrepancy in expected command vs actual command position. So I think that is probably acceptable, but just wanted to mention that we don't "do" anything like we do if the FSU is modifying the command position. |
|
After a near-crash while testing this, I think that it would be good to allow the user to configure optional speed limits in the config file. I confirmed that the problem that resulted in the near-crash wasn't code related on either the controller or client side, it was just the joystick that I was using failed. For now I'm going to lower the speed limit on my client app. I'm also okay with not adding any speed limit config option since:
Just a thought As far as reliability, it seems quite good. It very rarely (something like 1/20000) I think "misses" an interpolation cycle that it may perform one cycle late. That could possibly be improved, but it is infrequent and small enough that I don't think it matters too much, especially considering users are intended to monitor feedback. Also, do we want to "force" users to read the feedback messages? Technically they could just have a 4ms (or whatever the interpolation period is) timer on their client PC and fire packets without reading feedback. That's obviously a bad idea, but do we want to allow it? |
Since this can be configured on the client app and also through the FSU, I don't think it's necessary.
I think that's due to ubuntu and not the robot controller.
The latest commit ensures that the two systems stay in sync. If the client attempts to send anything before the robot's "trigger", then the data gets discarded. |
There was a problem hiding this comment.
A couple more notes:
There is a bug where the rotational portions of the cartesian feedback messages are being truncated to the nearest degree. This isn't very obvious because it is converted to radians before being sent to the client, but if you convert it to degrees it is easy to see. So the cartesian feedback messages can be off by up to 1 degree.
If I end a session with an e-stop and then I start a new session, then I get the "you are being slowed down" message as a response to the first packet. It seems like it "remembers" from the previous session that it failed to complete a movement (because of the e-stop). I think something isn't being cleared properly between sessions.
I am able to control the robot for several seconds past when the agent disconnects. I can kill the micro-ros agent on my PC and still control via a joystick until a few seconds later the cleanup happens. This is because we chose to use our own UDP message rather than ROS2. This means you can't call /stop_traj_mode or monitor the /robot_status topic during these few seconds. I don't know if we want to do something to avoid this behavior.
Unlike other services, you must do /stop_traj_mode and then /start_rt_mode to restart again. I don't have a problem with this, but with say trajectory mode, I can just call /start_traj_mode after an estop and it works. But with the RT interfaces, if I e-stop, then I I need to call /stop_traj_mode and then /start_traj_mode to start it up again.
| Once activated, a UDP server will listen on port `8889` (default). | ||
| The user then sends the first increment with a `sequenceId = 0`. | ||
| After that, the user must wait until the robot replies before sending the next increment. | ||
| Each subsequent command must increment the `sequenceId`. Additionally, each subsequent command must not be sent until the robot replies to the previous command. |
There was a problem hiding this comment.
Saying that "each subsequent command must not be sent until the robot replies to the previous command" seems incompatible with the max_sequence_diff_for_rt_msg.
I think that we should consider removing max_sequence_diff_for_rt_msg, and just stopping/alarming if a packet arrives out of sequence. That would enforce the idea that each command cannot be sent until the robot replies to the previous one.
There was a problem hiding this comment.
I disagree. But I was struggling to articulate why. So here's Gemini :)
The max_sequence_diff_for_rt_msg parameter in motoros2_config.yaml is a real-world safeguard. The system uses UDP, which is a "fire-and-forget" protocol that does not guarantee packet delivery. This parameter acknowledges that packets can be lost. It provides a tolerance window, allowing the connection to survive minor network issues instead of dropping on the first lost packet. If the robot receives command N and the next one it sees is N+3, it assumes packets N+1 and N+2 were lost and continues, as long as the jump (3) is less than or equal to max_sequence_diff_for_rt_msg (default 10).
This suggestion would make the real-time motion mode extremely brittle and sensitive to network conditions. A single lost UDP packet between the client and the robot would cause the motion to stop and the session to be dropped. The user would then have to call /stop_traj_mode and /start_rt_mode to recover. This would be highly disruptive in any production environment that doesn't have a perfect, real-time network.
In conclusion, the current implementation is a practical compromise. The documentation describes how the client should behave, while the configuration parameter makes the server resilient enough to handle the inevitable reality of packet loss over UDP
There was a problem hiding this comment.
I believe I would at least partly agree with @jimmy-mcelwain -- and I disagree with Gemini.
Real-time command interfaces are supposed to be deterministic; allowing out-of-order packets, or dropped packets breaks that determinism, or at least weakens it.
re: "sensitive to network conditions": we're essentially using UDP here as a stand-in for a deterministic network protocol, but without changing that transport layer or any of the network layers (fi ethernet), which is non-deterministic itself by default (collision handling fi). To make it work well enough, we can reduce the chances of non-deterministic behaviour occurring by essentially making a two-host network segment where both take turns to use the medium. That means: direct cable, no switches, and only RT traffic on the connection.
Allowing for lost datagrams improves (developer) UX, but if too many datagrams go missing, control performance suffers.
I would argue users interested in using a real-time external motion interface care about control performance and are likely willing go through the hassle of setting up sufficiently performant network infrastructure.
The documentation describes how the client should behave, while the configuration parameter makes the server resilient enough to handle the inevitable reality of packet loss over UDP
it's never made explicit, but I believe this assumes a shared connection. Otherwise there is very little chance of packet loss -- unless there is some network failure besides the RT code.
There was a problem hiding this comment.
To make it work well enough, we can reduce the chances of non-deterministic behaviour occurring by essentially making a two-host network segment where both take turns to use the medium. That means: direct cable, no switches, and only RT traffic on the connection.
I agree with you in principle. But I don't think this is realistic in a true industrial environment. I have seen a lot of "inadvisable" (stupid) configurations.
I'm fine with lowering max_sequence_diff_for_rt_msg. But I don't think it can be realistically eliminated.
43b08cc to
fefa69d
Compare
|
Functionally, I think this is ready. But during testing, I'm seeing a memory leak. @jimmy-mcelwain says that he got rid of the memory leak by using a new build of libmicroros. I added to https://github.com/ted-miller/motoros2_interfaces/commits/experiment_rt/. So, I need to rebuild libmicroros. When I build locally, I'm seeing a consistent 128 byte leak after each broken connection. When I build using the online portal, I'm seeing an inconsistent leak after each connection. Additionally, it is missing the implementation of So, I'm really confused. |
|
We added the That being said, I am not experiencing any memory leak, even with the most up-to-date version of this and the |
|
Close was accidental. |
|
So, the memory leak is not related to this PR. It is something to do with a multi-robot configuration. This PR should be good to merge. |
gavanderhoorn
left a comment
There was a problem hiding this comment.
I have some comments/questions, but can only post those later, so marking as request changes. Apologies for not posting them sooner @ted-miller.
|
ping @gavanderhoorn ? |
|
@gavanderhoorn |
gavanderhoorn
left a comment
There was a problem hiding this comment.
I've commented mostly on the documentation and configuration file.
The implementation might need changes depending on discussion, so I didn't want to post comments on that yet.
config/motoros2_config.yaml
Outdated
| # Timeout for real time motion commands. If a command packet is not received | ||
| # within this number of milliseconds, the motion mode will be cancelled. |
There was a problem hiding this comment.
Does this affect only the 'first' packet, or all traffic?
There was a problem hiding this comment.
This is for all traffic. This serves as a kind of "keep alive" for the session.
config/motoros2_config.yaml
Outdated
| # DEFAULT: 30000 (30.000 seconds) | ||
| #timeout_for_rt_msg: 30000 |
There was a problem hiding this comment.
30 sec feels too long a time for a real-time interface.
5 seconds would perhaps already be more than enough. That's 1250 periods at 250 Hz, which is what we expect the motion task to be running at.
There was a problem hiding this comment.
I'm fine with that. I just picked a number.
config/motoros2_config.yaml
Outdated
| # DEFAULT: 10 | ||
| #max_sequence_diff_for_rt_msg: 10 |
There was a problem hiding this comment.
Do we want to support open-loop use (ie: just filling the buffer from an off-line generated trajectory) and is this to support that use-case?
Otherwise 10 messages is quite a few, and a connection that manages to lose 10 datagrams should probably not be used in the first place for a real-time control application. I would suggest lowering this to 3 or something similar.
There was a problem hiding this comment.
10 is less than half a second. But again, I just picked a number. So, I'm ok with lowering it.
|
|
||
| ### start_rt_mode | ||
|
|
||
| Type: [motoros2_interfaces/srv/StartRtMode](https://github.com/yaskawa-global/motoros2_interfaces/srv/StartRtMode.srv) |
There was a problem hiding this comment.
If/when we merge the related PR, this should be replaced by a permalink.
|
|
||
| <img src="img/RtFlow.png" alt="Command Flow" /> | ||
|
|
||
| ### Data format (command) |
There was a problem hiding this comment.
Commenting here, as it's about both the command as well as the reply.
This is not an exhaustive list, but I would recommend to add:
- a proper header structure, containing:
-
a
versionfield: clients should implement version checks and refuse to work with servers of incompatible versions (and vice-versa). This should make it at least a little 'safer' to introduce breaking changes on the server side -
a
packet_typefield: it's likely this interface will be extended in the future, which would then require some way of identifying which packets are being exchanged. Even if it absolutely won't be extended, this field could function as part of the 'magic' of the packet, making it harder to interpret 'random' data as valid motion commands / status data and to sync on incoming streams -
a
timestampfield: we could perhaps also rely on the determinism of the controller and assert thatsequenceId * period == timestamp, but that would:- require more math on the client side, and
- makes it harder to deal with overflow (which the server implementation already does, but still requires work on the client), and
- makes the client responsible for matching the (calculated) stamp with its local clock (fi when correlating sensor data to robot motion). We already have a stamp on the MotoROS2 side, which when using NTP / micro-ROS synchronisation facilitates correlating MotoROS2 produced data with other sources. This would serve a similar goal
we can still use the stamp as a sequence nr of course (in the end it's just a 'unique' numerical ID anyway), so clients should return the 'stamp' sent to them in the last state packet as part of their command packet.
-
Optional, but recommended:
- add some reserved fields to allow for changes to packet definitions without breaking (de)serialisation of old traffic and/or breaking older implementations
For the state packet specifically:
- a way to encode the full robot status (perhaps as a mirror of
RobotStatus), perhaps as a bitmask. This would also include the FSU state (or perhaps FSU state deserves its own field, also a bitmask). The goal is for this protocol to be usable without ROS as well, so it can't delegate broadcasting robot/controller state toRobotStatusmessages, as those might not be available in all usage scenarios (and even if they are available, they would not be (easily) synced with RT state traffic, nor would consuming them on the client be necessarily RT safe) - a field encoding the currently active command mode (see also below)
For the command packet:
- a field indicating which control mode the client currently expects the server to have active: this serves both as a sanity check (client sending joint data while server expects Cartesian offsets), as well as an explicit indication as to in which way
deltavalues are supposed to be interpreted (instead of implicitly relying on a piece of internal state being correctly maintained by MotoROS2).
For both (motion related packets) again:
- it would be good to add a field stating whether absolute or relative mode is being used: right now this interface only supports offsets (ie: relative commands, or velocity), but it would not be too difficult to have the server also support an absolute command mode (ie: position). That would require changes to how offsets are interpreted & processed. By having clients specify which mode they are using (and at this point only relative mode would be supported, so anything else would result in errors), we prepare for when this changes and could do so without breaking anyone's client implementation.
My main motivation for these suggestions would be to make it possible to interpret data in command and state packets completely independently of the state of MotoROS2, another server and/or a client implementation. That makes both implementing MotoROS2, that other (hypothetical) server and clients easier to implement, as well as things like protocol dissectors and (de)serialisation libraries.
There was a problem hiding this comment.
@gavanderhoorn What do you think about this?
struct RtPacket
{
UINT16 version;
UINT16 packet_type; //see enum xxxx
UINT32 sequenceId;
double delta[MAX_GROUPS][MAX_AXES]; //[8][8]
int toolIndex[MAX_GROUPS]; //[8]
char reserved[32]; //for future expansion
}
struct RtReply
{
UINT32 sequenceEcho;
double feedbackPositionJoints[MAX_GROUPS][MAX_JOINTS]; //[8][8]
double feedbackPositionCartesian[MAX_GROUPS][MAX_JOINTS]; //[8][8]
double previousCommandPositionJoints[MAX_GROUPS][MAX_AXES]; //[8][8]
double previousCommandPositionCartesian[MAX_GROUPS][MAX_AXES]; //[8][8]
RobotStatus status; //literal clone of the /robot_status structure
bool fsuInterferenceDetected;
}
a timestamp field
Personally, I don't see why that's necessary. Since this is meant to be used in real time, why do I care about the stamp? All I really care about is the sequence identifier. I'm expecting that the user has done any time parameterization.
For the state packet specifically:
a way to encode the full robot status (perhaps as a mirror of RobotStatus), perhaps as a bitmask. This would also include the FSU state (or perhaps FSU state deserves its own field, also a bitmask)
I've already got the fsuInterferenceDetected field. I'm not sure what additional information we can include. I guess it's possible to indicate how much interference was detected...
For the command packet:
a field indicating which control mode the client currently expects the server to have active...
...
For both:
it would be good to add a field stating whether absolute or relative mode is being used
These could be part of the packet_type.
| //########################################################################## | ||
| // !All data is little-endian! | ||
| //########################################################################## | ||
| struct RtPacket |
There was a problem hiding this comment.
should this perhaps also include a bitmask/integer array indicating to which groups increments are being commanded?
That would make it an even thinner wrapper around mpExRcsIncrementMove(..) while making it possible to not have to send 0 increments for groups that should not move.
There was a problem hiding this comment.
I don't want to give the impression that a user can send one packet to control R1 and another packet to control R2. The current implementation makes it very clear that the whole system must be controlled by a single command packet.
| } | ||
| ``` | ||
|
|
||
| Please note that rotations are applied in the order of `Z Y X`. |
There was a problem hiding this comment.
Just thinking out loud: what about supporting quaternion input? Would remove the burden of figuring out the correct rotation order from clients.
There was a problem hiding this comment.
My brain doesn't like quaternions. My first instinct is, "no".
But... quats would be consistent with our /tf publisher. So, "maybe"
There was a problem hiding this comment.
Ehhh... I really prefer yaw, pitch, roll. Since the idea is to be a minimal wrapper around the motion API, I think the data should be as close as possible to the 'final form'.
@jimmy-mcelwain Do you have an opinion?
There was a problem hiding this comment.
I just remembered that jimmy is out all week.
There was a problem hiding this comment.
Since the idea is to be a minimal wrapper around the motion API, I think the data should be as close as possible to the 'final form'.
I think that I would agree with this, but does the statement:
Please note that rotations are applied in the order of
Z Y X.
Make sense in this context?
If we are talking about pose, yes it is important that we are saying that the rotations are applied in the order of Z Y X. But as a truly "instantaneous" angular velocity, I do not think that it makes sense. Now I don't know how mpExRcsIncrementMove works in the back end. I don't know if it really is calculating a rotation about Z then Y then X for a given increment and then moving there, or if it is figuring out a path that gives it the angular velocity of the Rx Ry Rx divided by the time increment.
There was a problem hiding this comment.
So... it turns out that position is reported by the robot using Z Y X, as expected. But, when commanding incremental motion, the rotations are applied in order X Y Z.
So a quaternion may make this easier on the user. What do you guys think? We can hide the rotational order in the backend.
Additionally, I found that rotations get complicated when commanding such small rapid increments.
increment (0,0,0, 90, 90, 0) is not the same as doing this:
for 90 times
increment (0,0,0, 1, 1, 0)
So I think that becomes hard for a user to calculate rotations.
|
@ted-miller: apologies for my (very) delayed response. I wanted to take the time to write out some of my thoughts with sufficient detail, but couldn't find that until now. |
|
@gavanderhoorn Please see the latest commits. Let me know if that meshes with your ideas. NOTE: I haven't actually tested these changes yet. So they still need to validated on the rc. |
|
@gavanderhoorn
Honestly, I can't remember if I've tested since Dec. So I'll do some physical testing today. |
This is just a draft. Both @gavanderhoorn and @jimmy-mcelwain have made some feedback already. I'd like to have a centralized place for that feedback.This PR adds a real-time control interface, intended to be used in a closed loop system. The idea is to minimize as much communication latency as possible. This will take the user data and pump it directly into the motion API with only the bare minimal processing.
There are currently two motion modes, which require a planned trajectory. This adds a new motion mode, which is activated by the
StartRtModeservice.(Note: I will be getting rid of this argument in the service.)
Once that is active, the client must open a UDP connection to port 8889 (*configurable). Once that is open, the client must send the first command packet, with an ID of
0. Then it should wait for a reply from the robot, which contains the current feedback position of the robot. (To be expanded with additional status info...) Upon receiving that feedback packet, the client must immediately send the next command packet.Please note that each command packet is an incremental motion relative to the current position. The return value of the
StartRtModeservice call will indicate the time period for each increment. (Default4 msfor a single robot arm.)For this interface, the
StartRtModeservice invocation will indicate whether the increments will be in joint space (radians) or cartesian (currently mm and deg; this will probably change).There is a test app which uses a USB joystick to move the arm in real time. This one is joint space. This is cartesian.
All testing has been done with Jazzy and YRC1000.