-
The drifting is typically caused by loss of position information. This could be a communication issue or a tracking issue, but in almost all cases it is tracking-related. Our tracking code at https://github.com/USC-ACTLab/libobjecttracker/blob/master/src/object_tracker.cpp#L449-L528 removes CFs that have not been updated in 0.5 s (regarding your point 3: how did you get 2 seconds? Did you adjust the value in the code?).

A dramatic crash like that is usually caused by completely wrong state or setpoint information (the controller has very high gains and tries to overcompensate, or the EKF diverges). Getting a crash like that after drifting should be nearly impossible. One reason I can see is that you somehow hit https://github.com/USC-ACTLab/libobjecttracker/blob/master/src/object_tracker.cpp#L437-L438 and re-initialize the tracking mid-air. This sounds very dangerous and I am not sure why we have this functionality in the first place. Perhaps @jpreiss remembers? If you can reproduce the issue, I suggest removing the [...]

If you have the uSD card deck or the console output of the flight, that could also give an indication of what happened.

The spurious markers shouldn't be a big problem if the balloon is far away. Since you don't see issues before the "drifting" stage, I don't think they are causing the problems.
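To make the 0.5 s timeout concrete, here is a minimal Python sketch of a "drop objects that have not been updated recently" rule. It is not the libobjecttracker implementation (that code is C++ and lives at the link above); the class `StaleTracker` and its methods are hypothetical names used only for illustration.

```python
import time


class StaleTracker:
    """Hypothetical sketch: forget objects without a recent pose update."""

    def __init__(self, timeout=0.5, clock=time.monotonic):
        self.timeout = timeout   # seconds without update before removal
        self.clock = clock       # injectable clock, useful for testing
        self.last_update = {}    # object id -> time of last valid pose

    def update(self, obj_id):
        """Record that a valid pose for obj_id was just computed."""
        self.last_update[obj_id] = self.clock()

    def prune(self):
        """Remove and return the ids that have not been updated in time."""
        now = self.clock()
        stale = [k for k, t in self.last_update.items()
                 if now - t > self.timeout]
        for k in stale:
            del self.last_update[k]
        return stale
```

Calling `prune()` once per motion capture frame would report which objects have just been dropped, which could be logged to see whether a removal happened right before the drift.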
-
Situation added.

@whoenig My colleague told me that he had encountered a similar situation, which means we have experienced two such system crashes in total. He saw some other details, which I will describe as a supplement.

Problem analysis.

I think the two phenomena are the same. From the video, the cfs basically did not receive positioning data for about two seconds and therefore drifted in the air. Then the cfs somehow received positioning data again, and the target control position was too far from the currently estimated position, which caused the cfs to roll over. There are two questions that need to be addressed here.

(1) Why did the cfs not receive the positioning data for a long time?

I think it has nothing to do with PA, because the cmdposition command is still being sent all the time; otherwise the cfs would also crash because they could not receive the streaming command. So it should be a problem related to Crazyswarm.

(2) Why did the location data suddenly come back later?

a. If it is a network fluctuation between optitrack and crazyswarm, maybe the network transmission went back to normal.

There are a few questions about the libobjecttracker program.
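One way to reason about the "target too far from the estimated position" failure is to limit how far a streamed setpoint may be from the current estimate. The sketch below is only an illustration under assumptions: the function `clamp_setpoint` and the `max_step` parameter are hypothetical, it is not a Crazyswarm feature, and it assumes the sending program has a usable position estimate at all.

```python
import math


def clamp_setpoint(estimate, target, max_step):
    """Return a setpoint at most max_step (metres) away from estimate,
    lying on the straight line from estimate towards target."""
    delta = [t - e for t, e in zip(target, estimate)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= max_step:
        # Target is already close enough, send it unchanged.
        return tuple(target)
    scale = max_step / dist
    return tuple(e + d * scale for e, d in zip(estimate, delta))
```

With a rule like this, a cf that drifted a long way would be commanded back in small steps instead of receiving one large position jump. Whether that is safe in practice depends on the controller and on how trustworthy the estimate is after an outage.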
-
In fact, I only use the cmdposition command in all my programs. I did not use any high-level commands because I knew there was a problem switching between high-level and low-level commands. Even takeoff and landing are implemented with cmdposition.
Is the connection between crazyswarm and motive via TCP? If it connects via UDP, there may also be a data sticking problem.
I agree with removing the re-initialization code. It carries a lot of risk.
I want to locate the problem because I want to make the whole system more stable. This failure has a low probability: I can install an SD card for logging, but I might fly dozens of times without reproducing the problem. However, I think I can test the positioning data layer of the system. I am sure that I only use cmdposition for position control in my control program. I want to record the motive output data or the objecttracker output data for 12 hours to see if there is an anomaly. I wonder if that is feasible.
What else could cause the system to block? I think I'll do some tests.
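For the long recording test, the analysis step could be as simple as scanning the receive timestamps for gaps. This is a minimal sketch, assuming one timestamp in seconds was logged per received frame; the function name `find_gaps` and the threshold are hypothetical, and how the timestamps are recorded is left open.

```python
def find_gaps(timestamps, max_gap):
    """Return (start, end, duration) for every pair of consecutive
    timestamps that are more than max_gap seconds apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur, cur - prev))
    return gaps
```

At 40 Hz the nominal spacing is 0.025 s, so a threshold such as 0.1 s would ignore ordinary jitter while still flagging an outage of the length seen in the video.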
---Original---
Date: Wed, Jun 30, 2021 02:29 AM
Subject: Re: [USC-ACTLab/crazyswarm] cfs drift and then all crash (#406)
It is very unlikely that this is caused by a connection lag with Motive. However, a similar outage of the data can occur if the crazyflie_server is blocked by a function. In the current version, any non-broadcast communication with a Crazyflie could cause that (for example, sending the goTo command to an individual Crazyflie). Do you have such calls in your code, or do you only use the cmdposition topic?
I think this re-initialization code in libobjecttracker should be removed. It was probably well-intended for cases where one wants to keep the crazyflie_server running even though one drone crashed (and lost tracking), but the risk of having it seems pretty high IMO.
-
@whoenig @jpreiss I just checked the ROS log file on my computer and compared the difference between the log of the incident and other logs. I found that there are two differences. 1. Two consecutive error messages:
-
I am not familiar with the ROS mechanisms, but I read the context of the error. It looks to me as if this is not a communication problem between motive and crazyswarm, but an internal communication problem within ROS?
-
Hello, this is the ROS log from another one of my system crashes.
-
@whoenig @jpreiss Recently I tested a centralised drone formation using crazyswarm. 20 cfs were arranged in a square and flown around the field.
The positioning method we used: optitrack output data at 40 Hz, single-point tracking and positioning
The control command we used: cmdposition command for position control throughout
However, during one of the tests the cfs drifted en masse and then crashed. I can confirm that this is a very unlikely event, as we have run this test dozens of times and it has only happened once. Can you help analyse the possible causes?
Here is the video link: https://youtu.be/uofjD1xKTuI
Let me clarify other things.
1. Where the cfs appear to leave the picture in the video, that is only a camera field-of-view issue; in fact the cfs do not fly out of the scene.
2. There is a balloon on the left side of the scene, which is a prop for my other tests, but in the motion capture environment it causes about 15 false marker points.
3. The program I run in crazyswarm has a cf marker loss check: when the master program receives location data fewer than 10 times in 2 seconds (the same location data as rviz), the cf is removed from the aircraft management list and no more location data is sent.
4. In the video, the behaviour of the cfs can be divided into two phases. The first is drifting in the air, which is probably because positioning data was not received. The second is crashing and tumbling down. I guess that positioning data and position control commands were received again later, but because the cfs had drifted a large distance, the cmdposition step was too large and all the cfs crashed.
5. Possible cause 1: point cloud matching timeout. It could be that the balloon reflections introduced many false marker points, causing a point cloud matching timeout of about 2 seconds in crazyswarm. The cfs then did not receive positioning data, which resulted in drift, and finally a frame of positioning data was received and the cfs crashed and flew around. However, one thing is not clear in this case. If the point cloud matching did not succeed for 2 seconds, the cfs received no positioning data within those 2 seconds, so the central control program should have removed all the aircraft from the flight list, and the aircraft should have stopped receiving commands and crashed. (I also need to test whether an aircraft flies sideways after it is removed from the management list.)
6. Possible cause 2: communication blockage. It is also possible that the positioning data was still received in real time (as in rviz), so the central control program did not remove the cfs from the managed queue, but the PA communication was briefly blocked. However, there is also something unexplained in this case: if the communication were blocked, the cfs would also crash for not receiving the cmdposition command from the HF.
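The loss check described in point 3 (fewer than 10 position updates within 2 seconds) can be sketched as a sliding-window counter. This is not the actual master program; `UpdateRateMonitor` and its methods are hypothetical names, and timestamps are assumed to be in seconds.

```python
from collections import deque


class UpdateRateMonitor:
    """Hypothetical sketch of a 'too few updates in the window' check."""

    def __init__(self, window=2.0, min_updates=10):
        self.window = window            # window length in seconds
        self.min_updates = min_updates  # updates required inside window
        self.stamps = deque()           # receive times, oldest first

    def record(self, t):
        """Register one received position update at time t."""
        self.stamps.append(t)

    def is_healthy(self, now):
        """True if at least min_updates arrived within the last window."""
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        return len(self.stamps) >= self.min_updates
```

One consequence worth noting, assuming a steady 40 Hz stream before the outage: the window still holds 10 or more samples until roughly 1.8 s after the last update, so a check like this would not remove a cf during most of a two-second outage.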
Does anyone have any thoughts on this issue please?