Fix a flaky (copter) autotest #32137

Closed
hunt0r wants to merge 1 commit into ArduPilot:master from hunt0r:fix-copter-autotest-flake

hunt0r commented Feb 7, 2026

Why is the default, when waiting for a landing, to require the copter to report an altitude between 5 m and 11 m?
IMO, that makes no sense, and is likely an accidental oversight.

In case more details are desired:

  • The specific flaky test (for me) is test.Copter.BatteryMissing.
  • The failure is well summarized by this line: BatteryMissing ( Test battery health pre-arm and missing failsafe) (Failed to attain Altitude want 8.0, reached 0.008). Observe that an altitude of 8 m +/- 3 m (i.e. 5-11 m) is never seen, because the copter is already on the ground at an altitude of 0.008 m.
  • The PR that seems to have introduced this default is #14882 (Autotest: make RTL and Land wait a little more verbose, don't use raw reboot on fly_battery_failsafe). I can't tell why that decision was made there, but perhaps the flake also started after it? 🤷
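To make the failure mode concrete, here is a minimal Python sketch of an altitude-window wait. It is not the actual autotest helper. The function `first_alt_in_window` and its parameters are hypothetical, and the sketch assumes the wait only ever sees the altitude samples it receives after it starts. With the 8 m +/- 3 m default, a copter that is already below 5 m when sampling begins can never satisfy the check.

```python
from typing import Iterable, Optional


def first_alt_in_window(
    alts_m: Iterable[float], target_m: float = 8.0, tolerance_m: float = 3.0
) -> Optional[float]:
    """Return the first observed altitude within target +/- tolerance, or None.

    Hypothetical simplification of a "wait for altitude" check (not the real
    autotest API): it can only judge the samples it actually receives.
    """
    lo, hi = target_m - tolerance_m, target_m + tolerance_m
    for alt in alts_m:
        if lo <= alt <= hi:
            return alt
    return None  # window never attained -> the real wait would time out


# Sampling starts at 10 m: the 5-11 m window is satisfied immediately.
print(first_alt_in_window([10.0, 7.5, 4.0, 1.2, 0.008]))

# Sampling starts after the copter is already below 5 m: never satisfied.
print(first_alt_in_window([3.1, 0.9, 0.008]))
```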

REVIEWERS: I don't know what level of testing is appropriate for a change like this. (What if some tests were depending on this odd assumption to pass?) Please help me learn what's expected.
For example, one org I worked with had a bar like ">1 flaky failure in 1000 retries before, and 0 flaky failures in 1000 retries after". Given a ~50% flake rate locally, I'd be glad to script a test like that overnight and post the results. It should take me 10 or less, so maybe 100 repetitions is confident enough.
Also, it would be great if someone else could reproduce the flake I'm seeing and confirm that this seems to fix it for them. (I'm on a ~2021 MacBook Air, in case that's very different from most devs' setups.)
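A short Python calculation can help size the repetition run. It assumes that each run is independent and that the unfixed flake rate stays at the ~50% seen locally. `prob_all_pass` and `failure_rate_upper_bound` are illustrative names and are not part of autotest.

```python
def prob_all_pass(flake_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass at a fixed flake rate."""
    return (1.0 - flake_rate) ** runs


def failure_rate_upper_bound(runs: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the flake rate after `runs` runs with 0 failures.

    Solves (1 - p) ** runs = 1 - confidence for p. At 95% this is roughly
    3 / runs (the "rule of three").
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


for n in (10, 100, 1000):
    print(
        f"{n:5d} runs: P(all pass | still 50% flaky) = {prob_all_pass(0.5, n):.3g}, "
        f"95% upper bound on flake rate = {failure_rate_upper_bound(n):.4f}"
    )
```

Under those assumptions, 10 clean runs in a row would happen by chance only about once in 1024 tries if the flake were still present. 100 clean runs is therefore strong evidence that the ~50% flake is gone. However, zero failures in 100 runs only bounds the residual flake rate to about 3% at 95% confidence, which is why a bar like 0 in 1000 is much stricter.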

This resolves a flaky test (for some users) where the test fails
because copter descends below the min alt "too fast", and lands,
but the test is waiting for it to report an alt which is "on the way"
to that landing, and times out waiting.
peterbarker (Contributor) commented

> Observe that an altitude of 8 m +/- 3 m (i.e. 5 - 11 m) is never seen, because copter is already on ground at altitude of 0.008 m.

How does that happen? You manage to get from 10m to underneath the magic height before we emit the statustext?

It'd be worth looking at the autotest tlog: compare when you receive the STATUSTEXT with the preceding GLOBAL_POSITION_INT to see if that's the case. It would be a pretty lengthy delay for that STATUSTEXT!
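Here is one way to sketch the tlog comparison. The sketch assumes the messages have already been extracted into hypothetical `(timestamp_s, message_type, fields)` tuples. In practice they would come from the autotest tlog, for example via pymavlink's `mavutil.mavlink_connection`. The marker text searched for is a placeholder, not a real autotest STATUSTEXT. A large gap between a STATUSTEXT and the preceding GLOBAL_POSITION_INT would point to the delay being asked about.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical record shape: (timestamp_s, message_type, fields).
Record = Tuple[float, str, Dict[str, Any]]
Result = Tuple[str, Optional[float], Optional[float]]


def statustext_context(records: List[Record], substring: str) -> List[Result]:
    """For each STATUSTEXT containing `substring`, return its text, the seconds
    since the preceding GLOBAL_POSITION_INT, and that message's relative
    altitude in metres (both None when no position message came before it)."""
    results: List[Result] = []
    last_pos: Optional[Record] = None
    # sorted() is stable, so messages with equal timestamps keep arrival order.
    for rec in sorted(records, key=lambda r: r[0]):
        ts, mtype, fields = rec
        if mtype == "GLOBAL_POSITION_INT":
            last_pos = rec
        elif mtype == "STATUSTEXT" and substring in fields.get("text", ""):
            if last_pos is None:
                results.append((fields["text"], None, None))
            else:
                # MAVLink GLOBAL_POSITION_INT.relative_alt is in millimetres.
                rel_alt_m = last_pos[2]["relative_alt"] / 1000.0
                results.append((fields["text"], ts - last_pos[0], rel_alt_m))
    return results


# Made-up example data, not taken from a real tlog.
records: List[Record] = [
    (100.0, "GLOBAL_POSITION_INT", {"relative_alt": 10000}),
    (100.2, "GLOBAL_POSITION_INT", {"relative_alt": 9800}),
    (104.7, "STATUSTEXT", {"text": "example marker"}),
]
for text, lag_s, alt_m in statustext_context(records, "marker"):
    print(f"{text!r}: {lag_s:.1f} s after last position, at {alt_m:.2f} m")
```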

I want to make sure we're not missing something here, that we're fixing the right problem in the right place.

peterbarker (Contributor) commented

> Why is the default, when waiting for a landing, to require the copter to report an altitude between 5 m and 11 m? IMO, that makes no sense, and is likely an accidental oversight.

No, that wasn't accidental. Firstly, you might not actually be landing at 0 relalt, so a condition of zero may never become true. Secondly, it was probably to preserve existing behaviour somewhere when refactoring code. Thirdly, this stuff is really in place to provide comfort to the user in the form of progress messages showing the vehicle descending.
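A small Python sketch can illustrate the first point, that a landed vehicle may not report 0 relative altitude. It assumes a landing spot about 2.5 m above home and uses a hypothetical `has_settled` helper, which checks that the altitude change stays within a threshold over a time window. This only illustrates why an "altitude reaches zero" condition is fragile. It is not how autotest decides that a landing is complete.

```python
from typing import List, Tuple


def has_settled(
    samples: List[Tuple[float, float]], window_s: float, max_change_m: float
) -> bool:
    """Hypothetical helper: True if altitude varied by at most `max_change_m`
    over the last `window_s` seconds. `samples` are (time_s, relative_alt_m)
    pairs in time order."""
    if not samples:
        return False
    t_end = samples[-1][0]
    if t_end - samples[0][0] < window_s:
        return False  # not enough history yet to judge
    recent = [alt for t, alt in samples if t >= t_end - window_s]
    return max(recent) - min(recent) <= max_change_m


# Assumed scenario: touchdown on ground ~2.5 m above home, so relalt never hits 0.
descent = [(0.0, 8.0), (1.0, 5.0), (2.0, 3.0), (3.0, 2.52), (4.0, 2.50), (5.0, 2.50)]
print(any(alt <= 0.0 for _, alt in descent))
print(has_settled(descent, window_s=2.0, max_change_m=0.05))
```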

> Also it would be great if someone else could reproduce the flake I'm seeing, and confirm this seems to fix it for them. (I'm on a ~2021 Macbook Air, in case that's very different than most dev's setups.)

Sadly, these race conditions are often difficult to reproduce - I can't do so here. This is an interesting one...

hunt0r (Author) commented Feb 8, 2026

We now think there is some problem in how my system (a ~2021 MacBook Air) runs the test, rather than the test itself being broken. Closing this PR accordingly.

hunt0r closed this Feb 8, 2026
hunt0r deleted the fix-copter-autotest-flake branch February 9, 2026 13:32