
Conversation

@dmpots
Contributor

@dmpots dmpots commented Mar 5, 2025

The dotest framework had an existing decorator to mark flakey tests. It was not implemented and would simply run the underlying test. This commit modifies the decorator to run the test multiple times in the case of failure and to raise the test failure only when all attempts fail.

@dmpots dmpots requested a review from JDevlieghere as a code owner March 5, 2025 02:39
@llvmbot llvmbot added the lldb label Mar 5, 2025
@llvmbot
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-lldb

Author: David Peixotto (dmpots)

Changes

The dotest framework had an existing decorator to mark flakey tests. It was not implemented and would simply run the underlying test. This commit modifies the decorator to run the test multiple times in the case of failure and to raise the test failure only when all attempts fail.


Full diff: https://github.com/llvm/llvm-project/pull/129817.diff

1 file affected:

  • (modified) lldb/packages/Python/lldbsuite/test/decorators.py (+16-3)
diff --git a/lldb/packages/Python/lldbsuite/test/decorators.py b/lldb/packages/Python/lldbsuite/test/decorators.py
index c48c0b2f77944..c4de3f8f10751 100644
--- a/lldb/packages/Python/lldbsuite/test/decorators.py
+++ b/lldb/packages/Python/lldbsuite/test/decorators.py
@@ -3,6 +3,7 @@
 from packaging import version
 import ctypes
 import locale
+import logging
 import os
 import platform
 import re
@@ -525,12 +526,24 @@ def expectedFailureWindows(bugnumber=None):
     return expectedFailureOS(["windows"], bugnumber)
 
 
-# TODO: This decorator does not do anything. Remove it.
-def expectedFlakey(expected_fn, bugnumber=None):
+# This decorator can be used to mark a test that can fail non-deterministically.
+# The decorator causes the underlying test to be re-run if it encounters a
+# failure. After `num_retries` attempts the test will be marked as a failure
+# if it has not yet passed.
+def expectedFlakey(expected_fn, bugnumber=None, num_retries=3):
     def expectedFailure_impl(func):
         @wraps(func)
         def wrapper(*args, **kwargs):
-            func(*args, **kwargs)
+            for i in range(1, num_retries+1):
+                try:
+                    return func(*args, **kwargs)
+                except Exception:
+                    logging.warning(
+                        f"expectedFlakey: test {func} failed attempt ({i}/{num_retries})"
+                    )
+                    # If the last attempt fails then re-raise the original failure.
+                    if i == num_retries:
+                        raise
 
         return wrapper
 

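For readers who want to see the new behavior in isolation, here is a small standalone sketch: a stand-in decorator with the same retry loop, applied to a deliberately flaky unittest test. The stand-in's signature, the test class, and the bug number are illustrative assumptions; in the real suite the decorator lives in lldbsuite/test/decorators.py and takes an expected_fn predicate, as the diff above shows.

```python
# Standalone sketch (not part of the PR): a simplified stand-in for the
# retrying expectedFlakey decorator, exercised against a test that fails
# on its first attempt and passes on the second.
import unittest
from functools import wraps


def expectedFlakey(bugnumber=None, num_retries=3):
    """Simplified stand-in mirroring the retry loop from the diff above."""

    def impl(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, num_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of retries: surface the original failure.
                    if attempt == num_retries:
                        raise

        return wrapper

    return impl


class FlakyExample(unittest.TestCase):
    attempts = 0

    @expectedFlakey(bugnumber="llvm.org/pr00000")  # hypothetical bug reference
    def test_sometimes_fails(self):
        # Fails on attempt 1, passes on attempt 2, so the retrying decorator
        # reports the test case as a pass.
        FlakyExample.attempts += 1
        self.assertGreaterEqual(FlakyExample.attempts, 2)


if __name__ == "__main__":
    unittest.main()
```

Running this file with python3 reports one passing test; without the decorator, the first failed attempt would fail the test.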
@github-actions

github-actions bot commented Mar 5, 2025

✅ With the latest revision this PR passed the Python code formatter.

@JDevlieghere
Member

I personally don't think we should have such a decorator: tests shouldn't be flakey. In my experience, when tests are flakey, they are almost always either poor tests or actual bugs. I'm worried that a decorator like this makes it easy to sweep these issues under the rug. That's not to say that things cannot be flakey sometimes: because of how we test the debugger, we depend on a lot of things, many of which are out of our control and can cause a test to fail. But that's different from a specific test being flakey, which is what this decorator would be used for.

@dmpots
Contributor Author

dmpots commented Mar 5, 2025

> That's not to say that things cannot be flakey sometimes: because of how we test the debugger, we depend on a lot of things, many of which are out of our control and can cause a test to fail. But that's different from a specific test being flakey, which is what this decorator would be used for.

Thanks, I appreciate your thoughts. For some context here, I am looking to replace some internal scripts that handle failures by re-running tests. I thought we might be able to leverage the built-in features of dotest to handle some of this. Let me collect some more data to see how much/what kind of flakiness we have.

Do you have any suggestions on how we should handle the "expected" flakiness because of how we test the debugger? Do you think this is something we should try to solve as part of the lldb testing framework?

@DavidSpickett
Collaborator

libcxx also has something that re-runs tests, I think; worth having a look there. I don't think it's a built-in llvm-lit feature, though.

If the current decorator does not change anything, I wonder if we should be converting the existing uses back to normal. Unless someone out there is monkey-patching it, these tests would have been flakey for them.

...or they have done what you did and built their own thing, not realising that the decorator was supposed to do that in the first place.

@DavidSpickett
Collaborator

DavidSpickett commented Mar 6, 2025

> Do you have any suggestions on how we should handle the "expected" flakiness because of how we test the debugger? Do you think this is something we should try to solve as part of the lldb testing framework?

Super basic tip, in case you didn't notice already: Linaro's experience is that running LLDB testing on a shared machine is a bad idea. For AArch64 we run on a dedicated machine and that has worked well. It's not very large; it's just free of distractions.

Edit: Same machine with dedicated core allocations also works.

For Arm we are on a shared server with far more resources than we need for LLDB, but we can't always get them because of other workers. So we set a parallelism limit way below the actual core count so that lit doesn't overestimate what it can run.

This leaves us with the actually flakey tests that ideally we'd find proper fixes for, as Jonas said.
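To make the parallelism suggestion concrete, here is a minimal sketch of a CI driver that caps lit's worker count; the llvm-lit path, the test directory, and the cap of 8 workers are assumptions for illustration, not anything taken from this thread.

```python
# Hypothetical CI driver snippet: run the LLDB tests with far fewer lit
# workers than the machine's core count so a busy shared host does not
# starve individual test processes.
import os
import subprocess

# Cap the worker count well below what os.cpu_count() reports.
workers = min(8, os.cpu_count() or 1)

subprocess.run(
    [
        "./bin/llvm-lit",        # lit driver inside a build tree (assumed path)
        f"--workers={workers}",  # lit's parallelism flag, also spelled -j
        "tools/lldb/test",       # LLDB test directory (assumed path)
    ],
    check=True,
)
```

The same cap can be baked in at configure time by adding the flag to the LLVM_LIT_ARGS CMake variable.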

@JDevlieghere
Member

+1 on everything @DavidSpickett said. We're doing pretty much the same thing for our CI.

@dmpots
Contributor Author

dmpots commented Mar 6, 2025

Thanks for all the suggestions. I'll close the PR and try to make some progress internally on this.

@dmpots dmpots closed this Mar 6, 2025