tests: Use GreenThreadPoolExecutor.shutdown(wait=True)

melwitt · gibizer · commit 22cccb9bd9b5 · 2024-02-12T17:42:30.000+01:00
We are still having some issues in the gate where greenlets from previous tests continue to run while the next test starts, causing false negative failures in unit or functional test jobs.

This adds a new fixture that will ensure GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait for greenlets in the pool to finish running before moving on.

In local testing, doing this does not appear to adversely affect test run times, which was my primary concern. As a baseline, I ran a subset of functional tests in a loop until failure without the patch and after 11 hours, I got a failure reproducing the bug. With the patch, running the same subset of functional tests in a loop has been running for 24 hours and has not failed yet.

Based on this, I think it may be worth trying this out to see if it will help stability of our unit and functional test jobs. And if it ends up impacting test run times or causes other issues, we can revert it.

Partial-Bug: #1946339
Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
(cherry picked from commit c095cfe)
diff --git a/nova/test.py b/nova/test.py
@@ -317,6 +317,13 @@ def setUp(self):
         # all other tests.
         scheduler_utils.reset_globals()
 
+        # Wait for bare greenlets spawn_n()'ed from a GreenThreadPoolExecutor
+        # to finish before moving on from the test. When greenlets from a
+        # previous test remain running, they may attempt to access structures
+        # (like the database) that have already been torn down and can cause
+        # the currently running test to fail.
+        self.useFixture(nova_fixtures.GreenThreadPoolShutdownWait())
+
     def _setup_cells(self):
         """Setup a normal cellsv2 environment.
 
diff --git a/nova/tests/fixtures/nova.py b/nova/tests/fixtures/nova.py
@@ -1938,3 +1938,38 @@ def setUp(self):
             'nova.compute.manager.ComputeManager.'
             '_ensure_existing_node_identity',
             mock.DEFAULT))
+
+
+class GreenThreadPoolShutdownWait(fixtures.Fixture):
+    """Always wait for greenlets in greenpool to finish.
+
+    We use the futurist.GreenThreadPoolExecutor, for example, in compute
+    manager to run live migration jobs. It runs those jobs in bare greenlets
+    created by eventlet.spawn_n(). Bare greenlets cannot be killed the same
+    way as GreenThreads created by eventlet.spawn().
+
+    Because they cannot be killed, in the test environment we must either let
+    them run to completion or move on while they are still running (which can
+    cause test failures as the leaked greenlets attempt to access structures
+    that have already been torn down).
+
+    When a compute service is stopped by Service.stop(), the compute manager's
+    cleanup_host() method is called and while cleaning up, the compute manager
+    calls the GreenThreadPoolExecutor.shutdown() method with wait=False. This
+    means that a test running GreenThreadPoolExecutor jobs will not wait for
+    the bare greenlets to finish running -- it will instead move on immediately
+    while greenlets are still running.
+
+    This fixture will ensure GreenThreadPoolExecutor.shutdown() is always
+    called with wait=True in an effort to reduce the number of leaked bare
+    greenlets.
+
+    See https://bugs.launchpad.net/nova/+bug/1946339 for details.
+    """
+
+    def setUp(self):
+        super().setUp()
+        real_shutdown = futurist.GreenThreadPoolExecutor.shutdown
+        self.useFixture(fixtures.MockPatch(
+            'futurist.GreenThreadPoolExecutor.shutdown',
+            lambda self, wait: real_shutdown(self, wait=True)))
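
For readers without a nova checkout, the patching technique used by the fixture above can be sketched with the standard library alone. This is only an analogy, not the code from this commit: it substitutes concurrent.futures.ThreadPoolExecutor for futurist.GreenThreadPoolExecutor and unittest.mock for fixtures.MockPatch, and the names shutdown_and_wait and run_job_with_forced_wait are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from unittest import mock

# Keep a reference to the unpatched method, as the fixture does with
# futurist.GreenThreadPoolExecutor.shutdown.
real_shutdown = ThreadPoolExecutor.shutdown


def shutdown_and_wait(self, wait=True, **kwargs):
    """Hypothetical replacement: ignore the caller's ``wait`` and block."""
    return real_shutdown(self, wait=True, **kwargs)


def run_job_with_forced_wait():
    """Submit a slow job, then ask for a non-blocking shutdown.

    Returns a snapshot of the jobs that had finished by the time
    shutdown() returned.
    """
    finished = []

    def job():
        time.sleep(0.2)
        finished.append(True)

    # The patch is active only inside this block, much like a fixture
    # that is cleaned up when the test ends.
    with mock.patch.object(ThreadPoolExecutor, "shutdown", shutdown_and_wait):
        pool = ThreadPoolExecutor(max_workers=1)
        pool.submit(job)
        # The caller requests wait=False, mirroring what the docstring says
        # the compute manager does, but the patched method waits anyway.
        pool.shutdown(wait=False)
        return list(finished)
```

Because the patched shutdown blocks, the job has already completed when the function returns. One difference from the diff is deliberate: the wrapper in this sketch gives wait a default value, whereas the lambda in the fixture takes wait as a required argument, so it relies on callers always passing it.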