Skip to content

Commit f24e0c1

Browse files
author
Balazs Gibizer
committed
Prevent leaked eventlets to send notifications
In out functional tests we run nova services as eventlets. Also those services can spawn there own eventlets for RPC or other parallel processing. The test case executor only sees and tracks the main eventlet where the code of the test case is running. When that is finishes the test executor considers the test case to be finished regardless of the other spawned eventlets. This could lead to leaked eventlets that are running in parallel with later test cases. One way that it can cause trouble is via the global variables in nova.rpc module. Those globals are re-initialized for each test case so they are not directly leaking information between test cases. However if a late eventlet calls nova.rpc.get_versioned_notifier() it will get a totally usable FakeVersionedNotifier object regardless of which test case this notifier is belongs to or which test case the eventlet belongs to. This way the late eventlet can send a notification to the currently running test case and therefore can make it fail. The current case we saw is the following: 1) The test case nova.tests.functional.test_servers.ServersTestV219.test_description_errors creates a server but don't wait for it to reach terminal state (ACTIVE / ERROR). This test case finishes quickly but leaks running eventlets in the background waiting for some RPC call to return. 2) As the test case finished the cleanup code deletes the test case specific setup, including the DB. 3) The test executor moves forward and starts running another test case 4) 60 seconds later the leaked eventlet times out waiting for the RPC call to return and tries doing things, but fails as the DB is already gone. Then it tries to report this as an error notification. It calls nova.rpc.get_versioned_notifier() and gets a fresh notifier that is connected to the currently running test case. Then emits the error notification there. 5) The currently running test case also waits for an error notification to be triggered by the currently running test code. But it gets the notification form the late eventlet first. As the content of the notification does not match with the expectations the currently running test case fails. The late eventlet prints a lot of error about the DB being gone making the troubleshooting pretty hard. This patch proposes a way to fix this by marking each eventlet at spawn time with the id of the test case that was directly or indirectly started it. Then when the NotificationFixture gets a notification it compares the test case id stored in the calling eventlet with the id of the test case initialized the NotificationFixture. If the two ids do not match then the fixture ignores the notification and raises an exception to the caller eventlet to make it terminate. Change-Id: I012dcf63306bae624dc4f66aae6c6d96a20d4327 Closes-Bug: #1946339 (cherry picked from commit 61fc81a)
1 parent e6c6880 commit f24e0c1

File tree

3 files changed

+122
-2
lines changed

3 files changed

+122
-2
lines changed

nova/test.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -177,6 +177,9 @@ def setUp(self):
177177
with fixtures.EnvironmentVariable('OS_LOG_CAPTURE', '0'):
178178
super(TestCase, self).setUp()
179179

180+
self.useFixture(
181+
nova_fixtures.PropagateTestCaseIdToChildEventlets(self.id()))
182+
180183
# How many of which service we've started. {$service-name: $count}
181184
self._service_fixture_count = collections.defaultdict(int)
182185

nova/tests/fixtures/notifications.py

Lines changed: 35 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
import functools
1515
import threading
1616

17+
import eventlet
1718
import fixtures
1819
from oslo_log import log as logging
1920
import oslo_messaging
@@ -76,6 +77,7 @@ class FakeNotifier(object):
7677

7778
def __init__(
7879
self, transport, publisher_id, serializer=None, parent=None,
80+
test_case_id=None
7981
):
8082
self.transport = transport
8183
self.publisher_id = publisher_id
@@ -92,13 +94,16 @@ def __init__(
9294
functools.partial(self._notify, priority.upper()),
9395
)
9496

97+
self.test_case_id = test_case_id
98+
9599
def prepare(self, publisher_id=None):
96100
if publisher_id is None:
97101
publisher_id = self.publisher_id
98102

99103
return self.__class__(
100104
self.transport, publisher_id,
101105
serializer=self._serializer, parent=self,
106+
test_case_id=self.test_case_id
102107
)
103108

104109
def _notify(self, priority, ctxt, event_type, payload):
@@ -130,8 +135,10 @@ def reset(self):
130135
class FakeVersionedNotifier(FakeNotifier):
131136
def __init__(
132137
self, transport, publisher_id, serializer=None, parent=None,
138+
test_case_id=None
133139
):
134-
super().__init__(transport, publisher_id, serializer)
140+
super().__init__(
141+
transport, publisher_id, serializer, test_case_id=test_case_id)
135142
if parent:
136143
self.versioned_notifications = parent.versioned_notifications
137144
else:
@@ -142,7 +149,31 @@ def __init__(
142149
else:
143150
self.subscriptions = collections.defaultdict(_Sub)
144151

152+
@staticmethod
153+
def _get_sender_test_case_id():
154+
current = eventlet.getcurrent()
155+
# NOTE(gibi) not all eventlet spawn is under our control, so there can
156+
# be senders without test_case_id set, find the first ancestor that
157+
# was spawned from nova.utils.spawn[_n] and therefore has the id set.
158+
while not getattr(current, 'test_case_id', None):
159+
current = current.parent
160+
return current.test_case_id
161+
145162
def _notify(self, priority, ctxt, event_type, payload):
163+
sender_test_case_id = self._get_sender_test_case_id()
164+
# NOTE(gibi): this is here to prevent late notifications from already
165+
# finished test cases to break the currently running test case. See
166+
# more in https://bugs.launchpad.net/nova/+bug/1946339
167+
if sender_test_case_id != self.test_case_id:
168+
raise RuntimeError(
169+
'FakeVersionedNotifier received %s notification emitted by %s '
170+
'test case which is different from the currently running test '
171+
'case %s. This notification is ignored. The sender test case '
172+
'probably leaked a running eventlet that emitted '
173+
'notifications after the test case finished. Now this eventlet'
174+
'is terminated by raising this exception.' %
175+
(event_type, sender_test_case_id, self.test_case_id))
176+
146177
payload = self._serializer.serialize_entity(ctxt, payload)
147178
notification = {
148179
'publisher_id': self.publisher_id,
@@ -180,7 +211,9 @@ def setUp(self):
180211
self.fake_versioned_notifier = FakeVersionedNotifier(
181212
rpc.NOTIFIER.transport,
182213
rpc.NOTIFIER.publisher_id,
183-
serializer=getattr(rpc.NOTIFIER, '_serializer', None))
214+
serializer=getattr(rpc.NOTIFIER, '_serializer', None),
215+
test_case_id=self.test.id()
216+
)
184217
if rpc.LEGACY_NOTIFIER and rpc.NOTIFIER:
185218
self.test.stub_out('nova.rpc.LEGACY_NOTIFIER', self.fake_notifier)
186219
self.test.stub_out(

nova/tests/fixtures/nova.py

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,12 @@
1818

1919
import collections
2020
from contextlib import contextmanager
21+
import functools
2122
import logging as std_logging
2223
import os
2324
import warnings
2425

26+
import eventlet
2527
import fixtures
2628
import futurist
2729
import mock
@@ -56,6 +58,7 @@
5658
from nova.scheduler import weights
5759
from nova import service
5860
from nova.tests.functional.api import client
61+
from nova import utils
5962

6063
CONF = cfg.CONF
6164
LOG = logging.getLogger(__name__)
@@ -1516,3 +1519,84 @@ def fail(*a, **k):
15161519
except ImportError:
15171520
self.useFixture(fixtures.MonkeyPatch(
15181521
meth, poison_configure(meth, why)))
1522+
1523+
1524+
class PropagateTestCaseIdToChildEventlets(fixtures.Fixture):
1525+
"""A fixture that adds the currently running test case id to each spawned
1526+
eventlet. This information then later used by the NotificationFixture to
1527+
detect if a notification was emitted by an eventlet that was spawned by a
1528+
previous test case so such late notification can be ignored. For more
1529+
background about what issues this can prevent see
1530+
https://bugs.launchpad.net/nova/+bug/1946339
1531+
1532+
"""
1533+
def __init__(self, test_case_id):
1534+
self.test_case_id = test_case_id
1535+
1536+
def setUp(self):
1537+
super().setUp()
1538+
1539+
# set the id on the main eventlet
1540+
c = eventlet.getcurrent()
1541+
c.test_case_id = self.test_case_id
1542+
1543+
orig_spawn = utils.spawn
1544+
1545+
def wrapped_spawn(func, *args, **kwargs):
1546+
# This is still runs before the eventlet.spawn so read the id for
1547+
# propagation
1548+
caller = eventlet.getcurrent()
1549+
# If there is no id set on us that means we were spawned with other
1550+
# than nova.utils.spawn or spawn_n so the id propagation chain got
1551+
# broken. We fall back to self.test_case_id from the fixture which
1552+
# is good enough
1553+
caller_test_case_id = getattr(
1554+
caller, 'test_case_id', None) or self.test_case_id
1555+
1556+
@functools.wraps(func)
1557+
def test_case_id_wrapper(*args, **kwargs):
1558+
# This runs after the eventlet.spawn in the new child.
1559+
# Propagate the id from our caller eventlet
1560+
current = eventlet.getcurrent()
1561+
current.test_case_id = caller_test_case_id
1562+
return func(*args, **kwargs)
1563+
1564+
# call the original spawn to create the child but with our
1565+
# new wrapper around its target
1566+
return orig_spawn(test_case_id_wrapper, *args, **kwargs)
1567+
1568+
# let's replace nova.utils.spawn with the wrapped one that injects
1569+
# our initialization to the child eventlet
1570+
self.useFixture(
1571+
fixtures.MonkeyPatch('nova.utils.spawn', wrapped_spawn))
1572+
1573+
# now do the same with spawn_n
1574+
orig_spawn_n = utils.spawn_n
1575+
1576+
def wrapped_spawn_n(func, *args, **kwargs):
1577+
# This is still runs before the eventlet.spawn so read the id for
1578+
# propagation
1579+
caller = eventlet.getcurrent()
1580+
# If there is no id set on us that means we were spawned with other
1581+
# than nova.utils.spawn or spawn_n so the id propagation chain got
1582+
# broken. We fall back to self.test_case_id from the fixture which
1583+
# is good enough
1584+
caller_test_case_id = getattr(
1585+
caller, 'test_case_id', None) or self.test_case_id
1586+
1587+
@functools.wraps(func)
1588+
def test_case_id_wrapper(*args, **kwargs):
1589+
# This runs after the eventlet.spawn in the new child.
1590+
# Propagate the id from our caller eventlet
1591+
current = eventlet.getcurrent()
1592+
current.test_case_id = caller_test_case_id
1593+
return func(*args, **kwargs)
1594+
1595+
# call the original spawn_n to create the child but with our
1596+
# new wrapper around its target
1597+
return orig_spawn_n(test_case_id_wrapper, *args, **kwargs)
1598+
1599+
# let's replace nova.utils.spawn_n with the wrapped one that injects
1600+
# our initialization to the child eventlet
1601+
self.useFixture(
1602+
fixtures.MonkeyPatch('nova.utils.spawn_n', wrapped_spawn_n))

0 commit comments

Comments
 (0)