Skip to content

Commit 583fc90

Browse files
committed
opal hotel: only delete events that have not yet fired
The eviction callback, for convenience (and to avoid code duplication), use to call opal_hotel_checkout(). However, opal_hotel_checkout() deletes the eviction event -- which is fine to do when opal_hotel_checkout() is invoked by the application. But when it's invoked by the same event that it's deleting, it can cause Bad Things to happen. For simplicity, instead of invoking opal_hotel_checkout() from the eviction callback, just duplicate the checkout logic into the eviction callback function (and skip the delete-the-evict-event part). For good measure, put a comment in all three places where the checkout logic occurs (because it's inlined): don't change this logic without changing all 3 places. Finally, also add a line in the docs for opal_hotel_init() warning users from calling opal_hotel_checkout() from their eviction callback. (cherry picked from commit open-mpi/ompi@270cc11)
1 parent 0e3b111 commit 583fc90

File tree

2 files changed

+26
-8
lines changed

2 files changed

+26
-8
lines changed

opal/class/opal_hotel.c

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright (c) 2012 Cisco Systems, Inc. All rights reserved.
2+
* Copyright (c) 2012-2016 Cisco Systems, Inc. All rights reserved.
33
* Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved
44
* $COPYRIGHT$
55
*
@@ -23,12 +23,22 @@ static void local_eviction_callback(int fd, short flags, void *arg)
2323
(opal_hotel_room_eviction_callback_arg_t*) arg;
2424
void *occupant = eargs->hotel->rooms[eargs->room_num].occupant;
2525

26-
/* Remove the occupant from the room and invoke the user callback
27-
to tell them that they were evicted */
28-
opal_hotel_checkout(eargs->hotel, eargs->room_num);
29-
eargs->hotel->evict_callback_fn(eargs->hotel,
30-
eargs->room_num,
31-
occupant);
26+
/* Remove the occurpant from the room.
27+
28+
Do not change this logic without also changing the same logic
29+
in opal_hotel_checkout() and
30+
opal_hotel_checkout_and_return_occupant(). */
31+
opal_hotel_t *hotel = eargs->hotel;
32+
opal_hotel_room_t *room = &(hotel->rooms[eargs->room_num]);
33+
room->occupant = NULL;
34+
hotel->last_unoccupied_room++;
35+
assert(hotel->last_unoccupied_room < hotel->num_rooms);
36+
hotel->unoccupied_rooms[hotel->last_unoccupied_room] = eargs->room_num;
37+
38+
/* Invoke the user callback to tell them that they were evicted */
39+
hotel->evict_callback_fn(hotel,
40+
eargs->room_num,
41+
occupant);
3242
}
3343

3444

opal/class/opal_hotel.h

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright (c) 2012-2013 Cisco Systems, Inc. All rights reserved.
2+
* Copyright (c) 2012-2016 Cisco Systems, Inc. All rights reserved.
33
* Copyright (c) 2012 Los Alamos National Security, LLC. All rights reserved
44
* Copyright (c) 2014 Intel, Inc. All rights reserved
55
* $COPYRIGHT$
@@ -143,6 +143,11 @@ OBJ_CLASS_DECLARATION(opal_hotel_t);
143143
* will be set - occupants will remain checked into the hotel until
144144
* explicitly checked out.
145145
*
146+
* Also note: the eviction_callback_fn should absolutely not call any
147+
* of the hotel checkout functions. Specifically: the occupant has
148+
* already been ("forcibly") checked out *before* the
149+
* eviction_callback_fn is invoked.
150+
*
146151
* @return OPAL_SUCCESS if all initializations were succesful. Otherwise,
147152
* the error indicate what went wrong in the function.
148153
*/
@@ -236,6 +241,9 @@ static inline void opal_hotel_checkout(opal_hotel_t *hotel, int room_num)
236241
/* If there's an occupant in the room, check them out */
237242
room = &(hotel->rooms[room_num]);
238243
if (OPAL_LIKELY(NULL != room->occupant)) {
244+
/* Do not change this logic without also changing the same
245+
logic in opal_hotel_checkout_and_return_occupant() and
246+
opal_hotel.c:local_eviction_callback(). */
239247
room->occupant = NULL;
240248
opal_event_del(&(room->eviction_timer_event));
241249

0 commit comments

Comments
 (0)