Skip to content

Conversation

@ggouaillardet
Copy link
Contributor

Signed-off-by: Gilles Gouaillardet [email protected]

@ggouaillardet
Copy link
Contributor Author

@bosilca can you please double check this ?

currently

valgrind ./hello_c 
==16053== Memcheck, a memory error detector
==16053== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16053== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==16053== Command: ./hello_c
==16053== 
==16053== Invalid read of size 8
==16053==    at 0x5AFD400: opal_pointer_array_set_item (opal_pointer_array.c:300)
==16053==    by 0x4E787AE: ompi_mpi_errcode_init (errcode.c:206)
==16053==    by 0x4E8F98A: ompi_mpi_init (ompi_mpi_init.c:751)
==16053==    by 0x4ECD4C6: PMPI_Init (pinit.c:66)
==16053==    by 0x4008DD: main (hello_c.c:18)
==16053==  Address 0xe57ca78 is 0 bytes after a block of size 8 alloc'd
==16053==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
==16053==    by 0x5AFCC6F: opal_pointer_array_init (opal_pointer_array.c:195)
==16053==    by 0x4E7544E: ompi_mpi_errcode_init (errcode.c:136)
==16053==    by 0x4E8F98A: ompi_mpi_init (ompi_mpi_init.c:751)
==16053==    by 0x4ECD4C6: PMPI_Init (pinit.c:66)
==16053==    by 0x4008DD: main (hello_c.c:18)

the array has size == 64, so free_bits is one unit64_t.
the error occurs with all bits are set, and FIND_FIRST_FREE tries to access table->free_bits[1]

@bosilca
Copy link
Member

bosilca commented Apr 26, 2017

I see what is going on, if all bits are set then the FIND_FIRST_ZERO macro misbehave. However, we know where the array is full, because the number_free must be zero.

diff --git a/opal/class/opal_pointer_array.c b/opal/class/opal_pointer_array.c
index 133ace8902..9b2da8be58 100644
--- a/opal/class/opal_pointer_array.c
+++ b/opal/class/opal_pointer_array.c
@@ -88,9 +88,13 @@ static void opal_pointer_array_destruct(opal_pointer_array_t *array)
  * from the indicated position until it finds a zero bit. If SET is true,
  * the bit is set. The position of the bit is returned in store.
  */
-#define FIND_FIRST_ZERO(START_IDX, STORE, SET)                          \
+#define FIND_FIRST_ZERO(START_IDX, STORE)                               \
     do {                                                                \
         uint32_t __b_idx, __b_pos;                                      \
+        if( 0 == table->number_free ) {                                 \
+            (STORE) = table->size;                                      \
+            break;                                                      \
+        }                                                               \
         GET_BIT_POS((START_IDX), __b_idx, __b_pos);                     \
         for (; table->free_bits[__b_idx] == 0xFFFFFFFFFFFFFFFFULL; __b_idx++); \
         assert(__b_idx < (uint32_t)table->size);                        \
@@ -115,9 +119,6 @@ static void opal_pointer_array_destruct(opal_pointer_array_t *array)
         if( 0x0000000000000001ULL == (__check_value & 0x0000000000000001ULL) ) { \
             __b_pos += 1;                                               \
         }                                                               \
-        if( (SET) ) {                                                   \
-            table->free_bits[__b_idx] |= (1ULL << __b_pos);             \
-        }                                                               \
         (STORE) = (__b_idx * 8 * sizeof(uint64_t)) + __b_pos;           \
     } while(0)

@@ -240,7 +241,7 @@ int opal_pointer_array_add(opal_pointer_array_t *table, void *ptr)
     table->number_free--;
     SET_BIT(index);
     if (table->number_free > 0) {
-        FIND_FIRST_ZERO(index, table->lowest_free, 0);
+        FIND_FIRST_ZERO(index, table->lowest_free);
     } else {
         table->lowest_free = table->size;
     }
@@ -297,7 +298,7 @@ int opal_pointer_array_set_item(opal_pointer_array_t *table, int index,
             SET_BIT(index);
             /* Reset lowest_free if required */
             if ( index == table->lowest_free ) {
-                FIND_FIRST_ZERO(index, table->lowest_free, 0);
+                FIND_FIRST_ZERO(index, table->lowest_free);
             }
         } else {
             assert( index != table->lowest_free );
@@ -373,7 +374,7 @@ bool opal_pointer_array_test_and_set_item (opal_pointer_array_t *table,
     /* Reset lowest_free if required */
     if( table->number_free > 0 ) {
         if ( index == table->lowest_free ) {
-            FIND_FIRST_ZERO(index, table->lowest_free, 0);
+            FIND_FIRST_ZERO(index, table->lowest_free);
         }
     } else {
         table->lowest_free = table->size;

@ggouaillardet
Copy link
Contributor Author

fixed in 3c6631f

@jsquyres
Copy link
Member

jsquyres commented Jun 5, 2017

@ggouaillardet @bosilca Did 3c6631f cause #3484? I ask because we are still seeing opal_pointer_array() assert failures every night at Absoft in MTT (as of 5 June 2017).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants