@mkurnosov
Contributor

Prepare the placeholder for the array of requests at module initialization time.

Calling MPI_Allreduce through coll/spacc with count < comm_size, or with a non-commutative reduction operation, causes a segmentation fault (only in v3.1.x).

In this case spacc's MPI_Allreduce falls back to `ompi_coll_base_allreduce_intra_basic_linear`, but many base collective operations require that the placeholder for the array of requests has already been prepared.
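
The failure mode can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `base_data_t`, `module_t`, `get_reqs`, `module_enable`, `fallback_linear`, and `module_disable` are hypothetical names invented for illustration, not Open MPI's types or functions, and plain `int` values stand in for request handles. Unlike the crashing code path, the sketch checks for the missing placeholder and returns an error so that it can be run safely.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-module data used by base collectives. */
typedef struct {
    int *reqs;      /* stand-in for an array of request handles */
    int  num_reqs;  /* current capacity of reqs */
} base_data_t;

/* Hypothetical stand-in for a collective module. */
typedef struct {
    base_data_t *base_data;  /* the placeholder; NULL until prepared */
} module_t;

/* Return an array with room for nreqs requests, growing it on demand.
 * Returns NULL if the placeholder was never prepared or allocation fails.
 * Code that skips the NULL check would dereference a NULL pointer here. */
static int *get_reqs(base_data_t *data, int nreqs)
{
    if (data == NULL)
        return NULL;
    if (nreqs <= data->num_reqs)
        return data->reqs;
    int *tmp = realloc(data->reqs, (size_t)nreqs * sizeof(*tmp));
    if (tmp == NULL)
        return NULL;
    data->reqs = tmp;
    data->num_reqs = nreqs;
    return data->reqs;
}

/* Module initialization: prepare the (initially empty) placeholder. */
static int module_enable(module_t *m)
{
    m->base_data = calloc(1, sizeof(*m->base_data));
    return m->base_data ? 0 : -1;
}

/* Fallback algorithm that needs one request slot per process. */
static int fallback_linear(module_t *m, int comm_size)
{
    int *reqs = get_reqs(m->base_data, comm_size);
    if (reqs == NULL)
        return -1;
    for (int i = 0; i < comm_size; i++)
        reqs[i] = i;  /* a real algorithm would store request handles */
    return 0;
}

/* Module teardown: release the request array and the placeholder. */
static void module_disable(module_t *m)
{
    if (m->base_data != NULL) {
        free(m->base_data->reqs);
        free(m->base_data);
        m->base_data = NULL;
    }
}
```

In this model, calling `fallback_linear` on a module whose `base_data` is still NULL fails, while calling `module_enable` first lets the same call succeed. Preparing the placeholder once at initialization means every fallback path can rely on it being there.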

Signed-off-by: Mikhail Kurnosov [email protected]

@mkurnosov
Contributor Author

Bug reproducer:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm comm = MPI_COMM_WORLD;
    int rank, comm_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &comm_size);
    MPI_Comm_rank(comm, &rank);
    int count = comm_size - 1;

    int *sbuf = malloc(count * sizeof(*sbuf));
    int *rbuf = malloc(count * sizeof(*rbuf));
    if (!sbuf || !rbuf) {
        fprintf(stderr, "Error: malloc failed\n");
        MPI_Abort(comm, 1);
    }
    for (int i = 0; i < count; i++) {
        sbuf[i] = rank + i;
        rbuf[i] = 0;
    }

    MPI_Allreduce(sbuf, rbuf, count, MPI_INT, MPI_SUM, comm);

    /* Validate */
    for (int i = 0; i < count; i++) {
        int res = i * comm_size + (comm_size * (comm_size - 1)) / 2;
        if (rbuf[i] != res) {
            fprintf(stderr, "Error: validation error\n");
            break;
        }
    }

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}
```
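
For reference, the expected value `res` in the validation loop follows from summing `sbuf[i] = rank + i` over all ranks. Writing $p$ for `comm_size` and $r$ for `rank` (symbols introduced here only for the derivation):

$$
\sum_{r=0}^{p-1} (r + i) = i\,p + \frac{p(p-1)}{2}
$$

With 4 processes, `count` is 3 and the expected results are $4i + 6$, that is 6, 10, and 14 for $i = 0, 1, 2$.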

$ mpirun --hostfile ./hostfile -np 4 --mca coll_spacc_priority 100 ./allreduce_test

@jsquyres added the bug label on Nov 16, 2018
@jsquyres added this to the v3.1.4 milestone on Nov 16, 2018
@bwbarrett merged commit eb733df into open-mpi:v3.1.x on Jan 10, 2019