Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Commit 4e9efd4

Browse files
authored
Merge pull request #1260 from ggouaillardet/topic/v2.x/coll_fixes
v2.x: various mca/coll fixes
2 parents 29b837a + b4eb015 commit 4e9efd4

40 files changed

+846
-791
lines changed
Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
/* This comment applies to all collectives (including the basic
2+
* module) where we allocate a temporary buffer. For the next few
3+
* lines of code, it's tremendously complicated how we decided that
4+
* this was the Right Thing to do. Sit back and enjoy. And prepare
5+
* to have your mind warped. :-)
6+
*
7+
* Recall some definitions (I always get these backwards, so I'm
8+
* going to put them here):
9+
*
10+
* extent: the length from the lower bound to the upper bound -- may
11+
* be considerably larger than the buffer required to hold the data
12+
* (or smaller! But it's easiest to think about when it's larger).
13+
*
14+
* true extent: the exact number of bytes required to hold the data
15+
* in the layout pattern in the datatype.
16+
*
17+
* For example, consider the following buffer (just talking about
18+
* true_lb, extent, and true extent -- extrapolate for true_ub:
19+
*
20+
* A B C
21+
* --------------------------------------------------------
22+
* | | |
23+
* --------------------------------------------------------
24+
*
25+
* There are multiple cases:
26+
*
27+
* 1. A is what we give to MPI_Send (and friends), and A is where
28+
* the data starts, and C is where the data ends. In this case:
29+
*
30+
* - extent: C-A
31+
* - true extent: C-A
32+
* - true_lb: 0
33+
*
34+
* A C
35+
* --------------------------------------------------------
36+
* | |
37+
* --------------------------------------------------------
38+
* <=======================extent=========================>
39+
* <======================true extent=====================>
40+
*
41+
* 2. A is what we give to MPI_Send (and friends), B is where the
42+
* data starts, and C is where the data ends. In this case:
43+
*
44+
* - extent: C-A
45+
* - true extent: C-B
46+
* - true_lb: positive
47+
*
48+
* A B C
49+
* --------------------------------------------------------
50+
* | | User buffer |
51+
* --------------------------------------------------------
52+
* <=======================extent=========================>
53+
* <===============true extent=============>
54+
*
55+
* 3. B is what we give to MPI_Send (and friends), A is where the
56+
* data starts, and C is where the data ends. In this case:
57+
*
58+
* - extent: C-A
59+
* - true extent: C-A
60+
* - true_lb: negative
61+
*
62+
* A B C
63+
* --------------------------------------------------------
64+
* | | User buffer |
65+
* --------------------------------------------------------
66+
* <=======================extent=========================>
67+
* <======================true extent=====================>
68+
*
69+
* 4. MPI_BOTTOM is what we give to MPI_Send (and friends), B is
70+
* where the data starts, and C is where the data ends. In this
71+
* case:
72+
*
73+
* - extent: C-MPI_BOTTOM
74+
* - true extent: C-B
75+
* - true_lb: [potentially very large] positive
76+
*
77+
* MPI_BOTTOM B C
78+
* --------------------------------------------------------
79+
* | | User buffer |
80+
* --------------------------------------------------------
81+
* <=======================extent=========================>
82+
* <===============true extent=============>
83+
*
84+
* So in all cases, for a temporary buffer, all we need to malloc()
85+
* is a buffer of size true_extent. We therefore need to know two
86+
* pointer values: what value to give to MPI_Send (and friends) and
87+
* what value to give to free(), because they might not be the same.
88+
*
89+
* Clearly, what we give to free() is exactly what was returned from
90+
* malloc(). That part is easy. :-)
91+
*
92+
* What we give to MPI_Send (and friends) is a bit more complicated.
93+
* Let's take the 4 cases from above:
94+
*
95+
* 1. If A is what we give to MPI_Send and A is where the data
96+
* starts, then clearly we give to MPI_Send what we got back from
97+
* malloc().
98+
*
99+
* 2. If B is what we get back from malloc, but we give A to
100+
* MPI_Send, then the buffer range [A,B) represents "dead space"
101+
* -- no data will be put there. So it's safe to give B-true_lb to
102+
* MPI_Send. More specifically, the true_lb is positive, so B-true_lb is
103+
* actually A.
104+
*
105+
* 3. If A is what we get back from malloc, and B is what we give to
106+
* MPI_Send, then the true_lb is negative, so A-true_lb will actually equal
107+
* B.
108+
*
109+
* 4. Although this seems like the weirdest case, it's actually
110+
* quite similar to case #2 -- the pointer we give to MPI_Send is
111+
* smaller than the pointer we got back from malloc().
112+
*
113+
* Hence, in all cases, we give (return_from_malloc - true_lb) to MPI_Send.
114+
*
115+
* This works fine and dandy if we only have (count==1), which we
116+
* rarely do. ;-) So we really need to allocate (true_extent +
117+
* ((count - 1) * extent)) to get enough space for the rest. This may
118+
* be more than is necessary, but it's ok.
119+
*
120+
* Simple, no? :-)
121+
*
122+
*/
123+
124+

ompi/mca/coll/base/coll_base_allgather.c

Lines changed: 4 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -167,19 +167,16 @@ int ompi_coll_base_allgather_intra_bruck(const void *sbuf, int scount,
167167
- copy blocks from shift buffer starting at block [rank] in rbuf.
168168
*/
169169
if (0 != rank) {
170-
ptrdiff_t true_extent, true_lb;
171170
char *free_buf = NULL, *shift_buf = NULL;
171+
ptrdiff_t span, gap;
172172

173-
err = ompi_datatype_get_true_extent(rdtype, &true_lb, &true_extent);
174-
if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }
173+
span = opal_datatype_span(&rdtype->super, (size - rank) * rcount, &gap);
175174

176-
free_buf = (char*) calloc(((true_extent +
177-
((ptrdiff_t)(size - rank) * (ptrdiff_t)rcount - 1) * rext)),
178-
sizeof(char));
175+
free_buf = (char*)calloc(span, sizeof(char));
179176
if (NULL == free_buf) {
180177
line = __LINE__; err = OMPI_ERR_OUT_OF_RESOURCE; goto err_hndl;
181178
}
182-
shift_buf = free_buf - true_lb;
179+
shift_buf = free_buf - gap;
183180

184181
/* 1. copy blocks [0 .. (size - rank - 1)] from rbuf to shift buffer */
185182
err = ompi_datatype_copy_content_same_ddt(rdtype, ((ptrdiff_t)(size - rank) * (ptrdiff_t)rcount),

ompi/mca/coll/base/coll_base_allreduce.c

Lines changed: 17 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@
1313
* Copyright (c) 2009 University of Houston. All rights reserved.
1414
* Copyright (c) 2013 Los Alamos National Security, LLC. All Rights
1515
* reserved.
16-
* Copyright (c) 2015 Research Organization for Information Science
16+
* Copyright (c) 2015-2016 Research Organization for Information Science
1717
* and Technology (RIST). All rights reserved.
1818
* $COPYRIGHT$
1919
*
@@ -134,9 +134,9 @@ ompi_coll_base_allreduce_intra_recursivedoubling(const void *sbuf, void *rbuf,
134134
{
135135
int ret, line, rank, size, adjsize, remote, distance;
136136
int newrank, newremote, extra_ranks;
137-
char *tmpsend = NULL, *tmprecv = NULL, *tmpswap = NULL, *inplacebuf = NULL;
138-
ptrdiff_t true_lb, true_extent, lb, extent;
137+
char *tmpsend = NULL, *tmprecv = NULL, *tmpswap = NULL, *inplacebuf_free = NULL, *inplacebuf;
139138
ompi_request_t *reqs[2] = {NULL, NULL};
139+
OPAL_PTRDIFF_TYPE span, gap;
140140

141141
size = ompi_comm_size(comm);
142142
rank = ompi_comm_rank(comm);
@@ -154,13 +154,10 @@ ompi_coll_base_allreduce_intra_recursivedoubling(const void *sbuf, void *rbuf,
154154
}
155155

156156
/* Allocate and initialize temporary send buffer */
157-
ret = ompi_datatype_get_extent(dtype, &lb, &extent);
158-
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
159-
ret = ompi_datatype_get_true_extent(dtype, &true_lb, &true_extent);
160-
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
161-
162-
inplacebuf = (char*) malloc(true_extent + (ptrdiff_t)(count - 1) * extent);
163-
if (NULL == inplacebuf) { ret = -1; line = __LINE__; goto error_hndl; }
157+
span = opal_datatype_span(&dtype->super, count, &gap);
158+
inplacebuf_free = (char*) malloc(span);
159+
if (NULL == inplacebuf_free) { ret = -1; line = __LINE__; goto error_hndl; }
160+
inplacebuf = inplacebuf_free - gap;
164161

165162
if (MPI_IN_PLACE == sbuf) {
166163
ret = ompi_datatype_copy_content_same_ddt(dtype, count, inplacebuf, (char*)rbuf);
@@ -267,13 +264,14 @@ ompi_coll_base_allreduce_intra_recursivedoubling(const void *sbuf, void *rbuf,
267264
if (ret < 0) { line = __LINE__; goto error_hndl; }
268265
}
269266

270-
if (NULL != inplacebuf) free(inplacebuf);
267+
if (NULL != inplacebuf_free) free(inplacebuf_free);
271268
return MPI_SUCCESS;
272269

273270
error_hndl:
274271
OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tRank %d Error occurred %d\n",
275272
__FILE__, line, rank, ret));
276-
if (NULL != inplacebuf) free(inplacebuf);
273+
(void)line; // silence compiler warning
274+
if (NULL != inplacebuf_free) free(inplacebuf_free);
277275
return ret;
278276
}
279277

@@ -629,9 +627,9 @@ ompi_coll_base_allreduce_intra_ring_segmented(const void *sbuf, void *rbuf, int
629627
int segcount, max_segcount, num_phases, phase, block_count, inbi;
630628
size_t typelng;
631629
char *tmpsend = NULL, *tmprecv = NULL, *inbuf[2] = {NULL, NULL};
632-
ptrdiff_t true_lb, true_extent, lb, extent;
633630
ptrdiff_t block_offset, max_real_segsize;
634631
ompi_request_t *reqs[2] = {NULL, NULL};
632+
OPAL_PTRDIFF_TYPE lb, extent, gap;
635633

636634
size = ompi_comm_size(comm);
637635
rank = ompi_comm_rank(comm);
@@ -649,10 +647,6 @@ ompi_coll_base_allreduce_intra_ring_segmented(const void *sbuf, void *rbuf, int
649647
}
650648

651649
/* Determine segment count based on the suggested segment size */
652-
ret = ompi_datatype_get_extent(dtype, &lb, &extent);
653-
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
654-
ret = ompi_datatype_get_true_extent(dtype, &true_lb, &true_extent);
655-
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
656650
ret = ompi_datatype_type_size( dtype, &typelng);
657651
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
658652
segcount = count;
@@ -685,7 +679,10 @@ ompi_coll_base_allreduce_intra_ring_segmented(const void *sbuf, void *rbuf, int
685679
early_blockcount, late_blockcount );
686680
COLL_BASE_COMPUTE_BLOCKCOUNT( early_blockcount, num_phases, inbi,
687681
max_segcount, k);
688-
max_real_segsize = true_extent + (ptrdiff_t)(max_segcount - 1) * extent;
682+
683+
ret = ompi_datatype_get_extent(dtype, &lb, &extent);
684+
if (MPI_SUCCESS != ret) { line = __LINE__; goto error_hndl; }
685+
max_real_segsize = opal_datatype_span(&dtype->super, max_segcount, &gap);
689686

690687
/* Allocate and initialize temporary buffers */
691688
inbuf[0] = (char*)malloc(max_real_segsize);
@@ -740,8 +737,8 @@ ompi_coll_base_allreduce_intra_ring_segmented(const void *sbuf, void *rbuf, int
740737
block_count = ((rank < split_rank)? early_blockcount : late_blockcount);
741738
COLL_BASE_COMPUTE_BLOCKCOUNT(block_count, num_phases, split_phase,
742739
early_phase_segcount, late_phase_segcount)
743-
phase_count = ((phase < split_phase)?
744-
(early_phase_segcount) : (late_phase_segcount));
740+
phase_count = ((phase < split_phase)?
741+
(early_phase_segcount) : (late_phase_segcount));
745742
phase_offset = ((phase < split_phase)?
746743
((ptrdiff_t)phase * (ptrdiff_t)early_phase_segcount) :
747744
((ptrdiff_t)phase * (ptrdiff_t)late_phase_segcount + split_phase));

ompi/mca/coll/base/coll_base_alltoall.c

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -43,10 +43,10 @@ mca_coll_base_alltoall_intra_basic_inplace(const void *rbuf, int rcount,
4343
{
4444
mca_coll_base_module_t *base_module = (mca_coll_base_module_t*) module;
4545
int i, j, size, rank, err = MPI_SUCCESS, line;
46-
ompi_request_t **preq, **reqs;
46+
OPAL_PTRDIFF_TYPE ext, gap;
47+
MPI_Request *preq, *reqs;
4748
char *tmp_buffer;
4849
size_t max_size;
49-
ptrdiff_t ext, true_lb, true_ext;
5050

5151
/* Initialize. */
5252

@@ -60,16 +60,18 @@ mca_coll_base_alltoall_intra_basic_inplace(const void *rbuf, int rcount,
6060

6161
/* Find the largest receive amount */
6262
ompi_datatype_type_extent (rdtype, &ext);
63-
ompi_datatype_get_true_extent ( rdtype, &true_lb, &true_ext);
64-
max_size = true_ext + ext * (rcount-1);
63+
max_size = opal_datatype_span(&rdtype->super, rcount, &gap);
6564

6665
/* Initiate all send/recv to/from others. */
6766
reqs = coll_base_comm_get_reqs(base_module->base_data, 2);
6867
if( NULL == reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto error_hndl; }
6968

7069
/* Allocate a temporary buffer */
7170
tmp_buffer = calloc (max_size, 1);
72-
if (NULL == tmp_buffer) { return OMPI_ERR_OUT_OF_RESOURCE; }
71+
if (NULL == tmp_buffer) {
72+
return OMPI_ERR_OUT_OF_RESOURCE;
73+
}
74+
tmp_buffer -= gap;
7375
max_size = ext * rcount;
7476

7577
/* in-place alltoall slow algorithm (but works) */
@@ -200,7 +202,7 @@ int ompi_coll_base_alltoall_intra_bruck(const void *sbuf, int scount,
200202
int i, k, line = -1, rank, size, err = 0;
201203
int sendto, recvfrom, distance, *displs = NULL, *blen = NULL;
202204
char *tmpbuf = NULL, *tmpbuf_free = NULL;
203-
ptrdiff_t rlb, slb, tlb, sext, rext, tsext;
205+
OPAL_PTRDIFF_TYPE sext, rext, span, gap;
204206
struct ompi_datatype_t *new_ddt;
205207

206208
if (MPI_IN_PLACE == sbuf) {
@@ -214,25 +216,23 @@ int ompi_coll_base_alltoall_intra_bruck(const void *sbuf, int scount,
214216
OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
215217
"coll:base:alltoall_intra_bruck rank %d", rank));
216218

217-
err = ompi_datatype_get_extent (sdtype, &slb, &sext);
218-
if (err != MPI_SUCCESS) { line = __LINE__; goto err_hndl; }
219-
220-
err = ompi_datatype_get_true_extent(sdtype, &tlb, &tsext);
219+
err = ompi_datatype_type_extent (sdtype, &sext);
221220
if (err != MPI_SUCCESS) { line = __LINE__; goto err_hndl; }
222221

223-
err = ompi_datatype_get_extent (rdtype, &rlb, &rext);
222+
err = ompi_datatype_type_extent (rdtype, &rext);
224223
if (err != MPI_SUCCESS) { line = __LINE__; goto err_hndl; }
225224

225+
span = opal_datatype_span(&sdtype->super, size * scount, &gap);
226226

227227
displs = (int *) malloc(size * sizeof(int));
228228
if (displs == NULL) { line = __LINE__; err = -1; goto err_hndl; }
229229
blen = (int *) malloc(size * sizeof(int));
230230
if (blen == NULL) { line = __LINE__; err = -1; goto err_hndl; }
231231

232232
/* tmp buffer allocation for message data */
233-
tmpbuf_free = (char *) malloc(tsext + ((ptrdiff_t)scount * (ptrdiff_t)size - 1) * sext);
233+
tmpbuf_free = (char *)malloc(span);
234234
if (tmpbuf_free == NULL) { line = __LINE__; err = -1; goto err_hndl; }
235-
tmpbuf = tmpbuf_free - slb;
235+
tmpbuf = tmpbuf_free - gap;
236236

237237
/* Step 1 - local rotation - shift up by rank */
238238
err = ompi_datatype_copy_content_same_ddt (sdtype,

ompi/mca/coll/base/coll_base_alltoallv.c

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
* Copyright (c) 2013 Los Alamos National Security, LLC. All Rights
1515
* reserved.
1616
* Copyright (c) 2013 FUJITSU LIMITED. All rights reserved.
17-
* Copyright (c) 2014-2015 Research Organization for Information Science
17+
* Copyright (c) 2014-2016 Research Organization for Information Science
1818
* and Technology (RIST). All rights reserved.
1919
* $COPYRIGHT$
2020
*
@@ -38,16 +38,16 @@
3838

3939
int
4040
mca_coll_base_alltoallv_intra_basic_inplace(const void *rbuf, const int *rcounts, const int *rdisps,
41-
struct ompi_datatype_t *rdtype,
42-
struct ompi_communicator_t *comm,
43-
mca_coll_base_module_t *module)
41+
struct ompi_datatype_t *rdtype,
42+
struct ompi_communicator_t *comm,
43+
mca_coll_base_module_t *module)
4444
{
4545
mca_coll_base_module_t *base_module = (mca_coll_base_module_t*) module;
4646
int i, j, size, rank, err=MPI_SUCCESS;
4747
ompi_request_t **preq, **reqs;
48-
char *tmp_buffer;
48+
char *allocated_buffer, *tmp_buffer;
4949
size_t max_size, rdtype_size;
50-
ptrdiff_t ext;
50+
OPAL_PTRDIFF_TYPE ext, gap;
5151

5252
/* Initialize. */
5353

@@ -63,16 +63,17 @@ mca_coll_base_alltoallv_intra_basic_inplace(const void *rbuf, const int *rcounts
6363
/* Find the largest receive amount */
6464
ompi_datatype_type_extent (rdtype, &ext);
6565
for (i = 0, max_size = 0 ; i < size ; ++i) {
66-
size_t size = ext * rcounts[i];
67-
66+
size_t size = opal_datatype_span(&rdtype->super, rcounts[i], &gap);
6867
max_size = size > max_size ? size : max_size;
6968
}
69+
/* The gap will always be the same as we are working on the same datatype */
7070

7171
/* Allocate a temporary buffer */
72-
tmp_buffer = calloc (max_size, 1);
73-
if (NULL == tmp_buffer) {
72+
allocated_buffer = calloc (max_size, 1);
73+
if (NULL == allocated_buffer) {
7474
return OMPI_ERR_OUT_OF_RESOURCE;
7575
}
76+
tmp_buffer = allocated_buffer - gap;
7677

7778
/* Initiate all send/recv to/from others. */
7879
reqs = preq = coll_base_comm_get_reqs(base_module->base_data, 2);
@@ -125,7 +126,7 @@ mca_coll_base_alltoallv_intra_basic_inplace(const void *rbuf, const int *rcounts
125126

126127
error_hndl:
127128
/* Free the temporary buffer */
128-
free (tmp_buffer);
129+
free (allocated_buffer);
129130
if( MPI_SUCCESS != err ) {
130131
ompi_coll_base_free_reqs(reqs, 2 );
131132
}

0 commit comments

Comments
 (0)