Merged
33 changes: 17 additions & 16 deletions opal/mca/common/cuda/common_cuda.c
@@ -1739,19 +1739,19 @@ static int mca_common_cuda_is_gpu_buffer(const void *pUserBuf, opal_convertor_t
     int res;
     CUmemorytype memType = 0;
     CUdeviceptr dbuf = (CUdeviceptr)pUserBuf;
-    CUcontext ctx = NULL;
+    CUcontext ctx = NULL, memCtx = NULL;
 #if OPAL_CUDA_GET_ATTRIBUTES
     uint32_t isManaged = 0;
     /* With CUDA 7.0, we can get multiple attributes with a single call */
     CUpointer_attribute attributes[3] = {CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                          CU_POINTER_ATTRIBUTE_CONTEXT,
                                          CU_POINTER_ATTRIBUTE_IS_MANAGED};
-    void *attrdata[] = {(void *)&memType, (void *)&ctx, (void *)&isManaged};
+    void *attrdata[] = {(void *)&memType, (void *)&memCtx, (void *)&isManaged};

     res = cuFunc.cuPointerGetAttributes(3, attributes, attrdata, dbuf);
     OPAL_OUTPUT_VERBOSE((101, mca_common_cuda_output,
-                         "dbuf=%p, memType=%d, ctx=%p, isManaged=%d, res=%d",
-                         (void *)dbuf, (int)memType, (void *)ctx, isManaged, res));
+                         "dbuf=%p, memType=%d, memCtx=%p, isManaged=%d, res=%d",
+                         (void *)dbuf, (int)memType, (void *)memCtx, isManaged, res));

     /* Mark unified memory buffers with a flag. This will allow all unified
      * memory to be forced through host buffers. Note that this memory can
@@ -1787,6 +1787,7 @@ static int mca_common_cuda_is_gpu_buffer(const void *pUserBuf, opal_convertor_t
     }
     /* Must be a device pointer */
     assert(memType == CU_MEMORYTYPE_DEVICE);
+#endif /* OPAL_CUDA_GET_ATTRIBUTES */

     /* This piece of code was added in to handle in a case involving
      * OMP threads. The user had initialized CUDA and then spawned
@@ -1797,25 +1798,25 @@ static int mca_common_cuda_is_gpu_buffer(const void *pUserBuf, opal_convertor_t
      * and set the current context to that. It is rare that we will not
      * have a context. */
     res = cuFunc.cuCtxGetCurrent(&ctx);
-#endif /* OPAL_CUDA_GET_ATTRIBUTES */
     if (OPAL_UNLIKELY(NULL == ctx)) {
         if (CUDA_SUCCESS == res) {
-            res = cuFunc.cuPointerGetAttribute(&ctx,
+#if !OPAL_CUDA_GET_ATTRIBUTES
+            res = cuFunc.cuPointerGetAttribute(&memCtx,
Member

Does this overwrite the value of memCtx? If so, what's the point of filling it in lines 1749 and 1754? I ask because you specifically changed lines 1749 and 1754 to use the new variable memCtx...

More generally, after reading this patch a few times, I'm not sure what the difference is between ctx and memCtx -- it seems like they're used mutually exclusively, and therefore you don't really need 2 variables. I don't really care, mind you -- this is your code, and if you want 2 variables just to make the code more readable, that's cool. 😄 But I figured I'd ask: is there a reason to have 2 variables for the code logic? Or is it just to make the code more readable?

Author

They are two different code paths, selected by OPAL_CUDA_GET_ATTRIBUTES. With OPAL_CUDA_GET_ATTRIBUTES set, we fill in memCtx with a single call that also fills in two other pieces of information. Without OPAL_CUDA_GET_ATTRIBUTES, we have to make a specific call for memCtx. I need the two variables because, when OPAL_CUDA_GET_ATTRIBUTES is set, memCtx has already been saved earlier. I agree that when we do not have OPAL_CUDA_GET_ATTRIBUTES, we do not need memCtx. But that is what this bug fix is about: making things work with OPAL_CUDA_GET_ATTRIBUTES set.

So, the general idea is that we put the memory context into memCtx and the current context into ctx. Does that make sense?

Member

Cool.
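
The split described in this thread (memory context in memCtx, current context in ctx) can be sketched in isolation. The following is a minimal, hedged model and not the Open MPI code: every `mock_*` name, `current_ctx`, and `ensure_current_context` are hypothetical stand-ins for the CUDA driver calls, and the compile-time switch OPAL_CUDA_GET_ATTRIBUTES is modeled as a runtime flag so both paths can be exercised. Returning -1 when the current-context query fails is a simplification; the patch takes a different branch there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CUDA driver objects; not the real API. */
typedef struct mock_ctx { int id; } mock_ctx;
typedef struct mock_buf { mock_ctx *owner; } mock_buf;

/* Models the calling thread's current context (NULL means none). */
static mock_ctx *current_ctx = NULL;

/* Models cuCtxGetCurrent: report the thread's current context. */
static int mock_ctx_get_current(mock_ctx **out) { *out = current_ctx; return 0; }

/* Models cuCtxSetCurrent: make c the thread's current context. */
static int mock_ctx_set_current(mock_ctx *c) { current_ctx = c; return 0; }

/* Models cuPointerGetAttribute with CU_POINTER_ATTRIBUTE_CONTEXT:
 * report the context that owns the buffer. */
static int mock_pointer_get_context(mock_ctx **out, const mock_buf *buf)
{
    *out = buf->owner;
    return 0;
}

/* memCtx is the buffer's memory context if it was already fetched by a
 * batched attribute query (have_batched_attrs != 0); otherwise it is
 * looked up here. Returns 0 on success, -1 on error. */
static int ensure_current_context(const mock_buf *buf, mock_ctx *memCtx,
                                  int have_batched_attrs)
{
    mock_ctx *ctx = NULL;   /* current context, kept separate from memCtx */
    int res;

    assert(NULL != buf);
    res = mock_ctx_get_current(&ctx);
    if (NULL != ctx) {
        return 0;           /* common case: a context is already current */
    }
    if (0 != res) {
        return -1;          /* simplification, see the note above */
    }
    if (!have_batched_attrs) {
        /* Path without the batched query: fetch the memory context now. */
        res = mock_pointer_get_context(&memCtx, buf);
        if (0 != res) {
            return -1;
        }
    }
    /* Adopt the buffer's memory context as the current context. */
    res = mock_ctx_set_current(memCtx);
    return (0 == res) ? 0 : -1;
}
```

Keeping the two variables apart means the query for the current context cannot clobber a memory context that was saved earlier, which is the situation the author describes for the OPAL_CUDA_GET_ATTRIBUTES path.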

                                                CU_POINTER_ATTRIBUTE_CONTEXT, dbuf);
-            if (res != CUDA_SUCCESS) {
+            if (OPAL_UNLIKELY(res != CUDA_SUCCESS)) {
                 opal_output(0, "CUDA: error calling cuPointerGetAttribute: "
                             "res=%d, ptr=%p aborting...", res, pUserBuf);
                 return OPAL_ERROR;
+            }
+#endif /* OPAL_CUDA_GET_ATTRIBUTES */
+            res = cuFunc.cuCtxSetCurrent(memCtx);
+            if (OPAL_UNLIKELY(res != CUDA_SUCCESS)) {
+                opal_output(0, "CUDA: error calling cuCtxSetCurrent: "
+                            "res=%d, ptr=%p aborting...", res, pUserBuf);
+                return OPAL_ERROR;
             } else {
-                res = cuFunc.cuCtxSetCurrent(ctx);
-                if (res != CUDA_SUCCESS) {
-                    opal_output(0, "CUDA: error calling cuCtxSetCurrent: "
-                                "res=%d, ptr=%p aborting...", res, pUserBuf);
-                    return OPAL_ERROR;
-                } else {
-                    opal_output_verbose(10, mca_common_cuda_output,
-                                        "CUDA: cuCtxSetCurrent passed: ptr=%p", pUserBuf);
-                }
+                OPAL_OUTPUT_VERBOSE((10, mca_common_cuda_output,
+                                     "CUDA: cuCtxSetCurrent passed: ptr=%p", pUserBuf));
             }
         } else {
             /* Print error and proceed */