Skip to content

Conversation

@nrgraham23
Copy link
Contributor

Fixed an error where if there were no MPI exceptions, a
JNI error could still exist and not get handled.

Signed-off-by: Nathaniel Graham [email protected]

Fixes #1698

@hppritcha @jsquyres

Fixed an error where if there were no MPI exceptions, a
JNI error could still exist and not get handled.

Signed-off-by: Nathaniel Graham <[email protected]>
@hppritcha hppritcha added this to the v2.0.1 milestone Jun 21, 2016
@nrgraham23 nrgraham23 merged commit 679a66c into open-mpi:master Jun 22, 2016
@hppritcha
Copy link
Member

@nrgraham23 could you open a PR on ompi-release for this commit? milestone 2.0.1. Thanks.

return JNI_TRUE;
}
else if (JNI_TRUE == jni_exception) {
return JNI_TRUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrgraham23 the result of this function is not checked in collectives. shouldn't we "cancel" the native exception and throw a MPIException here ?
my knowledge of C/Java interaction is pretty limited, so the terminology I used might be incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggouaillardet @hppritcha @jsquyres @siegmargross

I would say no. JNI exceptions are directly related to problems with something the JNI has tried to do. It seems to me like this is a separate issue from MPI because it is a result of a user trying to do something they shouldn't in Java. In the case of the index out of bounds error that has brought this up, a java array is too short for the attempted call.

That said, I have thoroughly traced the way the exceptions are being thrown, and discovered that it is actually acting differently from the way I had originally thought. The ompi_java_exceptionCheck method is returning a true when a JNI error occurs, however it's return is almost never checked (you can do a "git grep ompi_java_exceptionCheck" from the top ompi directory to see what I am talking about).

Instead, in almost every case, the code continues to execute in the C code, and the Java exception is not actually thrown until it returns to Java land.

I propose we change it to either throw an exception in ompi_java_exceptionCheck back to Java so the code stops executing and change the method return to void, or change all of the places where ompi_java_exceptionCheck is being called to actually check the return value.

The first option is far less code, but the second option would allow us to do memory cleanup stuff. Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, in such a case, the cleanup is performed (e.g. ompi_java_releaseWritePtr or ompi_java_releaseReadPtr) but a java.lang.*Exception is thrown instead of a mpi.MPIException, am i right ?

what if there is an MPI error (for example, root rank is negative ) ? in my understanding, no cleanup is performed.

at first, could/should we

  • do the cleanup first
  • and then ompi_java_exceptionCheck ?

an other way to put things is, is there any reason not to do any cleanup if the MPI subroutine returned an error ?

if the buffer is too short, throwing a mpi.MPIException is still half baked to me, since blocking MPI subroutines would throw a mpi.MPI_Exception but non blocking MPI subroutines would throw a java.lang.*Exception and likely after the MPI subroutine returned successfully.

bottom line, i do not feel comfortable with that kind of approach.

  • having Java throw a java.lang.*Exception is a valid option to me
  • having MPI Java bindings check user buffer (a la memchecker) when directed by the user (this is not cheap from a performance point of view) can be seen as an interesting but new feature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have seen so far as I have started changing some of the error code is that cleanup is happening in the case of any exception (MPI included), but objects are being returned to Java that should not be. The JNI stuff gets cleaned up, but its possible there is some MPI related stuff that is not getting cleaned up. Ill have to look more closely at that, but a first pass probably won't include those changes.

I think the non blocking MPI subroutines get their errors through the Request object that is created, but I could be wrong about that. Its been a while since I have looked at that code.

I personally don't think we should be checking user buffers, but I suppose if someone wanted to add it as a feature they could. So long as there was a switch to turn it on or off. And that is probably something that would need to be discussed with more people.

@nrgraham23 nrgraham23 deleted the jni_error_handling branch November 16, 2016 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants