Skip to content

Conversation

@avilcheslopez
Copy link
Contributor

This is basically just guarding against negative indexes.

For opal_pointer_array_get_item(), the code documentation specifies that NULL should be returned on error, so I simply expanded the check in the initial "if" statement to check for a negative index.

For opal_pointer_array_set_item(), the code documentation specifies that (-1) should be returned on error, so I added an "if" statement to check for a negative index. I noticed the assert instruction to guard against a NULL array and was wondering if this should be used to guard against a negative index instead, but figured returning an error was more consistent with the code documentation and more consistent with opal_array_get_item()'s behavior.

If this is not the appropriate fix, I would be more than happy to make the necessary changes.

@rhc54
Copy link
Contributor

rhc54 commented Jul 11, 2015

@bosilca any thoughts on this? Given that we do use pointer arrays in the critical path (IIRC), I suppose we could use opal_unlikely here or change to an assert. This is motivated by the code checking going on inside Intel, and the checker is barking because of the potential error if a negative index is given.

@rhc54 rhc54 added this to the Future milestone Jul 14, 2015
@rhc54
Copy link
Contributor

rhc54 commented Jul 14, 2015

@hjelmn any thoughts on this?

@hjelmn
Copy link
Member

hjelmn commented Jul 14, 2015

Shouldn't we just make the index argument an unsigned int? This would cause a negative number to trigger the table->size <= element_index check.

@avilcheslopez
Copy link
Contributor Author

@hjelmn: we could make it an unsigned int, however, this would require quite a few more changes. The opal_pointer_array_t struct and some function signatures would need to be changed. For example, opal_pointer_array_add returns the index where an item was inserted (which would now be unsigned), but returns (-1) in case of error. So we would have to decide how to deal with this. Also, we would have to check where these functions are being used to make sure this change wouldn't introduce any issues.

@rhc54
Copy link
Contributor

rhc54 commented Jul 17, 2015

@avilcheslopez you make a good point about the return code. We would need some way of returning an error, and the negative index was an easy one to check. Only other option would be to alter the param list so the index was returned in a pointer there, and then the return code can be an OPAL_ERR_foo.

But that would again involve changing all the current usages, and that would be annoying. So I think your approach makes the most sense, though we may want to switch to an assert for performance (or at least an OPAL_UNLIKELY).

@rhc54
Copy link
Contributor

rhc54 commented Jul 17, 2015

@avilcheslopez taking another quick glance at the overall code, I think I'd suggest changing these to an assert. We should never see negative indices in practice, which means it was likely to be a programming error. So doing an assert that let's the developer stack trace the problem might be the best solution.

@avilcheslopez
Copy link
Contributor Author

@rhc54: makes sense. Unless anyone else has any input on this, I'll go ahead and change it to use asserts instead.

@bosilca
Copy link
Member

bosilca commented Jul 19, 2015

I would argue that using the OPAL_UNLIKELY is the most clean way to handle this as it will allow us to maintain most of the performance of the original code while still having error checks.

@avilcheslopez
Copy link
Contributor Author

@rhc54, @bosilca: I changed it to using OPAL_UNLIKELY to see how it would look like. Let me know what you think. In opal_pointer_array_set_item() I still left the NULL check in the assert, but let me know if OPAL_UNLIKELY should be used for that as well.

@lanl-ompi
Copy link
Contributor

Test FAILed.

1 similar comment
@lanl-ompi
Copy link
Contributor

Test FAILed.

@jsquyres
Copy link
Member

@avilcheslopez I don't know if you've clicked through the "Details" links on the CI block at the bottom of the PR, but they're showing build failures like this:

make[3]: Entering directory `/var/lib/jenkins/workspace/ompi_master_pr_distcheck/openmpi-gitclone/_build/sub/opal/tools/wrappers'
  CC       opal_wrapper.o
  SED      opal_wrapper.1
  CCLD     opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `OPAL_UNLIKELY'

Perhaps you're missing a .h file...?

@avilcheslopez
Copy link
Contributor Author

@jsquyres: oops, sorry, will fix (just wanted to show how the code would look like with the suggestion). In the meantime, any input or suggestions? I was leaning more towards using asserts, but as @bosilca was pointing out, using OPAL_UNLIKELY would give better performance.

@hjelmn
Copy link
Member

hjelmn commented Jul 20, 2015

asserts are noops in optimized builds so they are probably fine here.

@jsquyres
Copy link
Member

I don't have a strong opinion here either way -- I agree that passing a negative value is likely a programmer error, and so therefore an assert() is likely fine. But defensive run-time programming is also a Good Thing -- we may not test a case that passes in a negative value, and therefore it may happen out in the real world. Hence, the OPAL_UNLIKELY will protect more cases.

With Intel chip branch prediction these days, I think the OPAL_UNLIKELY makes the penalty asymptotically approach zero.

Also, in the set_item() case, there's already several if statements in that code, so adding 1 more (surrounded by OPAL_UNLIKELY) is unlikely (pun intended!) to add a noticeable performance penalty.

In the get_item() case, we were already returning NULL for one error; I think the addition of returning it for another error (also protected by OPAL_UNLIKELY) is good.

@avilcheslopez Can you please squash these two commits down to one?

In short: please squash into 1 commit, and then I think it's fine to merge to master.

@jsquyres jsquyres modified the milestones: Open MPI v2.0.0, Future Jul 22, 2015
@avilcheslopez
Copy link
Contributor Author

@jsquyres: awesome, will do! Thanks!

@avilcheslopez
Copy link
Contributor Author

@jsquyres: done!

jsquyres added a commit that referenced this pull request Jul 23, 2015
Improving opal_pointer_array bounds checking.
@jsquyres jsquyres merged commit df80028 into open-mpi:master Jul 23, 2015
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants