Improving opal_pointer_array bounds checking. #709

avilcheslopez · 2015-07-10T20:34:40Z

This is basically just guarding against negative indexes.

For opal_pointer_array_get_item(), the code documentation specifies that NULL should be returned on error, so I simply expanded the check in the initial "if" statement to check for a negative index.

For opal_pointer_array_set_item(), the code documentation specifies that (-1) should be returned on error, so I added an "if" statement to check for a negative index. I noticed the assert instruction to guard against a NULL array and was wondering if this should be used to guard against a negative index instead, but figured returning an error was more consistent with the code documentation and more consistent with opal_array_get_item()'s behavior.

If this is not the appropriate fix, I would be more than happy to make the necessary changes.

rhc54 · 2015-07-11T15:01:40Z

@bosilca any thoughts on this? Given that we do use pointer arrays in the critical path (IIRC), I suppose we could use opal_unlikely here or change to an assert. This is motivated by the code checking going on inside Intel, and the checker is barking because of the potential error if a negative index is given.

rhc54 · 2015-07-14T22:39:01Z

@hjelmn any thoughts on this?

hjelmn · 2015-07-14T22:52:05Z

Shouldn't we just make the index argument an unsigned int? This would cause a negative number to trigger the table->size <= element_index check.

avilcheslopez · 2015-07-15T18:33:53Z

@hjelmn: we could make it an unsigned int, however, this would require quite a few more changes. The opal_pointer_array_t struct and some function signatures would need to be changed. For example, opal_pointer_array_add returns the index where an item was inserted (which would now be unsigned), but returns (-1) in case of error. So we would have to decide how to deal with this. Also, we would have to check where these functions are being used to make sure this change wouldn't introduce any issues.

rhc54 · 2015-07-17T17:23:46Z

@avilcheslopez you make a good point about the return code. We would need some way of returning an error, and the negative index was an easy one to check. Only other option would be to alter the param list so the index was returned in a pointer there, and then the return code can be an OPAL_ERR_foo.

But that would again involve changing all the current usages, and that would be annoying. So I think your approach makes the most sense, though we may want to switch to an assert for performance (or at least an OPAL_UNLIKELY).

rhc54 · 2015-07-17T17:26:17Z

@avilcheslopez taking another quick glance at the overall code, I think I'd suggest changing these to an assert. We should never see negative indices in practice, which means it was likely to be a programming error. So doing an assert that let's the developer stack trace the problem might be the best solution.

avilcheslopez · 2015-07-17T17:33:12Z

@rhc54: makes sense. Unless anyone else has any input on this, I'll go ahead and change it to use asserts instead.

bosilca · 2015-07-19T14:47:36Z

I would argue that using the OPAL_UNLIKELY is the most clean way to handle this as it will allow us to maintain most of the performance of the original code while still having error checks.

avilcheslopez · 2015-07-20T18:56:45Z

@rhc54, @bosilca: I changed it to using OPAL_UNLIKELY to see how it would look like. Let me know what you think. In opal_pointer_array_set_item() I still left the NULL check in the assert, but let me know if OPAL_UNLIKELY should be used for that as well.

lanl-ompi · 2015-07-20T19:04:30Z

Test FAILed.

lanl-ompi · 2015-07-20T19:23:18Z

Test FAILed.

jsquyres · 2015-07-20T19:43:27Z

@avilcheslopez I don't know if you've clicked through the "Details" links on the CI block at the bottom of the PR, but they're showing build failures like this:

make[3]: Entering directory `/var/lib/jenkins/workspace/ompi_master_pr_distcheck/openmpi-gitclone/_build/sub/opal/tools/wrappers'
  CC       opal_wrapper.o
  SED      opal_wrapper.1
  CCLD     opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `OPAL_UNLIKELY'

Perhaps you're missing a .h file...?

avilcheslopez · 2015-07-20T20:15:29Z

@jsquyres: oops, sorry, will fix (just wanted to show how the code would look like with the suggestion). In the meantime, any input or suggestions? I was leaning more towards using asserts, but as @bosilca was pointing out, using OPAL_UNLIKELY would give better performance.

hjelmn · 2015-07-20T20:56:08Z

asserts are noops in optimized builds so they are probably fine here.

jsquyres · 2015-07-22T16:19:30Z

I don't have a strong opinion here either way -- I agree that passing a negative value is likely a programmer error, and so therefore an assert() is likely fine. But defensive run-time programming is also a Good Thing -- we may not test a case that passes in a negative value, and therefore it may happen out in the real world. Hence, the OPAL_UNLIKELY will protect more cases.

With Intel chip branch prediction these days, I think the OPAL_UNLIKELY makes the penalty asymptotically approach zero.

Also, in the set_item() case, there's already several if statements in that code, so adding 1 more (surrounded by OPAL_UNLIKELY) is unlikely (pun intended!) to add a noticeable performance penalty.

In the get_item() case, we were already returning NULL for one error; I think the addition of returning it for another error (also protected by OPAL_UNLIKELY) is good.

@avilcheslopez Can you please squash these two commits down to one?

In short: please squash into 1 commit, and then I think it's fine to merge to master.

avilcheslopez · 2015-07-23T00:58:50Z

@jsquyres: awesome, will do! Thanks!

avilcheslopez · 2015-07-23T17:10:13Z

@jsquyres: done!

Improving opal_pointer_array bounds checking.

OPAL_UNLIKELY).

udcm: fix bugs

rhc54 added the enhancement label Jul 14, 2015

rhc54 added this to the Future milestone Jul 14, 2015

rhc54 added the code cleanup label Jul 14, 2015

avilcheslopez force-pushed the master branch from 2126319 to ecdc69c Compare July 20, 2015 20:12

jsquyres modified the milestones: Open MPI v2.0.0, Future Jul 22, 2015

avilcheslopez force-pushed the master branch from ecdc69c to 994ed60 Compare July 23, 2015 17:08

jsquyres added a commit that referenced this pull request Jul 23, 2015

Merge pull request #709 from avilcheslopez/master

df80028

Improving opal_pointer_array bounds checking.

jsquyres merged commit df80028 into open-mpi:master Jul 23, 2015

Improving opal_pointer_array bounds checking (using

994ed60

OPAL_UNLIKELY).

jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015

Merge pull request open-mpi#709 from hjelmn/v2.x_udcm_fixes

ac58b53

udcm: fix bugs

Improving opal_pointer_array bounds checking. #709

Improving opal_pointer_array bounds checking. #709

Uh oh!

Conversation

avilcheslopez commented Jul 10, 2015

Uh oh!

rhc54 commented Jul 11, 2015

Uh oh!

rhc54 commented Jul 14, 2015

Uh oh!

hjelmn commented Jul 14, 2015

Uh oh!

avilcheslopez commented Jul 15, 2015

Uh oh!

rhc54 commented Jul 17, 2015

Uh oh!

rhc54 commented Jul 17, 2015

Uh oh!

avilcheslopez commented Jul 17, 2015

Uh oh!

bosilca commented Jul 19, 2015

Uh oh!

avilcheslopez commented Jul 20, 2015

Uh oh!

lanl-ompi commented Jul 20, 2015

Uh oh!

lanl-ompi commented Jul 20, 2015

Uh oh!

jsquyres commented Jul 20, 2015

Uh oh!

avilcheslopez commented Jul 20, 2015

Uh oh!

hjelmn commented Jul 20, 2015

Uh oh!

jsquyres commented Jul 22, 2015

Uh oh!

avilcheslopez commented Jul 23, 2015

Uh oh!

avilcheslopez commented Jul 23, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants