@ax3l

@ax3l ax3l commented Feb 23, 2019

Same as #6344, but applied to v3.0.x.

Contains #6286 #6287.

cc @edgargabriel @jsquyres

This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.
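
The effect can be reproduced in isolation. The following is a minimal sketch, not the ompio code: the function names and the byte counts are assumptions chosen for illustration. It rounds a quotient up by hand, once through `float` and once through `double`. A `float` has a 24-bit significand, so near 2^29 it can only represent multiples of 64, and a byte count one above the chunk size collapses onto the chunk size before the division happens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of chunks, computed through single precision.
 * Both operands are rounded to float BEFORE the division takes place. */
int64_t chunks_via_float(int64_t total_bytes, int64_t chunk_bytes)
{
    assert(total_bytes >= 0 && chunk_bytes > 0);
    float q = (float) total_bytes / (float) chunk_bytes;
    int64_t n = (int64_t) q;            /* truncates toward zero */
    return ((float) n < q) ? n + 1 : n; /* round up if a fraction remains */
}

/* Same computation through double precision (53-bit significand), which
 * represents every integer up to 2^53 exactly. */
int64_t chunks_via_double(int64_t total_bytes, int64_t chunk_bytes)
{
    assert(total_bytes >= 0 && chunk_bytes > 0);
    double q = (double) total_bytes / (double) chunk_bytes;
    int64_t n = (int64_t) q;             /* truncates toward zero */
    return ((double) n < q) ? n + 1 : n; /* round up if a fraction remains */
}
```

With `total_bytes = 536870913` (2^29 + 1) and `chunk_bytes = 536870912` (2^29), the `float` variant yields 1 chunk and silently drops the last byte, while the `double` variant yields 2.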

See issue open-mpi#6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks to Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <[email protected]>
(cherry picked from commit c0f8ce0)
Similar to open-mpi#6286: converting a byte count to a single-precision floating point value in order to round up the result of a division is risky, because the conversion itself can introduce rounding errors.

- remove floating point operations for `round up`
- remove floating point conversion for round down (native behavior of integer division)
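
Both roundings can be done with integer arithmetic alone. The following is a sketch under stated assumptions: the helper names are hypothetical and not taken from the Open MPI sources, and the operands are assumed to be non-negative byte counts with a positive divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ceiling of total_bytes / chunk_bytes without any
 * floating point. Adding the remainder test avoids the overflow that the
 * common (a + b - 1) / b form can hit for very large operands. */
int64_t div_round_up(int64_t total_bytes, int64_t chunk_bytes)
{
    assert(total_bytes >= 0 && chunk_bytes > 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Hypothetical helper: for non-negative operands, C integer division
 * truncates toward zero, which is already "round down". */
int64_t div_round_down(int64_t total_bytes, int64_t chunk_bytes)
{
    assert(total_bytes >= 0 && chunk_bytes > 0);
    return total_bytes / chunk_bytes;
}
```

For example, `div_round_up(536870913, 536870912)` gives 2 and `div_round_down(536870913, 536870912)` gives 1, with no dependence on floating point precision.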

Signed-off-by: René Widera <[email protected]>

Note: a direct cherry pick of commit a91fab8 is not possible, due to structural differences between the 3.1.x and the master/v4.0.x branch. This commit is the equivalent of commit a91fab8.

Signed-off-by: Edgar Gabriel <[email protected]>
@ompiteam-bot

Can one of the admins verify this patch?

@ax3l ax3l mentioned this pull request Feb 23, 2019
2 tasks
@jsquyres
Member

ok to test

@edgargabriel
Member

I would like to emphasize that having

#6326

pr'ed to 3.0 and 3.1 is as important (maybe even more important) as this issue. That is, however, a datatype issue, and I am not sure whether it applies cleanly from master to the 3.x branches or whether it requires some additional work.

@jsquyres
Member

@edgargabriel Thanks for bringing that up. I tried to cherry pick 5a82c4f from #6326, but ran into conflicts on v3.0.x and v3.1.x. I'll file a separate issue and ping George.

@jsquyres jsquyres merged commit aa3abb8 into open-mpi:v3.0.x Feb 26, 2019
@ax3l ax3l deleted the bp-ompioCrashRounding branch February 26, 2019 17:46