Conversation

@RandomDSdevel
Contributor

@RandomDSdevel RandomDSdevel commented May 7, 2018

  • Have you followed the guidelines for contributing?
  • Have you checked that there aren't other open pull requests for the same formula update/change?
  • Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • Does your build pass brew audit --strict <formula> (after doing brew install <formula>)?

     Follows up on #26009.

Outstanding Questions:

  • See PR commit comments for most of them.
  • Do these changes warrant any bottle rebuilds or formula revisions to open-mpi.rb or any of the formulas that depend on it? (Unlikely, I suspect, since this PR fixes a non-default option's build semantics, but I just thought I'd double-check and make sure…)
  • What should I do here about regenerating Open MPI's makefiles after brew applies the patches this PR pulls in? (If one doesn't account for this, building --with-java still fails even though it should work with said patches.) I'll look around and see whether any other formula currently has to deal with a situation of this sort, but maybe somebody could let me know here before I get too lost digging…? (It will likely involve copying what head-spec builds do in some way, but I'm not quite sure what the best way to do that would be at the moment…) (Already resolved.)

Contributor Author

     Do patch do blocks outside of any SoftwareSpec block apply the patches they pull in to the stable spec? I couldn't remember if that was the case off of the top of my head…
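For context, the two placements being compared look roughly like this in formula-DSL terms; the patch URLs and checksums below are placeholders for illustration, not the PR's actual patches:

class OpenMpi < Formula
  # Placement A (hypothetical): a patch block at the top level of the
  # formula, outside any SoftwareSpec block.  If memory serves, Homebrew
  # adds such a patch to every spec, stable included; that is exactly the
  # question being asked above.
  patch do
    url "https://example.com/placeholder-a.patch"  # placeholder
    sha256 "0000000000000000000000000000000000000000000000000000000000000000"
  end

  # Placement B (hypothetical): the same patch scoped explicitly to the
  # stable spec, so there is no ambiguity about where it applies.
  stable do
    url "https://www.open-mpi.org/software/ompi/v3.1/downloads/openmpi-3.1.0.tar.bz2"
    sha256 "b25c044124cc859c0b4e6e825574f9439a51683af1950f6acda1951f5ccdf06c"

    patch do
      url "https://example.com/placeholder-b.patch"  # placeholder
      sha256 "0000000000000000000000000000000000000000000000000000000000000000"
    end
  end
end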

Contributor Author

@RandomDSdevel RandomDSdevel May 7, 2018

     Do you think this comment is too long? How would you prefer it be worded?

Contributor Author

     Similarly, is this comment too long, and how would you prefer it be worded?

@RandomDSdevel
Contributor Author

Outstanding Questions:

  • What should I do here about regenerating Open MPI's makefiles after brew applies the patches this PR pulls in? (If one doesn't account for this, building --with-java still fails even though it should work with said patches.) I'll look around and see whether any other formula currently has to deal with a situation of this sort, but maybe somebody could let me know here before I get too lost digging…? (It will likely involve copying what head-spec builds do in some way, but I'm not quite sure what the best way to do that would be at the moment…)

     Hmm, from scanning Homebrew Core, it doesn't look like any formula with both patch do blocks and the resulting :build dependencies repeats the latter between stable and head, but I also:

  • don't immediately recall, at first glance, whether head dependencies would collide with stable ones instead of overriding them

and

  • wouldn't want anybody to forget to put those :build dependencies back inside the head SoftwareSpec block if I removed them from there,

so I'll just comment them out for now and leave a note about it. Is that all right?
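If it helps to picture the arrangement just described, here is a rough sketch; the specific dependency names (the usual autotools trio) and the head URL are my assumptions, not copied from the PR's diff:

  # Declared once so that both the patched stable build and the head build
  # can regenerate Open MPI's build system (assumption: the usual autotools
  # trio is what autogen.pl needs).
  depends_on "autoconf" => :build
  depends_on "automake" => :build
  depends_on "libtool"  => :build

  head do
    url "https://github.com/open-mpi/ompi.git"
    # NOTE: redundant while the top-level :build dependencies above exist;
    # put these back here if those ever go away.
    # depends_on "autoconf" => :build
    # depends_on "automake" => :build
    # depends_on "libtool"  => :build
  end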

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 7, 2018

     Hopefully that unbreaks the CI build. I'm also testing the --with-java build locally in parallel. Oh, crud: I forgot to edit the install do block to regenerate the makefiles; be right back…

@RandomDSdevel
Contributor Author

     OK, that should be everything. I'm still testing building --with-java locally in parallel, of course.

@RandomDSdevel
Contributor Author

     Or not; apparently, running autogen.pl inside a distribution (non-developer) Open MPI source tree raises the script's eyebrows, so to speak. Third time's the charm…

@RandomDSdevel
Contributor Author

     I'll address the brew audit failures once my local test build finishes. It looks like the other part of the build failure is due to somebody forgetting to give OpenCoarrays a revision bump when GCC 8 was released, though…

@RandomDSdevel
Contributor Author

     My local build attempt just finished successfully; I've updated my OP's checklist accordingly.

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 8, 2018

     I've fixed all the audit warnings I could and force-pushed (with --force-with-lease) a new instance of this PR's only commit to my fork, but there are still a few left. Some will eventually be made moot by later Open MPI releases or might be style nits I could get advice on fixing; one wasn't introduced here.

@RandomDSdevel
Contributor Author

     An issue with one or more of the commits pulled in as patches by this PR has been identified upstream here. Please stand by while I go back and help the maintainers resolve this.

@RandomDSdevel
Contributor Author

     I've submitted upstream PR open-mpi/ompi#5160 to address the issue I relayed in my previous comment on this PR discussion thread. Once it's been approved and merged, I'll have this PR pull it in, too.

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 10, 2018

     OK, I think I've got everything sorted out now. Also testing locally…

@RandomDSdevel
Contributor Author

     Reapplying my commit comments from earlier since they still apply…

Contributor Author

     Do patch do blocks outside of any SoftwareSpec block apply the patches they pull in to the stable spec? I couldn't remember if that was the case off of the top of my head…

Contributor Author

     Do you think this comment is too long? How would you prefer it be worded?

Contributor Author

     Similarly, is this comment too long, and how would you prefer it be worded?

Contributor Author

     And how about this one while I'm thinking about it?

Contributor Author

     Additionally, I wonder if I could get rid of this hack and just do something with autoreconf for the stable spec instead, as that looks like it would be better than running ./autogen.pl --force.

Contributor Author

     Never mind; Open MPI's HACKING guide says, as I might have seen before, the following on lines 206–210:

⋮

   5a. You generally need to run autogen.pl whenever the top-level
       file "configure.ac" changes, or any files in the config/ or
       <project>/config/ directories change (these directories are
       where a lot of "include" files for OMPI's configure script
       live).

⋮

So, alas, no direct autoreconf for me. (Shrugs.)
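For concreteness, the relevant slice of the install block then ends up looking something like the sketch below. The configure arguments shown are illustrative only, not the formula's full list, and --enable-mpi-java is the flag I understand Open MPI's configure to use for the Java bindings:

  def install
    # autogen.pl refuses to run in a distribution (non-developer) tree
    # unless forced, per the earlier comments; only bother when the patched
    # files actually require regenerating the build system.
    system "perl", "./autogen.pl", "--force" if build.with? "java"

    args = %W[
      --prefix=#{prefix}
      --disable-dependency-tracking
      --disable-silent-rules
    ]
    args << "--enable-mpi-java" if build.with? "java"

    system "./configure", *args
    system "make", "install"
  end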

@RandomDSdevel
Contributor Author

     My local build finished successfully, but CI is still experiencing the problem I described above (namely, that testing OpenCoarrays fails the build). I don't currently have an OpenCoarrays installation on my machine, but I could make one and use it for a pull request to fix that issue if desired. As I speculated earlier, all that's likely needed is a revision bump to opencoarrays.rb to account for GCC 8, though that's usually covered when the latter sees a new version released and propagated to Homebrew; I wonder… (goes digging to check whether that was actually the case).
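For anyone unfamiliar with the mechanism being speculated about, a revision bump is a one-line formula change; the value 1 below is illustrative (it would be whatever the next unused revision number is):

class Opencoarrays < Formula
  # url, sha256, dependencies, and the rest of the formula stay exactly as
  # they are; adding (or incrementing) `revision` is enough to make Homebrew
  # rebuild the same OpenCoarrays version against the newly released GCC and
  # publish fresh bottles.
  revision 1
end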

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 10, 2018

     Looks like the revision-bump–blocking build error mentioned for OpenCoarrays starting at #27276 (comment) has some causal link with the test failure I'm seeing here. #26075 may also be related, though that PR has more to do with an OpenCoarrays build failure I'm not seeing here. Paging @fxcoudert, as he handled Homebrew's release of GCC 8.1, and @zbeekman, as he was(/is?) involved in both of the conversations I just referenced above. (I'll also look at the upstream issue linked to in both of those Homebrew issue-processing discussion threads.)

@RandomDSdevel
Contributor Author

     To be more specific, here's what CI's console logs say (copied from the OS X v10.11.x 'El Capitan' build-bot configuration, as it runs the same OS X/macOS version I have installed on my machine; the failure is comparable, modulo thread ordering, across all tested OS versions):

⋮
15:07:02 ==> brew test --verbose opencoarrays
15:07:05 ==> FAILED
15:07:05 Testing opencoarrays
15:07:05 /usr/bin/sandbox-exec -f /tmp/homebrew20180510-91334-gs4dd2.sb /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/bin/ruby -W0 -I /usr/local/Homebrew/Library/Homebrew -- /usr/local/Homebrew/Library/Homebrew/test.rb /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/opencoarrays.rb --verbose
15:07:05 ==> /usr/local/Cellar/opencoarrays/2.0.0/bin/caf tally.f90 -o tally
15:07:05 ==> /usr/local/Cellar/opencoarrays/2.0.0/bin/cafrun -np 3 ./tally
15:07:05 Fortran runtime error on image 3: Unsupported data type in collective: 0
15:07:05 
15:07:05 Fortran runtime error on image 1: Unsupported data type in collective: 0
15:07:05 
15:07:05 Fortran runtime error on image 2: Unsupported data type in collective: 0
15:07:05 
15:07:05 -------------------------------------------------------
15:07:05 Primary job  terminated normally, but 1 process returned
15:07:05 a non-zero exit code. Per user-direction, the job has been aborted.
15:07:05 -------------------------------------------------------
15:07:05 --------------------------------------------------------------------------
15:07:05 mpiexec detected that one or more processes exited with non-zero status, thus causing
15:07:05 the job to be terminated. The first process to do so was:
15:07:05 
15:07:05   Process name: [[17181,1],1]
15:07:05   Exit code:    1
15:07:05 --------------------------------------------------------------------------
15:07:05 Error: Command:
15:07:05    `/usr/local/bin/mpiexec -n 3 ./tally`
15:07:05 failed to run.
15:07:05 Error: opencoarrays: failed
15:07:05 Failed executing: /usr/local/Cellar/opencoarrays/2.0.0/bin/cafrun -np 3 ./tally
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1820:in `block in system'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1758:in `open'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1758:in `system'
15:07:05 /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/opencoarrays.rb:49:in `block in <class:Opencoarrays>'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1664:in `block (3 levels) in run_test'
15:07:05 /usr/local/Homebrew/Library/Homebrew/utils.rb:556:in `with_env'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1663:in `block (2 levels) in run_test'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:838:in `with_logging'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1662:in `block in run_test'
15:07:05 /usr/local/Homebrew/Library/Homebrew/extend/fileutils.rb:13:in `block in mktemp'
15:07:05 /usr/local/Homebrew/Library/Homebrew/extend/fileutils.rb:73:in `block in run'
15:07:05 /usr/local/Homebrew/Library/Homebrew/extend/fileutils.rb:73:in `chdir'
15:07:05 /usr/local/Homebrew/Library/Homebrew/extend/fileutils.rb:73:in `run'
15:07:05 /usr/local/Homebrew/Library/Homebrew/extend/fileutils.rb:12:in `mktemp'
15:07:05 /usr/local/Homebrew/Library/Homebrew/formula.rb:1656:in `run_test'
15:07:05 /usr/local/Homebrew/Library/Homebrew/test.rb:28:in `block in <main>'
15:07:05 /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/lib/ruby/2.3.0/timeout.rb:91:in `block in timeout'
15:07:05 /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/lib/ruby/2.3.0/timeout.rb:33:in `block in catch'
15:07:05 /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/lib/ruby/2.3.0/timeout.rb:33:in `catch'
15:07:05 /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/lib/ruby/2.3.0/timeout.rb:33:in `catch'
15:07:05 /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.3_2/lib/ruby/2.3.0/timeout.rb:106:in `timeout'
15:07:05 /usr/local/Homebrew/Library/Homebrew/test.rb:27:in `<main>'
⋮
15:08:19 Build step 'Execute shell' marked build as failure
15:08:19 Recording test results
15:08:19 Archiving artifacts
15:08:19 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
15:08:19 Finished: FAILURE

@zbeekman
Contributor

Unfortunately, OpenCoarrays will remain broken until the 8.2 release as a patch was applied by one of the GFortran developers during the regression freeze (which GFortran is exempted from since it's not considered a critical component) that completely broke some functionality of GFortran with -fcoarray=lib and OpenCoarrays. I am considering just temporarily removing the offending code that triggers the ICE (when building against 8.1) for the next release, which would not fix the problem but would merely hide it until 8.2. (The patch fixing it is in trunk and the 8 branch now...) Simply hiding the problem is probably a bad idea because it will give people false confidence and the breakage is fairly significant. The only realistic alternative is to patch GCC 8.1 for GFortran with the required patch. Otherwise OpenCoarrays will likely stay broken until 8.2.

@RandomDSdevel
Contributor Author

@zbeekman:

Unfortunately, OpenCoarrays will remain broken until the 8.2 release as a patch was applied by one of the GFortran developers during the regression freeze (which GFortran is exempted from since it's not considered a critical component) that completely broke some functionality of GFortran with -fcoarray=lib and OpenCoarrays. …

     So that issue is indeed related to this one, then.

…I am considering just temporarily removing the offending code that triggers the ICE (when building against 8.1) for the next release, which would not fix the problem but would merely hide it until 8.2. […] Simply hiding the problem is probably a bad idea because it will give people false confidence and the breakage is fairly significant. …

…(The patch fixing it is in trunk and the 8 branch now...) […] The only realistic alternative is to patch GCC 8.1 for GFortran with the required patch. Otherwise OpenCoarrays will likely stay broken until 8.2.

     The relevant change doesn't look like it's been reviewed and approved just yet upstream, but yes, I expect this option would be preferred. We'll see what @fxcoudert thinks.

@RandomDSdevel
Contributor Author

@fxcoudert:

     Since GCC v8.1 was merged in without anyone fixing its encounter with this same OpenCoarrays test failure, perhaps we could also just do that for Open MPI here? (GCC's a more widely used and important piece of software, though, so I expect that might have something to do with how its OpenCoarrays test failure was handled.)

@fxcoudert
Member

What I'm wondering here is whether we really want to apply 2 patches and rework the formula, for a nondefault option.

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 16, 2018

@fxcoudert:

What I'm wondering here is whether we really want to apply 2 patches and rework the formula, for a nondefault option.

     To make sure that nobody else who uses said non-default formula option bumps into the issues those patches (there are actually three of them now, if you hadn't noticed) solve. When an Open MPI release that includes all of those changes by default comes out, the formula can go back to being structured the way it was before.

Apply some upstream patches in order to fix some fatal issues with configuring
and building Open MPI's `:optional` Java bindings from source.

Follows up on the first of the issues referenced below.

Refs:
- Homebrew#26009
- open-mpi/ompi#5000
@RandomDSdevel
Contributor Author

     Rebased against the latest Open MPI formula changes (and the latest upstream stable release available in Homebrew, naturally). Note that this, plus a difference in how my upstream PRs were merged between the v3.0.x and v3.1.x release branches, allowed me to simplify this PR somewhat. I unfortunately haven't had the chance to test these changes on my local machine yet, but rest assured that I'll correct that in due course.

url "https://www.open-mpi.org/software/ompi/v3.1/downloads/openmpi-3.1.0.tar.bz2"
sha256 "b25c044124cc859c0b4e6e825574f9439a51683af1950f6acda1951f5ccdf06c"

stable do
Contributor Author

     Do patch do blocks outside of any SoftwareSpec block apply the patches they pull in to the stable spec? I couldn't remember if that was the case off of the top of my head…

#
# From the result of https://github.com/open-mpi/ompi/pull/5015 and
# https://github.com/open-mpi/ompi/pull/5160 as backported to Open MPI v3.1.x by
# https://github.com/open-mpi/ompi/pull/5118:
Contributor Author

     Do you think this comment is too long? How would you prefer it be worded?
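For readers skimming the diff context above: a provenance comment like that usually sits directly on top of patch blocks of roughly this shape; the commit reference and checksum are placeholders, since the real ones live in the PR's diff, which isn't reproduced here:

  # From the result of https://github.com/open-mpi/ompi/pull/5015 and
  # https://github.com/open-mpi/ompi/pull/5160 as backported to Open MPI v3.1.x by
  # https://github.com/open-mpi/ompi/pull/5118:
  patch do
    url "https://github.com/open-mpi/ompi/commit/<commit-sha>.patch?full_index=1"  # placeholder
    sha256 "<checksum of the fetched patch file>"                                  # placeholder
  end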

@RandomDSdevel
Contributor Author

     Reapplied those commit comments that still apply.

@RandomDSdevel
Contributor Author

     Updated my OP's list of outstanding questions.

@RandomDSdevel
Contributor Author

RandomDSdevel commented May 17, 2018

     I've now tested building Open MPI with the current iteration of this PR on my local machine, and it works exactly as expected. Bear in mind, however, that, again, my local testing doesn't cover the OpenCoarrays issue picked up by CI, as I don't use that dependent package myself.

@fxcoudert
Member

Hi @RandomDSdevel, thanks for the patches and pull request. This is a lot of patching and extra dependencies (which we will have to remove next release) for something that affects so very few users: exactly one in the last 30 days:

install events in the last 30 days
============================================================================================================
 1 | open-mpi                                                                              | 6,351 |  93.27%
 2 | open-mpi --with-mpi-thread-multiple                                                   |   139 |   2.04%
 3 | open-mpi --with-cxx-bindings                                                          |   130 |   1.91%
 4 | open-mpi --with-mpi-thread-multiple --with-java                                       |    65 |   0.95%
 5 | open-mpi --without-fortran                                                            |    37 |   0.54%
 6 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings                               |    31 |   0.46%
 7 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --with-java                   |    28 |   0.41%
 8 | open-mpi --with-mpi-thread-multiple --without-fortran                                 |    10 |   0.15%
 9 | open-mpi --with-java                                                                  |     9 |   0.13%
10 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --without-fortran             |     3 |   0.04%
11 | open-mpi --with-mpi-thread-multiple --without-fortran --with-java                     |     3 |   0.04%
12 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --without-fortran --with-java |     2 |   0.03%
13 | open-mpi --HEAD                                                                       |     1 |   0.01%
============================================================================================================
Total                                                                                      | 6,809 |    100%
============================================================================================================

install_on_request events in the last 30 days
============================================================================================================
 1 | open-mpi                                                                              | 3,781 |  90.98%
 2 | open-mpi --with-cxx-bindings                                                          |   115 |   2.77%
 3 | open-mpi --with-mpi-thread-multiple                                                   |   102 |   2.45%
 4 | open-mpi --with-mpi-thread-multiple --with-java                                       |    56 |   1.35%
 5 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings                               |    31 |   0.75%
 6 | open-mpi --without-fortran                                                            |    26 |   0.63%
 7 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --with-java                   |    23 |   0.55%
 8 | open-mpi --with-java                                                                  |     9 |   0.22%
 9 | open-mpi --with-mpi-thread-multiple --without-fortran                                 |     9 |   0.22%
10 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --without-fortran             |     3 |   0.07%
11 | open-mpi --HEAD                                                                       |     1 |   0.02%
============================================================================================================
Total                                                                                      | 4,156 |    100%
============================================================================================================

BuildError events in the last 30 days
============================================================================================================
1 | open-mpi                                                                                  | 63 |  66.32%
2 | open-mpi --with-cxx-bindings                                                              | 14 |  14.74%
3 | open-mpi --with-mpi-thread-multiple                                                       |  9 |   9.47%
4 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings                                   |  3 |   3.16%
5 | open-mpi --with-mpi-thread-multiple --with-java                                           |  3 |   3.16%
6 | open-mpi --with-mpi-thread-multiple --with-cxx-bindings --with-java                       |  2 |   2.11%
7 | open-mpi --with-java                                                                      |  1 |   1.05%
============================================================================================================
Total                                                                                         | 95 |    100%
============================================================================================================

I am going to decline the pull request and leave it as is, and wait for 3.1.1 to fix it. But many thanks again for your work on the issue!

@fxcoudert fxcoudert closed this May 24, 2018
@RandomDSdevel
Contributor Author

@fxcoudert:

     Fair enough, and no problem. Looking at Open MPI's 'v3.1.1' milestone, it appears that release is imminent (the projected landing date is June 2nd, to be precise).

@lock lock bot added the outdated (PR was locked due to age) label Jun 23, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jun 23, 2018