WeeklyTelcon_20160119

Geoff Paulsen edited this page Jan 19, 2016 · 5 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brad Benton
  • Edgar Gabriel
  • Howard
  • Joshua Hursey
  • Joshua Ladd
  • Nathan Hjelm
  • Ralph
  • Sylvain Jeaugey
  • Todd Kordenbrock

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.2
  • Need to verify that library versions are still correct.
  • Cisco Weekend MTT tests didn't look good.
    • Build failure also.
    • usNIC unable to connect. Maybe a cluster issue.
    • Autogen --force was not brought over to 1.10; it should be removed from the Cisco MTT configuration.
    • Ralph will try to replicate the MPI_Abort failure (the abort test itself).
    • 1.10 C strided mutex lock issue: Nathan would not be surprised if it is a bug. One failure, in a specific build configuration.
      • The memchecker-enabled build could be affecting timing. Nathan will take a look; it should be simple.
    • Jeff will look at MTT things after call.
    • High CPU utilization on Async progress thread. Ralph will take a look. From -GE.
  • Once all of these issues are resolved or addressed, 1.10.2 can ship.

Review 2.0.x

Review Master?

  • Edgar's PR into master (tries to work around Lustre by switching over to ROMIO).
    • Not sure whether the issues he's seeing are on the Cray or on his cluster. They could be related, but he needs to get his cluster running again.
    • Wanted to see if there are any warnings from Jenkins.
    • But running that portion of the code on Edgar's cluster hits many issues.
    • With BTL flags = 305, performance got horrible (it used to get better).
    • Did something else change in configure? Hitting one issue after another, independent of OMPIO.
    • OMPIO is not finding PFS2 correctly during configure. Jeff can use screen share with Edgar.
    • Issues only show up with 96 procs, which makes debugging more difficult.

MTT status:

Status Updates:

  • LANL
  • Houston
  • HLRS
  • IBM

Status Update Rotation

  1. LANL, Houston, HLRS, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel
